kdahlquist closed this issue 1 year ago
@ahmad00m gave the data to @Onariaginosa, but it has not been uploaded to the database yet.
This now works as a helpful follow-up to #988: Apweiler data can be loaded to a local copy first in order to test/debug; then, once validated, the data can be loaded into the production server.
I'm still working on this issue, but I have a logistics question about loading the data into the database.
What should I put for the taxon_id, sample_id, and possibly the dataset?
Following up, we won't change the taxon IDs for yeast right now because it would require changing the database schema. That part of it is now referenced in issue #994
@ahmad00m reported a loading issue, which turned out to be a COPY format divergence.
Upon re-running, a genuine missing gene ID was found; @ahmad00m will look at it and consult with @kdahlquist as needed.
@ahmad00m investigated the missing gene IDs and found that, even though some are genuine mitochondrial genes, not all are. Some of them are involved in the regulation of phospholipid metabolism and other processes. I deleted more than two dozen of these genes, but that did not fix the issue, so I'd like to work out what my next step should be in loading this data. Perhaps I could use the genes in our database as a template and remove any ID that doesn't match what we have in the database, but we could discuss this more during our meeting this week.
Can @ahmad00m give examples of the genes he removed?
I exported the gene data from the fall2021 schema (which is used on GRNsight) and used that file as a reference to check whether the genes in the Apweiler data were already in our database. I found that 129 genes in the Apweiler data are not present in our database. However, I need a way to find the standard IDs for these genes so that I can insert them into the gene table. I'm thinking I can use the reference gene ID file that I got from YEASTRACT, look up their respective standard IDs in that file, and create a CSV file that I can later load into the database.
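The check described above can be sketched in Python. This is only an illustration under stated assumptions, not the script that was actually used: the function names, the CSV column names (`gene_id`, `display_gene_id`), and the toy gene IDs are all hypothetical, and the fallback of reusing the systematic ID when no standard ID is found is a design choice made for the example.

```python
import csv
import io


def find_missing_genes(data_gene_ids, db_gene_ids):
    """Return the gene IDs that appear in the new data but not in the database export."""
    return sorted(set(data_gene_ids) - set(db_gene_ids))


def build_gene_rows(missing_ids, standard_ids):
    """Pair each missing gene ID with its standard ID from a reference mapping.

    Assumption: when the reference has no entry, the systematic ID is reused
    so that the row can still be loaded and reviewed later.
    """
    return [(gene_id, standard_ids.get(gene_id, gene_id)) for gene_id in missing_ids]


def rows_to_csv(rows, header=("gene_id", "display_gene_id")):
    """Serialize the rows to CSV text; the header names are hypothetical."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Toy values for illustration only; these are not taken from the Apweiler data.
data_ids = ["YAL001C", "YBR123W", "Q0045"]
db_ids = ["YAL001C"]
reference = {"Q0045": "COX1"}

missing = find_missing_genes(data_ids, db_ids)
rows = build_gene_rows(missing, reference)
csv_text = rows_to_csv(rows)
print(csv_text)
```

In practice the two ID lists would be read from the exported gene table and the Apweiler file, and the reference mapping from the YEASTRACT file; IDs that fall back to the systematic ID are the ones worth checking by hand.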
@Onariaginosa suggests looking up standard IDs in SGD as well
☝🏽@kdahlquist agrees
I finished writing the scripts for the genes that were not in our database and uploaded them to the GRNsight-archive repository. I also finished the documentation for loading data, which can be found HERE. I still need to go over the naming conventions and clean up the code a little later this week or next week.
This just needs a top-down review now, followed by the final upload to our AWS server.
The data is finally on the AWS server. Thanks for all the help from @Onariaginosa!
This is complete.
I thought that we had put the Apweiler data that @ahmad00m worked on in our backend database. However, I'm not seeing it on beta.