Open realmarcin opened 3 years ago
As key supporting data the gene annotations should also be ingested: http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/fit_genes.tab with the caveat that these are 'free text' annotations so may require standardization.
Can we not just get the annotations from UniProt? The challenge here is that there is no shared ID between the files.
This line:
Ac3H11_2265 NA 1 scaffold5/16 499203 500063 + NA FIG146518: Zn-dependent hydrolases, including glyoxylases 0.6574 7 TRUE
May correspond to https://www.uniprot.org/uniprot/A0A165JRD7 ?
Do we have the AA sequences easily accessible?
All of the data is here (84G total): http://genomics.lbl.gov/supplemental/bigfit/
The numerical relative growth data would have to be converted to a binary growth vs. no-growth call, e.g. via thresholding.
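A minimal sketch of that thresholding step, in Python. The cutoff value and the names `NO_GROWTH_THRESHOLD` and `call_growth` are assumptions for illustration only; the thread does not specify a threshold, and the gene names and values below are made-up placeholders.

```python
# Hypothetical cutoff: log ratios at or below this are treated as "no growth".
# The thread does not say which value to use.
NO_GROWTH_THRESHOLD = -2.0


def call_growth(log_ratio: float, threshold: float = NO_GROWTH_THRESHOLD) -> bool:
    """Return True (growth) when the log ratio is above the threshold."""
    return log_ratio > threshold


# Placeholder values, not taken from the dataset.
example_values = {"geneA": -3.1, "geneB": 0.2}
calls = {gene: call_growth(value) for gene, value in example_values.items()}
```

A stricter variant could also require a minimum absolute t-statistic before making a no-growth call, which would be a further assumption.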
Just taking the first organism as an example: http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/
On the organism page, under 'Genes', the 'Specific phenotypes' link gives a table of the most significant phenotype per gene for this knockout (KO) dataset: http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/specific_phenotypes. This file can serve as the primary data source. These columns:
sysName: gene name
desc: description
name: internal name
lrn: log ratio normalized
t: t-statistic
Group: condition group
Condition_1: condition name
Concentration_1: concentration
Units_1: unit
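The column mapping above could be applied while reading the file, roughly as in this Python sketch. It assumes the file is tab-delimited with a header row, which the thread does not confirm. The function name `read_specific_phenotypes`, the output field names, and the sample row are all hypothetical placeholders, not real data.

```python
import csv
import io

# Source column -> descriptive field name (the field names are our own choice).
COLUMN_MEANINGS = {
    "sysName": "gene_name",
    "desc": "description",
    "name": "internal_name",
    "lrn": "log_ratio_normalized",
    "t": "t_statistic",
    "Group": "condition_group",
    "Condition_1": "condition_name",
    "Concentration_1": "concentration",
    "Units_1": "unit",
}


def read_specific_phenotypes(handle):
    """Read tab-delimited rows and rename the columns to descriptive names."""
    rows = []
    for raw in csv.DictReader(handle, delimiter="\t"):
        row = {new: raw[old] for old, new in COLUMN_MEANINGS.items()}
        # The two numeric columns are converted; everything else stays text.
        row["log_ratio_normalized"] = float(row["log_ratio_normalized"])
        row["t_statistic"] = float(row["t_statistic"])
        rows.append(row)
    return rows


# Made-up placeholder row, only to show the shape of the input.
SAMPLE = (
    "sysName\tdesc\tname\tlrn\tt\tGroup\tCondition_1\tConcentration_1\tUnits_1\n"
    "GENE_0001\tplaceholder description\tset1\t-2.5\t-6.0\tstress\t"
    "placeholder compound\t1\tmM\n"
)
rows = read_specific_phenotypes(io.StringIO(SAMPLE))
```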
For reference, under 'Genes' the 'Gene fitness' link gives a full table of relative fitness values: http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/fit_logratios_good.tab. The rows are labeled by 'locusId' (gene IDs) and the columns by condition (sample) IDs, each including a text description.
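For ingest, a wide gene-by-condition table like this is often easier to handle as one record per gene and condition. The Python sketch below shows one way to do that reshaping. It assumes a tab-delimited file whose header contains `locusId` plus one column per condition; the `skip_columns` parameter is a hypothetical hook for any extra non-numeric columns the real file may contain, and the sample values are placeholders.

```python
import csv
import io


def melt_fitness(handle, id_column="locusId", skip_columns=()):
    """Yield (gene_id, condition, log_ratio) tuples from a wide table."""
    for row in csv.DictReader(handle, delimiter="\t"):
        for column, value in row.items():
            if column == id_column or column in skip_columns:
                continue
            yield row[id_column], column, float(value)


# Made-up placeholder table: two genes, two conditions.
SAMPLE = (
    "locusId\tcond1 placeholder A\tcond2 placeholder B\n"
    "1\t-0.1\t-3.2\n"
    "2\t0.4\t0.0\n"
)
long_records = list(melt_fitness(io.StringIO(SAMPLE)))
```

Missing values (for example `NA`) are not handled here and would need an explicit check before the `float` conversion.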
There is additional data on each condition on the organism page under 'Tables' then 'Experiments' then 'Detailed metadata for experiments': http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/expsUsed
A basic ingest of this data would model it as mutant alleles or as a gene-condition relation indicating that gene X is essential for growth in condition Y. As key supporting data, the gene annotations should also be ingested: http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/fit_genes.tab with the caveat that these are 'free text' annotations and so may require standardization.
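One possible shape for the gene-condition relation, sketched in Python. The class `GeneConditionRelation`, the predicate label `required_for_growth_in`, and the threshold are all hypothetical; the thread does not fix a schema or a cutoff, and the input records are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class GeneConditionRelation:
    gene_id: str
    condition: str
    predicate: str  # hypothetical label, not an established ontology term
    log_ratio: float  # kept as supporting evidence for the relation


def to_relations(
    records: Iterable[Tuple[str, str, float]], threshold: float = -2.0
) -> List[GeneConditionRelation]:
    """Keep only gene-condition pairs whose log ratio is at or below the cutoff."""
    return [
        GeneConditionRelation(gene, condition, "required_for_growth_in", value)
        for gene, condition, value in records
        if value <= threshold
    ]


# Placeholder records, not taken from the dataset.
example = to_relations(
    [
        ("GENE_0001", "placeholder condition A", -3.0),
        ("GENE_0002", "placeholder condition A", 0.1),
    ]
)
```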
Further ingests could include: