C-CoMP-STC / GEM-mit1002

Creative Commons Attribution 4.0 International

KBase Gene Call IDs do not match Pangenome #61

Open hgscott opened 1 month ago

hgscott commented 1 month ago

Because it is hard to tell which genome was used to build the model I downloaded from KBase, I am going to remake the model using the genome I know is correct, and then spot check a few of the glycolysis reactions.

hgscott commented 1 month ago

Looking at the KBase narrative, I am 99% sure that the KBase model was built from Zac's older version of the genome (4116 CDSs).

I tried importing the latest 4106 genome as a "FASTA assembly" in KBase, but got an error because the file contains amino acid sequences rather than nucleotide sequences.

Image
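A quick check like the one below could catch this before uploading. It is only a rough sketch: the function name `guess_fasta_type`, the 90% threshold, and the example records are my own assumptions, not anything KBase provides.

```python
def guess_fasta_type(lines, threshold=0.9):
    """Guess whether FASTA records hold nucleotide or protein sequences.

    Heuristic (an assumption, not a KBase rule): if at least `threshold`
    of the sequence characters are A, C, G, T, U or N, call it nucleotide.
    """
    # Join all sequence lines, skipping headers (">...") and blank lines.
    residues = "".join(
        line.strip()
        for line in lines
        if line.strip() and not line.startswith(">")
    ).upper()
    if not residues:
        raise ValueError("no sequence data found")

    nucleotide_like = sum(residues.count(base) for base in "ACGTUN")
    if nucleotide_like / len(residues) >= threshold:
        return "nucleotide"
    return "protein"


# Made-up example records, not taken from the MIT1002 genome.
dna_example = [">contig_1", "ATGCGTACGTTAGC"]
protein_example = [">protein_1", "MKTLLVAGGSWQ"]

print(guess_fasta_type(dna_example))
print(guess_fasta_type(protein_example))
```

For a real file, pass the open file handle instead of a list, e.g. `guess_fasta_type(open(path))`, where `path` is whatever FASTA you are about to import.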

hgscott commented 1 month ago

I know that my ModelSEEDpy model used the correct genome: https://github.com/C-CoMP-STC/GEM-mit1002/blob/d011e8e4ec886c2f1f4ad7b9f0806d4146bff642/make_model.py#L11

So let's compare the genes assigned to reactions in that model with Michelle's annotations.

hgscott commented 1 month ago

Spot check a reaction in glycolysis: MNXR102547 (KEGG: R01518, SEED: rxn01106_0)
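For a spot check like this, the comparison boils down to a set difference between the genes in the model's gene-reaction rule and the genes annotated for the same reaction. Here is a minimal sketch; the helper names and all gene IDs are hypothetical placeholders, not the real gene calls for rxn01106_0. In practice the rule string would come from the model (in COBRApy, `reaction.gene_reaction_rule`) and the annotated genes from the annotation table.

```python
import re


def genes_in_rule(rule):
    """Return the set of gene IDs mentioned in a gene-reaction rule string."""
    # Split on whitespace and parentheses, then drop the boolean operators.
    tokens = re.findall(r"[^\s()]+", rule)
    return {tok for tok in tokens if tok.lower() not in {"and", "or"}}


def compare_gene_sets(model_genes, annotation_genes):
    """Report which genes are shared and which appear on only one side."""
    model_genes = set(model_genes)
    annotation_genes = set(annotation_genes)
    return {
        "shared": sorted(model_genes & annotation_genes),
        "model_only": sorted(model_genes - annotation_genes),
        "annotation_only": sorted(annotation_genes - model_genes),
    }


# Hypothetical inputs: placeholder IDs, not real gene call IDs.
model_rule = "(gene_0001 and gene_0002) or gene_0003"
annotation_genes = ["gene_0001", "gene_0003", "gene_0004"]

result = compare_gene_sets(genes_in_rule(model_rule), annotation_genes)
for label, genes in result.items():
    print(label, genes)
```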

hgscott commented 1 month ago

Now, recreate the Excel spreadsheet I had before with the ModelSEEDpy model.

hgscott commented 1 month ago

I made the full spreadsheet, and there seems to be much better (but maybe not perfect) agreement between the gene call IDs.

Image
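To put a number on "much better but maybe not perfect", the per-reaction agreement could be summarized roughly like this. It is a sketch only: the two mappings are made-up placeholders standing in for the model's and the annotation's reaction-to-gene tables, and Jaccard similarity is just one reasonable choice of score, not what the spreadsheet uses.

```python
def agreement_table(model_map, annotation_map):
    """Build one row per reaction comparing the two gene assignments."""
    rows = []
    for rxn in sorted(set(model_map) | set(annotation_map)):
        model_genes = set(model_map.get(rxn, ()))
        annotation_genes = set(annotation_map.get(rxn, ()))
        union = model_genes | annotation_genes
        # Jaccard similarity: shared genes divided by all genes on either side.
        jaccard = len(model_genes & annotation_genes) / len(union) if union else 1.0
        rows.append({
            "reaction": rxn,
            "model_genes": ";".join(sorted(model_genes)),
            "annotation_genes": ";".join(sorted(annotation_genes)),
            "jaccard": jaccard,
            "exact_match": model_genes == annotation_genes,
        })
    return rows


# Hypothetical placeholder data, not real reactions or gene call IDs.
model_map = {"rxn_A": ["g1", "g2"], "rxn_B": ["g3"], "rxn_C": ["g5"]}
annotation_map = {"rxn_A": ["g2", "g1"], "rxn_B": ["g3", "g4"]}

rows = agreement_table(model_map, annotation_map)
exact_fraction = sum(row["exact_match"] for row in rows) / len(rows)

for row in rows:
    print(row["reaction"], row["jaccard"], row["exact_match"])
print("fraction of reactions with identical gene sets:", exact_fraction)
```

The rows are plain dictionaries, so they can be written out with `csv.DictWriter` and opened in Excel alongside the existing spreadsheet.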

hgscott commented 1 month ago

This is good, but I want a cleaner comparison between the two, i.e. #60.