KBase Gene Call IDs do not match Pangenome

C-CoMP-STC / GEM-mit1002

Creative Commons Attribution 4.0 International


KBase Gene Call IDs do not match Pangenome #61

Open hgscott opened 6 months ago

hgscott commented 6 months ago

[x] Check if the KBase model was using the correct genome
[ ] Check if RAST annotation in KBase is redoing gene calls

hgscott commented 5 months ago

Because it is hard to tell which genome was used to build the model I downloaded from KBase, I am going to remake the model from the genome I know is correct and then spot check a few of the glycolysis reactions.

hgscott commented 5 months ago

Looking at the KBase narrative, I am 99% sure that the KBase model was built from Zac's older version of the genome (4116 CDSs).
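A quick way to confirm which genome version a file holds is to count its CDS records, since the two versions differ (4116 vs. 4106 CDSs). Below is a minimal sketch that assumes each CDS is one FASTA record whose header line starts with `>`. The function names and the file name `genome_cds.faa` are hypothetical.

```python
from io import StringIO
from typing import TextIO

# CDS counts mentioned in this thread; any other count is reported as unknown.
KNOWN_VERSIONS = {
    4116: "older version (4116 CDSs)",
    4106: "latest version (4106 CDSs)",
}


def count_fasta_records(handle: TextIO) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


def identify_genome_version(n_cds: int) -> str:
    """Map a CDS count to one of the genome versions mentioned in the thread."""
    return KNOWN_VERSIONS.get(n_cds, f"unknown ({n_cds} CDSs)")


# Toy input; with a real file you would pass open("genome_cds.faa") instead.
toy = StringIO(">cds_1\nMKV\n>cds_2\nMLL\nAAG\n")
n = count_fasta_records(toy)
print(n, identify_genome_version(n))  # 2 unknown (2 CDSs)
```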

I tried importing the latest genome version (4106 CDSs) into KBase as a "FASTA assembly", but got an error because the file contains amino acid sequences.
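An assembly is a set of nucleotide contigs, so a protein FASTA fails that import. Before uploading, you can check which kind of file you have with the heuristic sketched below: it treats a sequence as protein when fewer than 90% of its letters are nucleotide characters (A, C, G, T, U, N). The threshold and all function names are assumptions, not anything KBase uses, and a short protein made mostly of A/C/G/T residues could be misclassified.

```python
from __future__ import annotations

# Letters that occur in nucleotide sequences (DNA/RNA plus N for an unknown base).
NUCLEOTIDE_CHARS = set("ACGTUN")


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; the header excludes the '>'."""
    records: dict[str, list[str]] = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def looks_like_protein(seq: str, threshold: float = 0.9) -> bool:
    """Heuristic: protein if fewer than `threshold` of the letters are nucleotides."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        return False
    nt_fraction = sum(c in NUCLEOTIDE_CHARS for c in letters) / len(letters)
    return nt_fraction < threshold


def fasta_is_protein(text: str) -> bool:
    """True if any record in the FASTA text looks like an amino acid sequence."""
    return any(looks_like_protein(seq) for seq in parse_fasta(text).values())


print(fasta_is_protein(">cds_1\nMKVLAAGIT\n"))  # True
print(fasta_is_protein(">contig_1\nACGTACGTNN\n"))  # False
```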

hgscott commented 5 months ago

I know that my ModelSEEDpy model used the correct genome: https://github.com/C-CoMP-STC/GEM-mit1002/blob/d011e8e4ec886c2f1f4ad7b9f0806d4146bff642/make_model.py#L11

So let's compare the genes assigned to reactions in that model with Michelle's annotations.
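Here is one way to frame the per-reaction comparison, sketched under a few assumptions. Both sources are reduced to a mapping from reaction ID to a set of gene call IDs. Both use the same reaction ID namespace (here the SEED ID from the spot check). Gene call IDs have already been normalized to the same form, such as the bare `360`. The class and function names are hypothetical. If the model is loaded as a cobrapy `Model`, the model-side mapping can come from each reaction's `genes` attribute, as shown in the comment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReactionComparison:
    reaction_id: str
    shared: set[str]           # gene calls assigned by both sources
    only_model: set[str]       # gene calls only in the ModelSEEDpy model
    only_annotation: set[str]  # gene calls only in the manual annotations

    @property
    def status(self) -> str:
        if not self.only_model and not self.only_annotation:
            return "match"
        return "partial" if self.shared else "mismatch"


def compare_gene_calls(
    model_genes: dict[str, set[str]],
    annotation_genes: dict[str, set[str]],
) -> list[ReactionComparison]:
    """Compare gene call IDs per reaction; a reaction missing from one source
    is treated as having no genes in that source."""
    results = []
    for rxn_id in sorted(model_genes.keys() | annotation_genes.keys()):
        m = model_genes.get(rxn_id, set())
        a = annotation_genes.get(rxn_id, set())
        results.append(ReactionComparison(rxn_id, m & a, m - a, a - m))
    return results


# With cobrapy, model_genes could be built as
# {r.id: {g.id for g in r.genes} for r in model.reactions}.
# The spot check from this thread: both sources assign gene call 360.
for row in compare_gene_calls({"rxn01106_0": {"360"}}, {"rxn01106_0": {"360"}}):
    print(row.reaction_id, row.status)  # rxn01106_0 match
```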

hgscott commented 5 months ago

Spot check a reaction in glycolysis: MNXR102547 (KEGG: R01518, SEED: rxn01106_0)

Michelle gene call: 360
ModelSEEDpy model gene call: 360

hgscott commented 5 months ago

Next, I'll recreate the Excel spreadsheet I made before, this time using the ModelSEEDpy model.

hgscott commented 5 months ago

I made the full spreadsheet, and there seems to be much better (though maybe not perfect) agreement between the gene call IDs.
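To put a number on "much better (though maybe not perfect)", the spreadsheet rows can be generated together with a count of exact matches, partial overlaps and mismatches. The sketch below writes CSV, which Excel opens directly; writing a native `.xlsx` file would need an extra library. The input format (reaction ID mapped to a set of gene call IDs), the function names and the reaction IDs `rxn_A`/`rxn_B` are hypothetical.

```python
from __future__ import annotations

import csv
import io
from collections import Counter
from typing import TextIO


def classify(model: set[str], annotation: set[str]) -> str:
    """Exact match, partial overlap, or no shared gene calls."""
    if model == annotation:
        return "match"
    return "partial" if model & annotation else "mismatch"


def write_comparison_csv(
    model_genes: dict[str, set[str]],
    annotation_genes: dict[str, set[str]],
    handle: TextIO,
) -> Counter:
    """Write one row per reaction to `handle` and return counts of each status."""
    writer = csv.writer(handle)
    writer.writerow(["reaction_id", "model_genes", "annotation_genes", "status"])
    counts: Counter = Counter()
    for rxn_id in sorted(model_genes.keys() | annotation_genes.keys()):
        m = model_genes.get(rxn_id, set())
        a = annotation_genes.get(rxn_id, set())
        status = classify(m, a)
        counts[status] += 1
        # Join multiple gene calls with ';' so each reaction stays on one row.
        writer.writerow([rxn_id, ";".join(sorted(m)), ";".join(sorted(a)), status])
    return counts


# Toy data; for a real file use open("comparison.csv", "w", newline="").
model = {"rxn01106_0": {"360"}, "rxn_A": {"1", "2"}, "rxn_B": {"5"}}
annotations = {"rxn01106_0": {"360"}, "rxn_A": {"1"}, "rxn_B": {"7"}}
buf = io.StringIO()
counts = write_comparison_csv(model, annotations, buf)
print(dict(counts))  # {'match': 1, 'partial': 1, 'mismatch': 1}
print(f"exact agreement: {counts['match'] / sum(counts.values()):.0%}")  # exact agreement: 33%
```

Reporting partial overlaps separately from outright mismatches helps show whether the remaining disagreements are extra or missing genes on a reaction or completely different gene calls.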

hgscott commented 5 months ago

This is good, but I want a cleaner comparison between the two (see #60).