Open hgscott opened 6 months ago
Because it is hard to tell which genome was used to build the model I downloaded from KBase, I am going to remake the model using the genome I know is correct, and then spot-check a few of the glycolysis reactions.
Looking at the KBase narrative, I am 99% sure that the KBase model was built from Zac's older version of the genome (4116 CDSs).
I tried importing the latest genome (4106 CDSs) as a "FASTA assembly" in KBase, but got an error because the file contains amino acid sequences.
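A quick way to confirm what kind of sequences a FASTA file holds before uploading it is to look at the alphabet. This is only a rough sketch: the function name, the 0.9 threshold, and the two example records are my own placeholders, not anything from KBase or from the real genome file.

```python
def looks_like_protein(fasta_text, threshold=0.9):
    """Guess whether FASTA text holds protein rather than nucleotide sequences.

    Heuristic (an assumption, not a KBase rule): if fewer than `threshold`
    of the residues are A/C/G/T/U/N, treat the file as protein.
    """
    nucleotide_letters = set("ACGTUN")
    # Concatenate all sequence lines, skipping header lines that start with ">".
    sequence = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    ).upper()
    if not sequence:
        raise ValueError("no sequence data found")
    nucleotide_fraction = sum(ch in nucleotide_letters for ch in sequence) / len(sequence)
    return nucleotide_fraction < threshold


# Made-up example records, for illustration only.
protein_fasta = ">example_protein\nMKTAYIAKQRQISFVKSHFSRQ\n"
dna_fasta = ">example_contig\nATGCGTACGTTAGC\n"

print(looks_like_protein(protein_fasta))
print(looks_like_protein(dna_fasta))
```

If the check says "protein", the file would need a protein-aware import route rather than the assembly one.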
I know that my ModelSEEDpy model used the correct genome: https://github.com/C-CoMP-STC/GEM-mit1002/blob/d011e8e4ec886c2f1f4ad7b9f0806d4146bff642/make_model.py#L11
So let's compare the genes for reactions in that model to Michelle's annotations.
Spot check a reaction in glycolysis: MNXR102547 (KEGG: R01518, SEED: rxn01106_0)
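A minimal sketch of what a spot check like this could look like, assuming the model's gene-reaction rule is available as a string and the annotations have been loaded into a dict keyed by reaction ID. The helper names and the gene IDs are hypothetical placeholders, not real gene calls from the model or from Michelle's annotations.

```python
import re


def genes_in_rule(rule):
    """Pull gene IDs out of a gene-reaction rule such as '(a or b) and c'."""
    tokens = re.findall(r"[^\s()]+", rule)
    return {tok for tok in tokens if tok.lower() not in {"and", "or"}}


def compare_genes(model_genes, annotation_genes):
    """Split two gene collections into shared and source-specific IDs."""
    model_genes, annotation_genes = set(model_genes), set(annotation_genes)
    return {
        "shared": sorted(model_genes & annotation_genes),
        "model_only": sorted(model_genes - annotation_genes),
        "annotation_only": sorted(annotation_genes - model_genes),
    }


# Placeholder data: the gene IDs are invented for illustration.
model_rule = "(gene_0001 or gene_0002) and gene_0003"
annotations = {"rxn01106_0": ["gene_0001", "gene_0003", "gene_0004"]}

result = compare_genes(genes_in_rule(model_rule), annotations["rxn01106_0"])
for label, genes in result.items():
    print(label, genes)
```

If the model is loaded as a COBRApy model, the rule string should be available from the reaction's `gene_reaction_rule` attribute, so the regex step here is just a stand-in for reading it from the model.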
Now, recreate the Excel spreadsheet I had before with the ModelSEEDpy model.
I made the full spreadsheet, and there seems to be much better (but maybe not perfect) agreement between the gene call IDs.
This is good, but I want a cleaner comparison between the two, i.e. #60
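One way to put a number on "better but not perfect" is a per-reaction overlap score. The sketch below is an assumption about how such a table could be built, not the spreadsheet I actually made: it uses CSV instead of Excel, Jaccard overlap as the agreement measure, and invented reaction and gene IDs.

```python
import csv
import io


def agreement_rows(model_genes, annotation_genes):
    """Build one row per reaction with the gene overlap between two sources."""
    rows = []
    for rxn_id in sorted(set(model_genes) | set(annotation_genes)):
        m = set(model_genes.get(rxn_id, ()))
        a = set(annotation_genes.get(rxn_id, ()))
        union = m | a
        # Jaccard overlap: shared genes divided by all genes seen for the reaction.
        jaccard = len(m & a) / len(union) if union else 1.0
        rows.append({
            "reaction": rxn_id,
            "model_genes": ";".join(sorted(m)),
            "annotation_genes": ";".join(sorted(a)),
            "jaccard": round(jaccard, 2),
        })
    return rows


# Invented placeholder data, not real reactions or gene calls.
model_genes = {"rxnA": ["g1", "g2"], "rxnB": ["g3"]}
annotation_genes = {"rxnA": ["g1", "g2"], "rxnB": ["g3", "g4"], "rxnC": ["g5"]}

rows = agreement_rows(model_genes, annotation_genes)

# Write to an in-memory buffer; swap in open("comparison.csv", "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Sorting the resulting table by the `jaccard` column puts the reactions that disagree most at the top, which is where the manual checking effort should go first.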