canc1993 / cheshire-gapfilling

MIT License
KBase models and ModelSEED pool construction #8

Open zuozuozuozuozuozuozuo opened 1 month ago

zuozuozuozuozuozuozuo commented 1 month ago

Hi.

I would like to know whether CHESHIRE can be run on XML models reconstructed with KBase. KBase provides ModelSEED-based pipelines for model construction, but I am not sure whether models generated by those pipelines can also be used with CHESHIRE.

First of all, because I had not prepared a ModelSEED pool in advance, I ran CHESHIRE on a KBase model with the BiGG pool to check whether the model is readable as input. I did get the predicted_scores and similarity_scores output, but all the scores came out as the same value, which is clearly wrong. (KBase models are generally curated as follows: Kbase-model-in-xml If an example XML file is needed, please let me know.)

My other question is how to build a ModelSEED pool. I cannot see how to get an XML pool directly from ModelSEED: it provides a TSV file of universal reactions on GitHub, but its website https://modelseed.org/ only accepts FASTA files to generate XML files. If I use third-party ModelSEED-based platforms instead, the pools they generate raise the same concern as the models generated by KBase.

As far as I can tell, the key issue behind these questions is the namespace, but I do not know which namespaces ModelSEED uses, and I have not been able to find ModelSEED documentation that describes them.
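A quick way to test the namespace suspicion is to measure how many reaction ids of the model also occur in the pool. The sketch below is not part of CHESHIRE; it only uses the Python standard library, and the two inline SBML snippets are invented stand-ins for a KBase model and a BiGG pool.

```python
import xml.etree.ElementTree as ET

def reaction_ids(sbml_text):
    """Collect the id attribute of every <reaction> element in an SBML string."""
    root = ET.fromstring(sbml_text)
    ids = set()
    for elem in root.iter():
        # Strip the "{namespace}" prefix so any SBML level/version is accepted.
        if elem.tag.rsplit("}", 1)[-1] == "reaction":
            ids.add(elem.get("id"))
    return ids

def id_overlap(model_ids, pool_ids):
    """Fraction of model reaction ids that also appear in the pool."""
    if not model_ids:
        return 0.0
    return len(model_ids & pool_ids) / len(model_ids)

# Invented examples: a model with ModelSEED-style ids, a pool with BiGG-style ids.
MODEL_XML = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core">
<model><listOfReactions>
<reaction id="R_rxn00001_c0"/><reaction id="R_rxn00002_c0"/>
</listOfReactions></model></sbml>"""

POOL_XML = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core">
<model><listOfReactions>
<reaction id="R_PGI"/><reaction id="R_PFK"/>
</listOfReactions></model></sbml>"""

print(id_overlap(reaction_ids(MODEL_XML), reaction_ids(POOL_XML)))
```

An overlap near zero would be consistent with a namespace mismatch between model and pool, although it does not by itself prove that this is why the scores are identical.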

Do you have any suggestions for constructing ModelSEED pools? Once that is solved, I can check for myself whether CHESHIRE can be run with KBase models.

canc1993 commented 1 month ago

Hello. You can map the reaction identifiers between BiGG and ModelSEED. The mapping files can be downloaded from the BiGG database.
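As a minimal sketch of that mapping step: the code below assumes the BiGG reaction table is tab-separated with a `bigg_id` column and a `database_links` column whose entries contain URLs ending in `seed.reaction/rxnNNNNN`. That layout is an assumption to verify against the downloaded file, and the sample rows are invented placeholders, not real BiGG records.

```python
import csv
import io
import re

# Assumed link pattern inside the database_links column.
SEED_LINK = re.compile(r"seed\.reaction/(rxn\d+)")

def bigg_to_seed(table_text):
    """Map each BiGG reaction id to the sorted ModelSEED ids found in its links."""
    mapping = {}
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        seed_ids = SEED_LINK.findall(row.get("database_links") or "")
        if seed_ids:
            mapping[row["bigg_id"]] = sorted(set(seed_ids))
    return mapping

def seed_to_bigg(mapping):
    """Invert the mapping; one ModelSEED id may correspond to several BiGG ids."""
    inverse = {}
    for bigg_id, seed_ids in mapping.items():
        for seed_id in seed_ids:
            inverse.setdefault(seed_id, []).append(bigg_id)
    return inverse

# Invented rows in the assumed layout (ids are placeholders).
SAMPLE = (
    "bigg_id\tdatabase_links\n"
    "EXAMPLE_A\tSEED Reaction: http://identifiers.org/seed.reaction/rxn99991; "
    "SEED Reaction: http://identifiers.org/seed.reaction/rxn99992\n"
    "EXAMPLE_B\t\n"
)

forward = bigg_to_seed(SAMPLE)
print(forward)
print(seed_to_bigg(forward))
```

The mapping is generally many-to-many, so reactions with zero or several counterparts need a decision (skip, duplicate, or curate by hand) before a pool is renamed.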

zuozuozuozuozuozuozuo commented 4 weeks ago

Hi @canc1993,

Thank you for the quick reply. I will look into the solution you suggest soon.

Currently, I am trying to build a KBase universe pool, since KBase models (built by ModelSEED-based pipelines) are readable by CHESHIRE. This way, I would like to see whether KBase models match a KBase universe pool. The work is partially done, and I still need to fix some mistakes, shown here: image

I sincerely thank you for your advice, and I think it makes sense. Frankly speaking, though, I still do not know how to create the XML files downstream of your suggested solution. For instance, I can do the mapping, but after that I have to comply with the ModelSEED namespaces (which are consistent with ModelSEED models, right?). I am not familiar with curating XML files. That is an important reason why I am trying to do this with existing automatic or semi-automatic pipelines such as KBase or CarveMe rather than by hand.
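One piece of the XML question can be illustrated without any SBML tooling: turning a reaction equation from a universal-reaction table into a stoichiometry dictionary. The sketch below assumes equations of the form `(coef) cpdNNNNN[compartment_index]` joined by `+` and split by `<=>`, `=>`, or `<=`; that format is an assumption to check against the actual TSV, and `parse_equation` is a hypothetical helper, not a ModelSEED or CHESHIRE function.

```python
import re

# One term, e.g. "(2) cpd00009[0]": coefficient, compound id, compartment index.
TERM = re.compile(r"\((?P<coef>\d+(?:\.\d+)?)\)\s+(?P<cpd>cpd\d+)\[(?P<comp>\d+)\]")
ARROW = re.compile(r"\s*(<=>|=>|<=)\s*")

def parse_equation(equation):
    """Return ({(compound, compartment_index): coefficient}, arrow).

    Left-hand terms get negative coefficients, right-hand terms positive ones.
    """
    parts = ARROW.split(equation, maxsplit=1)
    if len(parts) != 3:
        raise ValueError("no reaction arrow found in: %r" % equation)
    left, arrow, right = parts
    stoich = {}
    for side, sign in ((left, -1.0), (right, 1.0)):
        for match in TERM.finditer(side):
            key = (match.group("cpd"), match.group("comp"))
            stoich[key] = stoich.get(key, 0.0) + sign * float(match.group("coef"))
    return stoich, arrow

# Example string in the assumed format.
EQUATION = "(1) cpd00001[0] + (1) cpd00012[0] <=> (2) cpd00009[0] + (1) cpd00067[0]"
stoich, arrow = parse_equation(EQUATION)
print(arrow, stoich)
```

A dictionary like this is the kind of input a model-building library expects. For example, COBRApy can assemble reactions from metabolite coefficients and write the result with `cobra.io.write_sbml_model`, which avoids editing XML by hand.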

zuozuozuozuozuozuozuo commented 3 weeks ago

Hi @canc1993, I have essentially succeeded in calculating confidence scores with a KBase model and a KBase universe pool, and I am delighted to share the news with you.

KBase models can be gap-filled using CHESHIRE. Spectacular!

Although constructing the pool took more effort than I had imagined, the results below show that it finally worked. (The "null" cells have valid values in other rows, which were generated from the original BiGG test. What a careless mistake~) image

As the next stage, I will try to refine my current KBase pool (44370 reactions), which yields 4300+ candidate reactions in total (confidence score > 0.9995) before filtering by similarity scores. That is too broad. My preliminary plan is as follows:

  1. Dereplicate reactions that have ambiguous equilibrium and are linked to any other reaction with a smaller ID.
  2. Discard biomass, exchange, demand, and sink reactions, in accordance with your work.
  3. Delete reactions that have no cytosol or extracellular compartment.
  4. Delete reactions that come solely from plant or fungus models.

I look forward to seeing how the number of final candidate reactions changes after pool refinement and filtering by similarity scores.
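Steps 2 and 3 of the plan above can be sketched on a plain in-memory pool. Everything here is illustrative: the pool is a list of dictionaries with invented entries, biomass reactions are assumed to have ids starting with `bio`, compartments are assumed to be the suffix after the last underscore of a metabolite id (`c0`, `e0`, ...), and step 3 is read as "keep a reaction if at least one of its metabolites is in the cytosol or extracellular space". Steps 1 and 4 need linkage and source-model annotations that are not described in this thread, so they are left out.

```python
def is_boundary(stoich):
    """True if all metabolites are on one side (exchange, demand, or sink)."""
    signs = {coef > 0 for coef in stoich.values() if coef != 0}
    return len(signs) <= 1

def refine_pool(pool, keep_compartments=("c0", "e0"), biomass_prefix="bio"):
    """Drop biomass and boundary reactions and reactions outside the kept compartments."""
    keep = set(keep_compartments)
    refined = []
    for rxn in pool:
        if rxn["id"].lower().startswith(biomass_prefix):
            continue  # step 2: biomass (assumed id convention)
        if is_boundary(rxn["stoich"]):
            continue  # step 2: exchange, demand, sink
        compartments = {met.rsplit("_", 1)[-1] for met in rxn["stoich"]}
        if not compartments & keep:
            continue  # step 3: no cytosol or extracellular metabolite
        refined.append(rxn)
    return refined

# Invented pool entries for illustration only.
POOL = [
    {"id": "rxn00001_c0", "stoich": {"cpd00001_c0": -1, "cpd00012_c0": -1, "cpd00009_c0": 2}},
    {"id": "EX_cpd00001_e0", "stoich": {"cpd00001_e0": -1}},
    {"id": "bio1", "stoich": {"cpd00001_c0": -1, "cpd11416_c0": 1}},
    {"id": "rxn99999_m0", "stoich": {"cpd00001_m0": -1, "cpd00002_m0": 1}},
]

print([rxn["id"] for rxn in refine_pool(POOL)])
```

Detecting boundary reactions from stoichiometry rather than from id prefixes keeps the filter independent of whether the pool follows BiGG or ModelSEED naming.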

canc1993 commented 3 weeks ago

Thank you for the updates. Pool refinement is important. We didn't do it in our paper, but it's worth investigating.

Best, Can