cfrioux / miscoto

Python package for large-scale community selection in microbiota
GNU Lesser General Public License v3.0
seed #11

Closed ntromas closed 2 years ago

ntromas commented 3 years ago

Hi,

Thanks for this pipeline. I am also looking at m2m. I wonder how to generate a "seed" file in .xml, especially if I have no idea which medium would be used for the community I am working on. Any suggestions? I am not interested in specific pathways but rather in how "my" bacterial community cooperates. Thanks for your help!

Nico

cfrioux commented 3 years ago

Hi Nico,

It depends on the environmental context. If you work with human gut bacteria, you can take a look at the VMH diets. Otherwise, you can design a set of minimal nutrients using literature. You could even consider running the program multiple times, each time with a different randomized set of seeds (where you vary the carbon source or others) in order to capture the cooperation landscape in multiple environments.
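The idea of running the program several times with randomized seed sets could be prepared along these lines. This is only a minimal sketch: the function name, the split into "base" nutrients and candidate carbon sources, and the placeholder identifiers are all assumptions for illustration, not part of miscoto or m2m.

```python
import random


def randomized_seed_sets(base_seeds, carbon_sources, n_sets, n_carbon=1, rng=None):
    """Build n_sets seed lists: the fixed base nutrients plus randomly drawn carbon sources."""
    rng = rng or random.Random()
    seed_sets = []
    for _ in range(n_sets):
        # Draw the carbon source(s) for this run without replacement.
        chosen = rng.sample(sorted(carbon_sources), n_carbon)
        seed_sets.append(sorted(set(base_seeds) | set(chosen)))
    return seed_sets


# Placeholder identifiers (hypothetical); real ones must come from your own models.
base = ["M_base1_c", "M_base2_c"]
carbon = ["M_carbonA_c", "M_carbonB_c", "M_carbonC_c"]

for seeds in randomized_seed_sets(base, carbon, n_sets=3, rng=random.Random(0)):
    # Each list could be written to its own metabolites file, one ID per line.
    print("\n".join(seeds))
    print("---")
```

Passing a seeded `random.Random` keeps the draws reproducible, which helps when comparing results across runs.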

As for the technical side, should you need help to generate the xml file itself, you can use m2m seeds.

Hope this helps a bit,

Clémence

ntromas commented 3 years ago

Hi Clemence,

Thanks for your answer! I wonder if by any chance you would have some examples of metabolites.txt files. In m2m, I can see that the names are specific. I guess they come from a database, right? Or are they from the database used to generate the model (I used gapseq)? Sorry for the naive question!

Cheers,

Nico

cfrioux commented 3 years ago

Hi Nico,

I don't have any examples, but I can explain why the names seem so specific. It is because i) they match compound IDs from the MetaCyc database and ii) they are encoded (special characters are forbidden in SBML species id fields; compound IDs usually start with M and have a suffix made of the compartment id).

You have to use the compounds that match the metabolic networks you wish to study, and thus the database you used to build them. If you look at one of the metabolic networks, you have to provide a list of metabolite identifiers that are consistent with the ones in its listOfSpecies, using their id attribute:

<listOfSpecies>
        <species id="M_GLUCOSE_c" name="GLUCOSE" compartment="c"/>
        <species id="M_CO2_c" name="CO2" compartment="c"/>
        <species id="M_O2_c" name="O2" compartment="c"/>

In the example above, if I wanted to add cytosolic glucose to the seeds, I'd have to add M_GLUCOSE_c in the metabolites.txt file.
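To see which identifiers a model offers, the species ids can be listed directly from the SBML. The snippet below is a minimal sketch using only the Python standard library; the function name and the wrapping `<sbml><model>` elements of the example string are assumptions added so that the excerpt above parses on its own.

```python
import xml.etree.ElementTree as ET


def species_ids(sbml_text):
    """Return the id attribute of every <species> element in an SBML string."""
    root = ET.fromstring(sbml_text)
    ids = []
    for element in root.iter():
        # Drop a possible XML namespace prefix such as "{...}species".
        tag = element.tag.rsplit("}", 1)[-1]
        if tag == "species" and "id" in element.attrib:
            ids.append(element.attrib["id"])
    return ids


# The excerpt from the comment above, wrapped so that it is well-formed XML.
EXAMPLE_SBML = """<sbml><model><listOfSpecies>
<species id="M_GLUCOSE_c" name="GLUCOSE" compartment="c"/>
<species id="M_CO2_c" name="CO2" compartment="c"/>
<species id="M_O2_c" name="O2" compartment="c"/>
</listOfSpecies></model></sbml>"""

all_ids = species_ids(EXAMPLE_SBML)

# Keep only the compounds chosen as seeds; here, cytosolic glucose.
seeds = [i for i in all_ids if i == "M_GLUCOSE_c"]

# Content of metabolites.txt: one identifier per line.
metabolites_txt = "\n".join(seeds) + "\n"
print(metabolites_txt, end="")
```

For a real model, the text would be read from the model's SBML file rather than from a string.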

Let me know if something is unclear.

Clémence

ntromas commented 3 years ago

Hi Clemence,

Thanks for your answer! I generated a model for each member of a small community (3-4 taxa) from their genomes. As we have never cultivated them, I used default parameters to generate the models with gapseq (with a default medium). To construct metabolic network models, gapseq uses a reaction and metabolite database derived from the ModelSEED database. My objective is to determine whether one of the community members could share or cooperate with another one. If I understood correctly, miscoto and m2m need models. I just wonder how to list metabolites or a medium composition that would reflect the in natura conditions (e.g. freshwater), both to build my models and to experiment with miscoto or m2m. Again, sorry for my naive questions, I am learning how to use these tools and they are pretty new to me!

Thanks a lot for your time and suggestions,

Nico

cfrioux commented 3 years ago

Hi Nico,

No worries :)

An easy way to generate a first list of compounds is to work with the default medium used by gapseq. This would likely not represent natural conditions, but it would be a first step that is quick to implement. That way you would already be able to compare the metabolism of your 4 species with m2m: do they roughly produce the same metabolites or not, how complementary their metabolisms are to each other...

As a second step, my advice would be to look for the macromolecules generally found in freshwater according to the literature and build a list of metabolites out of it. You can run several tests as computation is quite fast, and observing the differences in individual and collective metabolisms when varying the seed compounds is already informative by itself.

Clémence

ntromas commented 3 years ago

Hi Clémence,

Thanks a lot for the suggestions! I have one last question for you. I have finally decided to use CarveMe, as the model is built only from genetic evidence, without a gap-filling approach. If I am correct, the compound database used is from http://bigg.ucsd.edu (I guess you know it). So I have to generate a metabolite list using the same names as CarveMe and BiGG models, right? I will try m2m now :)

Cheers,

Nico

cfrioux commented 3 years ago

Hi Nico,

Yes, precisely, you would then have to generate a list of metabolites from the BiGG database. A good practice would be to check that every identifier you add to the seed file is a metabolite that occurs in some or all of your models. Just to be sure.

For instance, if histidine is a seed, you would add M_his__L_c to the list to consider the cytosolic version of histidine as available, or M_his__L_e if you would rather choose the extracellular version. In the latter case, be careful: transport reactions are sometimes not well inferred in automatically reconstructed metabolic networks, and without a transport reaction to the cytosol, your histidine molecule would be useless (not usable by the internal metabolism). If you check for the presence of M_his__L_c in your models and find it, you can be confident that this seed can be used by them in some reactions.
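The check described above, that every seed occurs in at least one model, can be sketched as follows. This is an illustration only: the function names and the model names `taxon_A` and `taxon_B` are hypothetical, and the species sets would in practice be read from each model's listOfSpecies. It does not check for transport reactions.

```python
def seed_coverage(seeds, model_species):
    """Map each seed ID to the sorted names of the models whose species include it."""
    return {
        seed: sorted(name for name, ids in model_species.items() if seed in ids)
        for seed in seeds
    }


def missing_seeds(seeds, model_species):
    """Return the seeds that occur in none of the models."""
    coverage = seed_coverage(seeds, model_species)
    return [seed for seed in seeds if not coverage[seed]]


# Hypothetical models: names and species sets are placeholders.
models = {
    "taxon_A": {"M_his__L_c", "M_his__L_e"},
    "taxon_B": {"M_his__L_c"},
}
seeds = ["M_his__L_c", "M_his__L_e", "M_unknown_c"]

for seed, found_in in seed_coverage(seeds, models).items():
    print(seed, "->", ", ".join(found_in) if found_in else "not found in any model")
```

A seed reported as missing is worth a second look before running m2m or miscoto, since it may be a typo or an identifier from another database.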

Clémence

ntromas commented 3 years ago

Hi Clémence,

Thanks a lot for this information!

Nico

ntromas commented 3 years ago

Hi Clémence,

Out of curiosity, have you ever compared miscoto and SMETANA outputs? SMETANA can be used without adding nutrients or metabolites, but in that case I guess it relies only on the genetic information.

Currently waiting for the Pathway Tools license...

Have a good week end!

Nico

ntromas commented 3 years ago

Hi Clemence,

I finally installed m2m and ran m2m seeds. If I am correct, the input is just a tsv/txt file with a list of metabolites, keeping the BiGG database names, right? I have also verified their presence in the model file. I got this error: AssertionError: Seed file is not in the correct format. Error with "M_cobalt2_c". Example of a correct ID is M_OXYGEN45MOLECULE_c. Rules = only numbers, letters or underscore in IDs, not starting with a number. One ID per line. Any idea? M_cobalt2_c looks fine to me. Thanks!

ntromas commented 3 years ago

Got it! It was an issue within my file that generated unwanted characters.
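For anyone hitting the same error, a seed file can be screened for stray characters before it is given to m2m seeds. The sketch below applies the rule quoted in the error message (only letters, numbers or underscores, not starting with a number, one ID per line); it is not m2m's own validation code, and the carriage return and byte-order mark in the example are assumed culprits, since the thread does not say which characters were involved.

```python
import re

# Rule quoted in the error message: letters, digits or underscore, not starting with a digit.
ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def find_invalid_ids(text):
    """Return (line number, repr of line) for every line that breaks the ID rule."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if line == "":
            # Assumption: empty lines (such as the one after a final newline) are ignored.
            continue
        if ID_PATTERN.fullmatch(line) is None:
            # repr() makes invisible characters visible in the report.
            problems.append((number, repr(line)))
    return problems


# Example content with a Windows line ending on line 1 and a byte-order mark on line 3.
example = "M_cobalt2_c\r\nM_his__L_c\n\ufeffM_o2_c\n"

for number, shown in find_invalid_ids(example):
    print(f"line {number}: {shown}")
```

When reading a real file, opening it with `newline=""` keeps Python from silently translating line endings, so that carriage returns remain detectable.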

Cheers