franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License

other inputs #85

Closed smb20200615 closed 1 year ago

smb20200615 commented 3 years ago

Is there a way to input MAGs that we have generated using other pipelines into the tool? If so, at what point?

franciscozorrilla commented 3 years ago

Hi @smb20200615,

It really depends on what you want to do, but I see two steps where it seems reasonable to integrate MAGs from other pipelines:

  1. At the bin_refinement step: here you would need 3 bin sets generated from a single assembly (i.e. same contigs/headers but binned differently). Maybe you have generated MAGs with a new binner or alternative MAG pipeline, so you could refine + reassemble them to get a final consensus set.

  2. At the metabolic model reconstruction step: here you would use ORF-annotated protein bins to build metabolic models with CarveMe. Then you can simulate them within communities using SMETANA to predict metabolic interactions.
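For the second entry point, a minimal sketch of reconstructing models from externally generated MAGs might look like the following. It assumes the MAGs are already ORF-annotated protein FASTA files (one `.faa` per MAG) and that CarveMe's `carve` command is on the PATH. The function name and the directory names `external_mags` and `models` are hypothetical. It only uses the basic `carve INPUT -o OUTPUT` form, so any media or gap-filling options used by metaGEM's own rule are not reproduced here.

```shell
# Sketch: build one CarveMe model per externally generated MAG.
# Assumption: every *.faa file in the input directory is one protein-annotated MAG.
carve_external_mags() {
    in_dir="$1"    # directory holding one protein FASTA (.faa) per MAG
    out_dir="$2"   # directory that will receive the SBML models (.xml)
    mkdir -p "$out_dir"
    for faa in "$in_dir"/*.faa; do
        [ -e "$faa" ] || continue              # glob matched nothing: skip
        mag=$(basename "$faa" .faa)
        carve "$faa" -o "$out_dir/$mag.xml" \
            || echo "carve failed for $mag" >&2
    done
}

# Example call (hypothetical paths), followed by a community simulation:
# carve_external_mags external_mags models
# smetana models/*.xml
```

The resulting `.xml` models can then be passed to SMETANA together, as in the commented example, to predict interactions within the community.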

Please let me know if you had any other ideas in mind, or if you have further questions.

Best wishes, Francisco

smb20200615 commented 3 years ago

Hi Francisco,

Thank you for your thorough comment. I am struggling to generate MAGs with your pipeline because I don't have the account name for my cluster account. When I delete the account line, I get errors during job submission. Perhaps the better question is how to run without the account name line:

"__default__" : {
        "account" : "your-account-name",
        "time" : "0-06:00:00",
        "n" : 48,
        "tasks" : 1,
        "mem" : 180G,
        "name"      : "DL.{rule}",
        "output"    : "logs/{wildcards}.%N.{rule}.out.log",
},

franciscozorrilla commented 3 years ago

Hi @smb20200615,

That is indeed quite strange. Have you previously submitted jobs successfully on your cluster without an account name? I do not think that this should be possible, unless it is your own cluster/workstation. On my institution's SLURM-based cluster, one can use the mybalance command to view one's accounts. If this does not work, I would suggest contacting your cluster support team or looking at any documentation provided for your institution's cluster.
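Since `mybalance` may not exist on other clusters, the accounts associated with a user can often be listed with SLURM's own `sacctmgr` tool instead. The following is a minimal sketch, assuming `sacctmgr` is installed on the login node and that regular users are permitted to query their own associations (some sites restrict this). The function name is hypothetical.

```shell
# Sketch: print the SLURM account names associated with a user.
# Assumption: sacctmgr is available and readable by regular users.
list_slurm_accounts() {
    user="${1:-${USER:-$(id -un)}}"   # default to the current user
    if ! command -v sacctmgr >/dev/null 2>&1; then
        echo "sacctmgr not found; contact your cluster support team" >&2
        return 1
    fi
    # -n: omit the header line, -P: parsable output without padding
    sacctmgr -n -P show associations user="$user" format=Account | sort -u
}

# Example call:
# list_slurm_accounts
```

Any account name printed this way would replace `your-account-name` in the `account` field of the cluster config.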

Best wishes, Francisco

smb20200615 commented 3 years ago

Also, sometimes these commands fail (it seems there were no bins produced):

    bash metaGEM.sh -t metabat -j 2 -c 24 -m 80 -h 10
    bash metaGEM.sh -t maxbin -j 2 -c 24 -m 80 -h 10

Should we just proceed with downstream steps in that case?

franciscozorrilla commented 3 years ago

Indeed, you can simply proceed to bin refinement and reassembly. Let me know how it goes!

smb20200615 commented 3 years ago

Sorry for the follow-up question. When I run refinement, it still tries to rerun the samples that failed. I have been following the steps of the tutorial.

franciscozorrilla commented 3 years ago

No problem, happy to help with your questions. OK, that makes sense, since the binRefine rule takes in 3 inputs and you are missing one of them:

https://github.com/franciscozorrilla/metaGEM/blob/eb0860945fd0d8efa8495aeb441b55969c4e97b1/Snakefile#L879-L883

I would recommend trying to "trick" snakemake into thinking it already created the files by creating a dummy folder for those samples that failed to generate any bins with maxbin, e.g. mkdir maxbin/sample/sample.maxbin-bins. I believe that this is what I have done in the past, since metaWRAP will accept an empty folder and just proceed with the remaining draft bin sets.

Please let me know if this works for you. If so, then I will try to modify the maxbin rule so that it creates this dummy directory if the binning fails to generate any MAGs.
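The dummy-folder workaround can be applied to several samples at once. The following is a minimal sketch, assuming the `maxbin/<sample>/<sample>.maxbin-bins` layout from the example path above and that it is run from the metaGEM working directory. The function name and the sample IDs are hypothetical, and the same naming pattern for other binners is an assumption that should be checked against the Snakefile.

```shell
# Sketch: create empty placeholder bin folders for samples where binning
# produced no output, so that snakemake treats the binning step as done.
# Assumption: output folders follow <binner>/<sample>/<sample>.<binner>-bins.
make_placeholder_bins() {
    binner="$1"; shift                # e.g. maxbin
    for sample in "$@"; do            # remaining arguments are sample IDs
        dir="${binner}/${sample}/${sample}.${binner}-bins"
        if [ -d "$dir" ]; then
            echo "exists, left untouched: $dir"
        else
            # -p also creates the parent sample folder if it is missing
            mkdir -p "$dir" && echo "created placeholder: $dir"
        fi
    done
}

# Example call (hypothetical sample IDs):
# make_placeholder_bins maxbin sampleA sampleB
```

Existing folders are never modified, so real bins from samples that did succeed are left in place.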

smb20200615 commented 3 years ago

Thank you so much! That fixed it. I was wondering if the metabolic models were done just for the prokaryotic MAGs. Also, do you have guidance on how to adapt the media config parameter for our biome of interest?

franciscozorrilla commented 3 years ago

Glad to hear it worked!

Indeed, CarveMe only reconstructs models for prokaryotic MAGs. Regarding media, this is something that you would have to adapt/design based on literature/domain knowledge, using metabolite IDs from the BiGG database. What biome are your samples from?
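To illustrate the format only, here is a minimal sketch that writes a custom medium as a tab-separated file. It assumes the four-column layout (`medium`, `description`, `compound`, `name`) used by CarveMe's media database, which should be checked against the CarveMe documentation and the media file shipped with metaGEM. The medium ID `MYBIOME`, the function name, and the compound list are placeholders and not a validated medium for any biome; the compound IDs are BiGG metabolite IDs.

```shell
# Sketch: write a placeholder medium definition as a TSV file.
# Assumption: columns are medium, description, compound, name.
write_example_medium() {
    out="$1"
    {
        printf 'medium\tdescription\tcompound\tname\n'
        printf 'MYBIOME\tPlaceholder medium\tglc__D\tD-Glucose\n'
        printf 'MYBIOME\tPlaceholder medium\tnh4\tAmmonium\n'
        printf 'MYBIOME\tPlaceholder medium\tpi\tPhosphate\n'
        printf 'MYBIOME\tPlaceholder medium\tso4\tSulfate\n'
        printf 'MYBIOME\tPlaceholder medium\to2\tOxygen\n'
    } > "$out"
}

# Example call (hypothetical file name):
# write_example_medium my_media.tsv
```

A real medium would list every compound available in the environment of interest, one row per compound, with the compound choices taken from the literature for that biome.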