franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License
189 stars 41 forks source link

CompositionVis & modelVis output #135

Closed SoleilSu closed 1 year ago

SoleilSu commented 1 year ago

Hi Francisco,

I have tried to use metaGEM on my metagenomic samples by following the tutorial, and I am able to run the jobs without error. However, I have only 5 reassembled bins and 0 GEMs as shown in the GEMs.stats. I am not sure if it's because my sequence depth is not good enough, so I ran the toy datasets, but I got similar results. Especially, in both cases my compostionVis results look different from what is being presented in the tutorial. The graph does not show the distribution of different samples. I have attached the graph that I obtained from toy datasets below. image

franciscozorrilla commented 1 year ago

Hi @SoleilSu,

Regarding GEMs, were you able to install CarveMe and the academic version of the CPLEX solver? Please check your log files for the CarveMe jobs to see what the errror message there was.

Regarding the plot, I suspect that perhaps the abundance estimation output folders are empy, did you run the abundance calculation jobs?

Best, Francisco

SoleilSu commented 1 year ago

Thank you for the prompt response! I think Carveme works fine and I am able to get the XML files, and when I ran FBA on the XML files, it looks like there are reactions. Here is my modelVis pdf:

image

I ran the abundance job and I got a nonempty abundance.stats files:

image
SoleilSu commented 1 year ago

I just fixed GEMs issues. I am using Carveme 1.5.2 so I need to change something in the snakefile when I ran the modelVis https://github.com/franciscozorrilla/metaGEM/issues/52

But I still have a problem with CompositionVis.

franciscozorrilla commented 1 year ago

Great! You have your GEMs, taxonomy assignments, and abundance mappings 👍 Seems like the problem then is coming from the plotting script itself compositionVis.R

https://github.com/franciscozorrilla/metaGEM/blob/69e0629a5235285878e4e2d8a8a1dcd732949c08/workflow/scripts/compositionVis.R#L1-L24

Could you please share with me your GTDBTk.stats and abundance.stats so that I can try to reproduce your error? I suspect maybe the two files are not being joined properly for some reason and therefore you get a blank plot.

Best, Francisco

SoleilSu commented 1 year ago

Yea, I just noticed that the files were not joined as expected because the sampleID was not appended to the user_genome in the GTDBTk.stats like this issue here: #110 But it seems that my bug occurs because the same variable is not accessible in the context of my Snakefile. So I made a change in the following code in CompositionRule in the Snakefile.

        for folder in */;do 
            samp=$(echo $folder|sed 's|^.*/||');
            cat $folder/classify/*summary.tsv| sed 's/orig/o/g' | sed 's/permissive/p/g' | sed 's/strict/s/g' | sed "s/^/$samp./g";
        done > GTDBTk.stats

I changed the $samp to $(basename $folder) like the following:

        for folder in */;do 
            cat $folder/classify/*summary.tsv| sed 's/orig/o/g' | sed 's/permissive/p/g' | sed 's/strict/s/g' | sed "s/^/$(basename $folder)./g";
        done > GTDBTk.stats

In this way, my GTDBTk.stats has the sampleID, and my plot looks like this now:

image

However, I am expecting that the different sampleID have different y-axis of species similar to the one in the tutorial. So I am not sure if I should edit the compositionVis.R

Thank you!

franciscozorrilla commented 1 year ago

In line 19 of the script shown in my last comment, play around with the scales value from free to free_x or free_y, or simply remove this parameter from the facet_wrap() arguments

SoleilSu commented 1 year ago

Ok thank you so much! It seems that all issues have been fixed.