franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License

automate the response to y/n #133

Closed: kunaljaani closed this issue 11 months ago

kunaljaani commented 1 year ago

Hi Francisco,

Thank you for the amazing pipeline. I wanted to ask whether it is possible to automate the responses to the y/n prompts that appear at different steps by modifying the metaGEM.sh file. Could you please suggest a fix?

Thanks Kunal

while true; do

franciscozorrilla commented 1 year ago

Hi Kunal, glad you find the pipeline useful 👍

I agree, having an additional parameter such as --skipChecks | -s would be good for skipping past the config checks in some cases.
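As a rough illustration of what such a flag could do, here is a minimal sketch of a y/n prompt that can be skipped. The `confirm` function and the `SKIP_CHECKS` variable are hypothetical names used only for this example and are not part of the current metaGEM.sh; the assumption is that a `--skipChecks | -s` option would set `SKIP_CHECKS=1` during argument parsing.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of metaGEM.sh: a y/n prompt that can be skipped.
# SKIP_CHECKS is an assumed variable that a --skipChecks | -s flag would set to 1.
SKIP_CHECKS="${SKIP_CHECKS:-0}"

confirm() {
    local prompt="$1" answer
    # Skip the prompt entirely when checks are disabled
    if [ "$SKIP_CHECKS" = "1" ]; then
        return 0
    fi
    while true; do
        # read returns non-zero on EOF, so give up instead of looping forever
        read -r -p "$prompt (y/n) " answer || return 1
        case "$answer" in
            [Yy]*) return 0 ;;
            [Nn]*) return 1 ;;
            *) echo "Please answer yes or no." ;;
        esac
    done
}
```

Each existing `while true; do ... done` prompt loop could then, in principle, be replaced by a single call such as `confirm "Proceed with these settings?" || exit 1`.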

This will take some time on my part, since I've also been meaning to refactor the metaGEM wrapper file. For now, I suggest that you modify your metaGEM.sh file by deleting the following loops:

https://github.com/franciscozorrilla/metaGEM/blob/69e0629a5235285878e4e2d8a8a1dcd732949c08/workflow/metaGEM.sh#L254-L261

https://github.com/franciscozorrilla/metaGEM/blob/69e0629a5235285878e4e2d8a8a1dcd732949c08/workflow/metaGEM.sh#L273-L280

https://github.com/franciscozorrilla/metaGEM/blob/69e0629a5235285878e4e2d8a8a1dcd732949c08/workflow/metaGEM.sh#L292-L299

That should remove most of the prompts, depending on how you are submitting jobs. Let me know if this helps. Best, Francisco

kunaljaani commented 1 year ago

Thank you very much for your prompt response. Yes, I will try removing the mentioned loops and update you.

Thanks a lot. Kunal

Qing-microbiol commented 12 months ago

I had a similar issue and applied the fix you mentioned above. However, I found other similar loops in the submitLocal() and submitCluster() functions. How should I change these parts to avoid answering yes or no?

Thanks in advance! Qing

franciscozorrilla commented 11 months ago

Hi Qing, the metaGEM.sh wrapper script is meant to be a helper script that forces users to double-check parameters and configuration before submitting jobs. You can circumvent it completely and use Snakemake directly to avoid the need for user input. For example, the following commands use parameters from your cluster_config.json file to submit up to 200 jobs from the Snakefile.

# example specifying memory
nohup snakemake all -j 200 -k --cluster-config cluster_config.json -c "sbatch -A {cluster.account} -p {cluster.part} --mem {cluster.mem} -t {cluster.time} -n {cluster.n} --ntasks {cluster.tasks} --cpus-per-task {cluster.n} --output {cluster.output}" &

# example without specifying memory
nohup snakemake all -j 200 -k --cluster-config cluster_config.json -c "sbatch -A {cluster.account} -p {cluster.part} -t {cluster.time} -n {cluster.n} --ntasks {cluster.tasks} --cpus-per-task {cluster.n} --output {cluster.output}" &

This is basically what the metaGEM.sh script is doing as well, hope this helps!

Qing-microbiol commented 11 months ago

Dear Francisco,

Thank you so much for the quick response and detailed instructions!

Actually, I would like to run the carveme, memote, and SMETANA parts of this pipeline, starting with carveme. I have moved the annotated MAGs' *.faa files into the protein_bins folder, which config.yaml lists as the input for carveme. What else should I do to run it? I keep getting errors in my trials and would really appreciate your guidance.

Regards, Qing

franciscozorrilla commented 11 months ago

Hi Qing, I am happy to help you with that.

Exactly, you simply need to look at the Snakefile rule for whatever task you want to run and make sure that your inputs are in place. For example, for carveme you need your protein bins and a media file. The current implementation submits a single job per genome, but you could also modify the rule to submit a job per sample and then generate GEMs with a for loop.


rule carveme:
    input:
        bin = f'{config["path"]["root"]}/{config["folder"]["proteinBins"]}/{{binIDs}}.faa',
        media = f'{config["path"]["root"]}/{config["folder"]["scripts"]}/{config["scripts"]["carveme"]}'
    output:
        f'{config["path"]["root"]}/{config["folder"]["GEMs"]}/{{binIDs}}.xml'
    benchmark:
        f'{config["path"]["root"]}/{config["folder"]["benchmarks"]}/{{binIDs}}.carveme.benchmark.txt'
    message:
        """
        Make sure that the input files are ORF annotated and preferably protein fasta.
        If given raw fasta files, Carveme will run without errors but each contig will be treated as one gene.
        """
    shell:
        """
        # Activate metagem environment
        set +u;source activate {config[envs][metagem]};set -u;

        # Make sure output folder exists
        mkdir -p $(dirname {output})

        # Make job specific scratch dir
        # Use only the bin input ({input} alone expands to both bin and media) and strip the .faa suffix
        binID=$(basename {input.bin} .faa)
        echo -e "\nCreating temporary directory {config[path][scratch]}/{config[folder][GEMs]}/${{binID}} ... "
        mkdir -p {config[path][scratch]}/{config[folder][GEMs]}/${{binID}}

        # Move into tmp dir
        cd {config[path][scratch]}/{config[folder][GEMs]}/${{binID}}

        # Copy files
        cp {input.bin} {input.media} .

        echo "Begin carving GEM ... "
        carve -g {config[params][carveMedia]} \
            -v \
            --mediadb $(basename {input.media}) \
            --fbc2 \
            -o $(echo $(basename {input.bin}) | sed 's/.faa/.xml/g') $(basename {input.bin})

        echo "Done carving GEM. "
        [ -f *.xml ] && mv *.xml $(dirname {output})
        """

Since this is unrelated to the original issue, please feel free to open a new one and provide further details like the job logs and error messages.