Closed kunaljaani closed 11 months ago
Hi Kunal, glad you find the pipeline useful 👍
I agree, having an additional parameter such as --skipChecks | -s
would be good for skipping past the config checks in some cases.
This will take some time on my part, since I've also been meaning to refactor the metaGEM wrapper file.
For now, I suggest that you modify your metaGEM.sh
file by deleting the following loops:
That should remove most of the prompts, depending on how you are submitting jobs. Let me know if this helps. Best, Francisco
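To illustrate the --skipChecks | -s idea, here is a minimal sketch of how a wrapper script could bypass yes/no prompts when such a flag is passed. This is not the code in metaGEM.sh: the names SKIP_CHECKS, parse_flags, and confirm are hypothetical and chosen only for this example.

```shell
# Hypothetical sketch, not taken from metaGEM.sh.
# SKIP_CHECKS, parse_flags and confirm are illustrative names.

SKIP_CHECKS=0

# Scan the arguments and enable skipping if -s or --skipChecks is present.
parse_flags() {
    for arg in "$@"; do
        case "$arg" in
            -s|--skipChecks) SKIP_CHECKS=1 ;;
        esac
    done
}

# Ask a yes/no question: returns 0 for yes, 1 for no (or end of input).
# When SKIP_CHECKS=1 it returns 0 immediately without reading any input.
confirm() {
    if [ "$SKIP_CHECKS" -eq 1 ]; then
        return 0
    fi
    while true; do
        # Prompt on stderr so stdout stays clean for callers.
        printf '%s [y/n] ' "$1" >&2
        read -r reply || return 1
        case "$reply" in
            [Yy]*) return 0 ;;
            [Nn]*) return 1 ;;
            *) echo "Please answer y or n." >&2 ;;
        esac
    done
}
```

A script using this pattern would call parse_flags "$@" once at startup and then replace each interactive loop with something like: confirm "Proceed with job submission?" || exit 1.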
Thank you very much for your prompt response. Yes, I will give it a try by removing the mentioned loops and will update you.
Thanks a lot. Kunal
I had a similar issue and fixed the part you mentioned above. However, I found other similar loops in the submitLocal() and submitCluster() functions. How should I change these parts to avoid answering yes or no?
Thanks in advance! Qing
Hi Qing, the metaGEM.sh wrapper script is meant to be a helper script that forces users to double-check parameters and configuration before submitting jobs. You can circumvent it completely and use Snakemake directly to avoid the need for user input. For example, the following will use parameters from your cluster_config.json file to submit up to 200 jobs at a time from the Snakefile.
# example specifying memory
nohup snakemake all -j 200 -k --cluster-config cluster_config.json -c "sbatch -A {cluster.account} -p {cluster.part} --mem {cluster.mem} -t {cluster.time} -n {cluster.n} --ntasks {cluster.tasks} --cpus-per-task {cluster.n} --output {cluster.output}" &
# example without specifying memory
nohup snakemake all -j 200 -k --cluster-config cluster_config.json -c "sbatch -A {cluster.account} -p {cluster.part} -t {cluster.time} -n {cluster.n} --ntasks {cluster.tasks} --cpus-per-task {cluster.n} --output {cluster.output}" &
This is basically what the metaGEM.sh
script is doing as well, hope this helps!
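The two commands above differ only in the --mem option, so the sbatch template can be assembled in one place. The following is a small sketch under that assumption; build_cluster_cmd is a hypothetical helper name, not part of metaGEM or Snakemake, and the {cluster.*} placeholders are copied unchanged from the commands above.

```shell
# Hypothetical helper (not part of metaGEM): print the sbatch template
# that is passed to snakemake, adding --mem only when the first
# argument is "yes".
build_cluster_cmd() {
    use_mem="$1"
    cmd="sbatch -A {cluster.account} -p {cluster.part}"
    if [ "$use_mem" = "yes" ]; then
        cmd="$cmd --mem {cluster.mem}"
    fi
    cmd="$cmd -t {cluster.time} -n {cluster.n} --ntasks {cluster.tasks}"
    cmd="$cmd --cpus-per-task {cluster.n} --output {cluster.output}"
    printf '%s\n' "$cmd"
}

# Possible usage, mirroring the commands above:
#   nohup snakemake all -j 200 -k --cluster-config cluster_config.json \
#       -c "$(build_cluster_cmd yes)" &
```

Snakemake fills in the {cluster.*} placeholders from the cluster config file, so the helper only has to emit them literally.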
Dear Francisco,
Thank you so much for the quick response and detailed instructions!
Actually, I would like to run the carveme, memote, and SMETANA parts of this pipeline, starting with carveme. I have moved the annotated MAG *.faa files into the protein_bins folder, which config.yaml lists as the input for carveme. What else should I do to run it? I keep getting errors in my trials and would really appreciate your guidance.
Regards, Qing
Hi Qing, I am happy to help you with that.
Exactly, you simply need to look at the Snakefile rule for whatever task you want to run and make sure that your inputs are there. For example, for carveme you need your protein bins and a media file. The current implementation submits a single job per genome, but you could also modify the rule to submit one job per sample and then generate GEMs with a for loop.
rule carveme:
    input:
        bin = f'{config["path"]["root"]}/{config["folder"]["proteinBins"]}/{{binIDs}}.faa',
        media = f'{config["path"]["root"]}/{config["folder"]["scripts"]}/{config["scripts"]["carveme"]}'
    output:
        f'{config["path"]["root"]}/{config["folder"]["GEMs"]}/{{binIDs}}.xml'
    benchmark:
        f'{config["path"]["root"]}/{config["folder"]["benchmarks"]}/{{binIDs}}.carveme.benchmark.txt'
    message:
        """
        Make sure that the input files are ORF annotated and preferably protein fasta.
        If given raw fasta files, Carveme will run without errors but each contig will be treated as one gene.
        """
    shell:
        """
        # Activate metagem environment
        set +u;source activate {config[envs][metagem]};set -u;

        # Make sure output folder exists
        mkdir -p $(dirname {output})

        # Make job specific scratch dir (bin ID is taken from the protein fasta only)
        binID=$(echo $(basename {input.bin})|sed 's/.faa//g')
        echo -e "\nCreating temporary directory {config[path][scratch]}/{config[folder][GEMs]}/${{binID}} ... "
        mkdir -p {config[path][scratch]}/{config[folder][GEMs]}/${{binID}}

        # Move into tmp dir
        cd {config[path][scratch]}/{config[folder][GEMs]}/${{binID}}

        # Copy files
        cp {input.bin} {input.media} .

        echo "Begin carving GEM ... "
        carve -g {config[params][carveMedia]} \
            -v \
            --mediadb $(basename {input.media}) \
            --fbc2 \
            -o $(echo $(basename {input.bin}) | sed 's/.faa/.xml/g') $(basename {input.bin})
        echo "Done carving GEM. "

        [ -f *.xml ] && mv *.xml $(dirname {output})
        """
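As a rough sketch of the per-sample variant mentioned above, a single job could loop over all protein bins of one sample and call carve once per bin. The function name carve_sample, the DRY_RUN switch, and the argument layout are assumptions made for this example and are not part of metaGEM. The carve options are the same ones used in the rule.

```shell
# Hypothetical sketch (not part of metaGEM): carve every .faa file in one
# sample's bin directory within a single job.
# Arguments: <bin_dir> <media_db_file> <gapfill_media_id> <out_dir>
# If DRY_RUN is set to a non-empty value, commands are printed, not run.
carve_sample() {
    bin_dir="$1"
    media_db="$2"
    media_id="$3"
    out_dir="$4"

    mkdir -p "$out_dir"

    for faa in "$bin_dir"/*.faa; do
        # Skip the literal pattern when the directory has no .faa files
        [ -e "$faa" ] || continue

        bin_id=$(basename "$faa" .faa)

        # Same carve options as in the carveme rule
        ${DRY_RUN:+echo} carve -g "$media_id" \
            -v \
            --mediadb "$media_db" \
            --fbc2 \
            -o "$out_dir/$bin_id.xml" "$faa"
    done
}
```

Running it first with DRY_RUN=1 prints one carve command per bin, which is a cheap way to check paths before submitting the real job.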
Since this is unrelated to the original issue, please feel free to open a new one and provide further details like the job logs and error messages.
Hi Francisco,
Thank you for the amazing pipeline. I wanted to ask whether it is possible to automate the responses to the y/n prompts that appear at different steps by modifying the metaGEM.sh file. Could you please suggest a fix?
Thanks Kunal
while true; do