dahak-metagenomics / dahak

benchmarking and containerization of tools for analysis of complex non-clinical metagenomes.
https://dahak-metagenomics.github.io/dahak
BSD 3-Clause "New" or "Revised" License

Trouble getting started with Quickstart #104

Open nalbright opened 6 years ago

nalbright commented 6 years ago

I am having trouble successfully kicking off any commands after running through the Quick Start and The Really Quick Copy-And-Paste Quick Start. Specifically, I have the following questions:

1) It is not clear what to do with the SINGULARITY_BINDPATH="data:/data" described in the documentation. Does this simply go on the command line when I execute the rest of the command?

2) At the end of “how do I specify my datafiles”, “how do I specify my workflow configuration”, and “how do I specify my workflow parameters” there is an example to run each of these. However, the section never makes clear whether we can use all of these flags together. Do we have to run each of these separately, or can we combine them into one command? By following the Quick Start I tried to execute the following command and received this output (I am assuming part of the problem is not specifying a target, but at this point in the quick start I am still not sure what a target is or how to call one):
• $ SINGULARITY_BINDPATH="data:/data" snakemake --use-singularity --configfile=config/example_datafile.json --configfile=config/example_workflowconfig.json –configfile=config/example_workflowparams.json

3) In The Really Quick Copy-And-Paste Quick Start it is unclear where and when to execute export SINGULARITY_BINDPATH="data:/data". Again, is this on the command line with the singularity execution? At the end of this section I tried to execute the following command; it returns a “>” and looks like it might be executing, but doesn’t appear to be making any progress:
• $ export SINGULARITY_BINDPATH="data:/data snakemake -p -n --configfile=config/custom_readfilt_workflow.json read_filtering_pretrim_workflow

• When excluding the export SINGULARITY_BINDPATH: $ snakemake –p –configfile=config=custom_readfilt_workflow.json read_filtering_pretrim_workflow Output:
Finished job 5.
3 of 7 steps (43%) done

Job 2: --- Pre-trim quality check of trimmed data with fastqc.

fastqc -t 1 //data/SRR606249_subset10_1_reads.fq.gz /data/SRR606249_subset10_2_reads.fq.gz -o /data
/usr/bin/bash: fastqc: command not found
Error in rule pre_trimming_quality_assessment:
jobid: 2
output: data/SRR606249_subset10_1_reads_fastqc.zip, data/SRR606249_subset10_2_reads_fastqc.zip

RuleException:
CalledProcessError in line 162 of /data/home/nalbright/dahak/workflows/read_filtering/Snakefile:
Command ' set -euo pipefail; fastqc -t 1 //data/SRR606249_subset10_1_reads.fq.gz /data/SRR606249_subset10_2_reads.fq.gz -o /data ' returned non-zero exit status 127.
File "/data/home/nalbright/dahak/workflows/read_filtering/Snakefile", line 162, in __rule_pre_trimming_quality_assessment
File "/data/home/nalbright/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /data/home/nalbright/dahak/workflows/.snakemake/log/2018-07-13T121116.897357.snakemake.log

4) (This is more of a comment than a question) I could not find documentation for all the [FLAGS] and options. Is there a readme or doc that I am not seeing where I can view a full list of these options? I found it difficult to grasp what these are throughout the documentation. For instance, the -p and -n options are not mentioned until later in the Quickstart, under The Really Quick Copy-And-Paste Quick Start. Also, it states early in the quick start that the “most important flag is --config flag”; however, we only use --configfile. Is this supposed to be the same thing? Lastly, I still don’t know what a target is and how to call it. Sorry for any confusion. I appreciate any clarification you can provide.
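Two of the symptoms in question 3 can be reproduced without Snakemake. The sketch below is an illustration, not part of the Dahak documentation: it assumes `bash` and `sh` are available, and `dahak_hypothetical_missing_tool` is a made-up command name. It shows that a command string with an unclosed double quote is incomplete (an interactive shell prints the `>` continuation prompt and waits for the rest), and that a command missing from PATH gives exit status 127, the status reported for `fastqc` above.

```shell
# An unclosed double quote: an interactive shell prints ">" and waits for
# the closing quote; run non-interactively, the same text is a syntax error.
quote_status=0
bash -c 'export SINGULARITY_BINDPATH="data:/data snakemake -p -n' 2>/dev/null || quote_status=$?

# A command that is not on PATH makes the shell exit with status 127.
# dahak_hypothetical_missing_tool is a placeholder name, not a real tool.
missing_status=0
sh -c 'dahak_hypothetical_missing_tool' 2>/dev/null || missing_status=$?

echo "unclosed quote status: $quote_status"
echo "missing command status: $missing_status"
```

If this reading is right, the `>` prompt most likely came from the missing closing quote after `data:/data`, and the 127 means `fastqc` was not on the PATH of the shell that ran the rule.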

charlesreid1 commented 6 years ago

Shorter issues are helpful, as there are a couple of threads going on here. I'll address what seems to be the overarching issue.

In #101 @stephenturner suggested removing $ so he could copy and paste commands onto the command line, but removing $ seems to have caused confusion for @nalbright about what to do with these commands.

This quick start assumes folks are familiar with environment variables and command line flags, so they'll recognize that all of the code blocks given are command line commands that should be copied and pasted onto the command line. I can add a note to the top that everything is bash commands.
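For readers less familiar with environment variables, here is a minimal sketch of the two ways the quick start sets SINGULARITY_BINDPATH. It is an illustration under stated assumptions, not text from the Dahak docs: `sh -c ...` stands in for the real `snakemake ...` call so the snippet runs anywhere, and the variable names `inline_value`, `after_inline`, and `exported_value` are made up for the demonstration.

```shell
# Start from a clean slate so the demonstration is repeatable.
unset SINGULARITY_BINDPATH

# Form 1: the assignment is a prefix on the same line as the command.
# The variable is visible to that one command only.
# `sh -c ...` stands in here for the real `snakemake ...` invocation.
inline_value=$(SINGULARITY_BINDPATH="data:/data" sh -c 'printf "%s" "$SINGULARITY_BINDPATH"')
after_inline="${SINGULARITY_BINDPATH:-unset}"

# Form 2: export on its own line; every later command inherits the variable.
export SINGULARITY_BINDPATH="data:/data"
exported_value=$(sh -c 'printf "%s" "$SINGULARITY_BINDPATH"')

echo "inline: $inline_value"
echo "after inline: $after_inline"
echo "exported: $exported_value"
```

Either form works. The point is that `export ...` is a complete command on its own line, while the prefix form shares a line with the command it applies to.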

charlesreid1 commented 6 years ago

Your questions about targets are covered on the Running Workflows page. The quick start page is intended to be a "I know what I need to do, just give me the commands" guide. The other pages are better for "I'm not sure what's going on, please give me the 10,000 foot view".

From the top of the page:

NOTE: This guide assumes familiarity with Dahak workflows and how they work. For a more clear explanation of what's going on and how things work, start with the Running Workflows page.

charlesreid1 commented 6 years ago

Regarding Snakemake flags - questions about flags are addressed by the link to the "Executing Snakemake" page in Snakemake's documentation. This link is in the first section of the quickstart page:

As covered on the Running Workflows page, workflows are run by using Snakemake from the command line. The basic syntax is: snakemake [FLAGS] (See the executing Snakemake page of the Snakemake documentation for a full list of options that can be used with Snakemake.)

This page links to a full list of command line flags, plus a table explaining what each one means:

[Screenshot, 2018-07-13: table of command line flags from the Snakemake documentation]

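As a rough sketch of how the `snakemake [FLAGS] <target>` pattern fits together, the snippet below assembles the two commands most relevant to this thread. The config file and target names are the ones quoted earlier in the issue; the variable names are made up for illustration, and the commands are only printed, not executed, so the snippet runs without Snakemake installed.

```shell
# File and target names are the ones quoted in this thread.
config="config/custom_readfilt_workflow.json"
target="read_filtering_pretrim_workflow"

# -n: dry run, lists the jobs that would run without running them.
# -p: print the shell command behind each job.
# The final word is the target: the rule (or file) Snakemake should produce.
dry_run_cmd="snakemake -n -p --configfile=$config $target"

# The same call without -n actually executes the jobs.
real_run_cmd="snakemake -p --configfile=$config $target"

echo "$dry_run_cmd"
echo "$real_run_cmd"
```

Adding `--use-singularity`, as in the command from question 2, asks Snakemake to run rules inside their Singularity containers instead of relying on tools installed on the host.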
kternus commented 6 years ago

Hi Nicolette, thanks for testing this, and let's keep track of what operating system you are using with each issue. I believe this issue was identified on Red Hat?

Please let us know if you are able to reproduce the same errors on an Ubuntu OS.