a-h-b / dadasnake

Amplicon sequencing workflow heavily using DADA2 and implemented in snakemake
GNU General Public License v3.0

16S run error #39

Open sanche27 opened 1 month ago

sanche27 commented 1 month ago

Hello everyone,

I've tried using config.16s.yaml with the test data and it works fine. However, when I try a dry run with my own data, I get this error:

You will not be able to submit dadasnake to a cluster unless you set normalMem in your config file.
You haven't specified more than 0 bigmem cores, in cluster mode, all rules would be performed on normal cores with .
Final resource settings: maxCores: 1
adding column with run info
Traceback (most recent call last):
  File "/home/uyaguari/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/snakemake/__init__.py", line 593, in snakemake
    workflow.include(
  File "/home/uyaguari/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/snakemake/workflow.py", line 1182, in include
    exec(compile(code, snakefile.get_path_or_uri(), "exec"), self.globals)
  File "/home/uyaguari/dadasnake/Snakefile", line 9, in <module>
    "workflow/rules/get_config.smk"
  File "/home/uyaguari/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/snakemake/workflow.py", line 1182, in include
    exec(compile(code, snakefile.get_path_or_uri(), "exec"), self.globals)
  File "/home/uyaguari/dadasnake/workflow/rules/get_config.smk", line 125, in <module>
    if samples[['library','run']].duplicated().any():
  File "/home/uyaguari/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/pandas/core/frame.py", line 4108, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/home/uyaguari/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6200, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/uyaguari/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6252, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['library'] not in index"
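The last frames show get_config.smk selecting the 'library' and 'run' columns from the sample table, so the KeyError means the file that sample_table points to has no column named library. A minimal, stand-alone sketch of the same check (the file name below is a placeholder, not a dadasnake path):

    # reads a tab-separated sample table and reproduces the check from the traceback;
    # "samples.tsv" is a placeholder path
    import pandas as pd

    samples = pd.read_csv("samples.tsv", sep="\t")

    # the traceback shows get_config.smk selecting these two columns, so both
    # must be present (spelled exactly like this) in the sample table header
    missing = {"library", "run"} - set(samples.columns)
    if missing:
        raise SystemExit(f"sample table is missing required column(s): {sorted(missing)}")

    # with both columns present, this is the duplicate check that failed above
    if samples[["library", "run"]].duplicated().any():
        print("duplicated library/run combinations found")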

The same error also often happens when I try other config settings. Please let me know if you need my config file, sample table, or anything else.

Kind regards.

a-h-b commented 1 month ago

Yes please, upload your config file. Best wishes - AHB


sanche27 commented 1 month ago

Hello,

This is my config file.

# GENERAL INFORMATION

# you will always want to change these directories
raw_directory: "blankdata"
sample_table: "blankdata/samples.blank.tsv"
outputdir: "blankoutput"

paired: false
# change to false for single-end

email: ""

# STEPS

# by default everything is run
do_primers: true
do_dada: true
do_taxonomy: true
do_postprocessing: true

hand_off:
  biom: false
  phyloseq: true

# PRIMER REMOVAL SETTINGS

primers:
  fwd:
    sequence: AGAGTTTGATCCTGGCTCAG
    name: 27F
  rvs:
    sequence: CTACGGCTACCTTGTTACGA
    name: 1492R
# these are also the default primers, by the way

sequencing_direction: "unknown"
# can be unknown, fwd_1, or rvs_1 (fwd_1 means the first read contains the fwd primer)

# READ FILTERING SETTINGS (in DADA2)

filtering:
  trunc_length:
    fwd: 170
    rvs: 130
  trunc_qual:
    fwd: 13
    rvs: 13
  max_EE:
    fwd: 0.2
    rvs: 0.2
# these settings were evaluated for 515-806 on a HiSeq dataset in the original publication

# DOWNSAMPLING IS OFF BY DEFAULT
downsampling:
  do: false

# DADA2 SETTINGS

dada:
  pool: false
  errorEstimationFunction: loessErrfun
  error_nbases: 1e8
# default is per-sample analysis using the standard error function trained on 1e8 reads

chimeras:
  remove: true

# SETTINGS FOR TAXONOMIC ANNOTATION

taxonomy:
  mothur:
    do: true
    db_path: "db"
    tax_db: "silva.nr_v138_1.tgz"
    # you'll have to set these
    run_on:
      - ASV
      - cluster
  # other classifiers are implemented, check the documentation (but mothur is most efficient)
  blast:
    do: false
  # you'll have to set these

# SETTINGS FOR CLUSTERING ASV TABLE AT e.g. 97%

post_clustering:
  do: true
  cutoff: 0.97

final_table_filtering:
  do: true
  keep_target_taxa: "."
  target_min_length: 245
  target_max_length: 275
# this will remove most mitochondrial and plastid sequences

postprocessing:
  rarefaction_curve: true
  picrust2:
    do: true
    stratified: true
    per_sequence_contrib: true
    skip_norm: false

treeing:
  do: true
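A quick way to catch indentation or quoting mistakes in an edited config like this one is to parse it before the dry run; a minimal sketch, assuming PyYAML is available in the snakemake conda environment and that the file is saved as config.16s.yaml (a placeholder name):

    # parse the edited config to catch YAML syntax errors before running dadasnake
    import yaml

    with open("config.16s.yaml") as fh:   # placeholder file name
        cfg = yaml.safe_load(fh)

    # echo the directories set in the GENERAL INFORMATION block
    print(cfg["raw_directory"], cfg["sample_table"], cfg["outputdir"])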

Thank you for your time.

sanche27 commented 3 weeks ago

sample.config.txt

Hello,

Please see my config file attached.

The dry run is working now; however, I am encountering a new error:

Error in rule dada_errors:
    jobid: 12
    output: errors/models.1.RDS, stats/error_models.1.pdf
    log: logs/DADA2_errors.1.log (check log file(s) for error message)
    conda-env: /home/uyaguari/dadasnake/conda/2360ba3fcd6b6188c273760e9f7e8339
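The rule only reports that it failed; the underlying DADA2/R error is written to the log it names. A minimal sketch for printing the end of that log, using the path from the message above (relative to the working directory, which is an assumption here):

    # print the last lines of the DADA2 error-model log named in the failure message
    from pathlib import Path

    log = Path("logs/DADA2_errors.1.log")
    if log.exists():
        print("\n".join(log.read_text().splitlines()[-30:]))
    else:
        print(f"{log} not found - run this from the output directory")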

It also says that I am missing a lot of metadata.

Please let me know if you have any suggestions.

Hans