a-h-b / dadasnake

Amplicon sequencing workflow heavily using DADA2 and implemented in snakemake
GNU General Public License v3.0

Nanopore data #22

Closed thierryjanssens closed 1 year ago

thierryjanssens commented 2 years ago

Hi all,

I am running an analysis on Nanopore amplicon data (without primer processing, since it is a multiplex study that I want to trim and process further downstream), but I get the following error in the job dad_poolTabs:

[1] "Removing chimeras"
Error in S4Vectors:::normarg_names(value, class(x), length(x)) :
  attempt to set too many names (2) on GroupedIRanges object of length 0
Calls: names<- -> names<- -> names<- -> names<- ->
Execution halted

I have no clue where to start debugging. Any suggestions?

$ nano ./config/config.nanoporetest.yaml

raw_directory: /home/test
sample_table: /home/test/sample_table.tsv
outputdir: /home/test/output
do_dada: true
do_primers: no
do_taxonomy: no
paired: false
primer_cutting:
  overlap: 12
  perc_mismatch: 0.25
  indels: ''
  count: 1
  both_primers_in_read: true
primers:
  fwd:
    sequence: AGRGTTTGATCMTGGCTCAG
    name: 8F
  rvs:
    sequence: GGGCGGWGTGTACAAG
    name: 1387R
sequencing_direction: fwd_1
filtering:
  trunc_length:
    fwd: 0
  trunc_qual:
    fwd: 0
  max_EE:
    fwd: Inf
  minLen:
    fwd: 500
  maxLen:
    fwd: Inf
  minQ:
    fwd: 0
dada:
  pool: true
  band_size: 32
  homopolymer_gap_penalty: -1
  use_quals: true
  omega_C: 1
  omega_A: 1e-30
  gapless: false
  no_error_assumptions: false
  errorEstimationFunction: noqualErrfun
  selfConsist: false
chimeras:
  remove: true
  method: pooled
  minFoldParentOverAbundance: 3.5
final_table_filtering:
  do: false
postprocessing:
  funguild:
    do: false
  rarefaction_curve: true
  treeing:
    do: false
ITSx:
  run: false
taxonomy:
  decipher:
    do: false
  mothur:
    cutoff: 60
    db_path: "../DBs/amplicon"
    tax_db: "SILVA_138_SSURef_NR99_prok"
    do: true
    post_ITSx: false
tmp_dir: $USER/tmp
email: ''

a-h-b commented 2 years ago

Hi Thierry - thanks for your question. It looks a bit like there might not be enough ASVs built, or maybe there's a bug in the script. You could try to set

chimeras:
  remove: false

and check if there is anything in the ASV table prior to chimera removal. Please let me know, I'd like to try and follow this up to make sure the Nanopore workflow runs as expected.

thierryjanssens commented 2 years ago

Dear Anna,

thanks for your swift reply.

This error was actually preceded by another error, which reports an incorrect primer number file. As you can see above, I have decided not to use primer detection in this approach.

cat ./output/logs/countPrimerReads.log
[1] "reading sample table from reporting/readNumbers.tsv"
[1] "extracting read numbers"
     run1/1.fastq.gz run1/2.fastq.gz run1/3.fastq.gz run1/4.fastq.gz
[1,] "run1"          "run1"          "run1"          "run1"
[2,] "1"             "2"             "3"             "4"
     run1/5.fastq.gz run1/6.fastq.gz run1/7.fastq.gz run1/8.fastq.gz
[1,] "run1"          "run1"          "run1"          "run1"
[2,] "5"             "6"             "7"             "8"
     run1/9.fastq.gz run1/10.fastq.gz run1/11.fastq.gz run1/12.fastq.gz
[1,] "run1"          "run1"           "run1"           "run1"
[2,] "9"             "10"             "11"             "12"
Error in write.table(sampleTab, snakemake@output[[1]], sep = "\t", quote = F, :
  unimplemented type 'list' in 'EncodeElement'
Execution halted

This may be the primary cause of the problem. When I restart the dadasnake pipeline, other errors pop up. There are sequence data in my fastq files, since multiQC produces a report.

My sample table looks like:

sample  library  r1_file
1       1        barcode01.fastq.gz
2       2        barcode02.fastq.gz
3       3        barcode03.fastq.gz
4       4        barcode04.fastq.gz
5       5        barcode05.fastq.gz
6       6        barcode06.fastq.gz
7       7        barcode07.fastq.gz
8       8        barcode08.fastq.gz
9       9        barcode09.fastq.gz
10      10       barcode10.fastq.gz
11      11       barcode11.fastq.gz
12      12       barcode12.fastq.gz

thierryjanssens commented 2 years ago

Are these settings appropriate for filtering Nanopore data? (The fastq files are empty after filtering.)

filtering:
  trunc_length:
    fwd: 0
    rvs: 0
  trunc_qual:
    fwd: 0
    rvs: 2
  max_EE:
    fwd: Inf
    rvs: Inf
  minLen:
    fwd: 500
    rvs: 0
  maxLen:
    fwd: Inf
    rvs: Inf
  minQ:
    fwd: 10
    rvs: 10
  maxN: 1000
  rm_phix: true
  trim_left:
    fwd: 0
    rvs: 0

a-h-b commented 2 years ago

Maybe you need to set minQ even lower?
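
For context on this suggestion: DADA2's filterAndTrim discards every read that contains a base with a quality score below minQ, so long Nanopore reads rarely survive minQ: 10. A minimal Python sketch, assuming Phred+33 encoded FASTQ with exactly four lines per record, for checking what fraction of reads would pass a given threshold. The function names and the two example reads are hypothetical and are not part of dadasnake or DADA2.

```python
import gzip
import io


def fraction_passing_min_q(handle, min_q, offset=33):
    """Return the fraction of reads whose lowest base quality is >= min_q.

    Assumes four-line FASTQ records and Phred+offset quality encoding.
    """
    total = 0
    passing = 0
    while True:
        header = handle.readline()
        if not header:
            break                      # end of file
        handle.readline()              # sequence line, not needed here
        handle.readline()              # '+' separator line
        qual = handle.readline().rstrip("\n")
        if not qual:
            continue                   # skip records without quality values
        total += 1
        if min(ord(c) - offset for c in qual) >= min_q:
            passing += 1
    return passing / total if total else 0.0


def fraction_passing_in_file(path, min_q):
    """Same check for a gzip-compressed FASTQ file on disk."""
    with gzip.open(path, "rt") as handle:
        return fraction_passing_min_q(handle, min_q)


# Hypothetical two-read example: read r1 is all Q40 ('I'), read r2 has one Q2 base ('#').
example = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nII#I\n"
print(fraction_passing_min_q(io.StringIO(example), min_q=10))  # 0.5
print(fraction_passing_min_q(io.StringIO(example), min_q=0))   # 1.0
```

Running fraction_passing_in_file on one of the raw barcode files with a few candidate thresholds gives a rough idea of how low minQ has to go before a useful number of reads survives.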

thierryjanssens commented 2 years ago

OK thanks,

After setting minQ lower and removing the dada pool option, things are starting to work out. However, I still have an issue with the OTU table.

It does not contain the sample names, and only one read per sample is assigned to the two OTUs detected:

Row.names  X1  X2  OTU
AAAATTTTATATTTTATTTTTTGGTATATGATCAGGTATATTAGGTTTTATCATAGTTATTAATTCGTATGGAACTTAGAATTACAGGGACATTATTAGTAATGACCAAATTTATAATAAATTGTTACTGCTCATGCTTTTGTTATAATTTTTTTATAGTTATACCAATTATAATTGGAGGATTTGGTAATTGATTAATTCCTTTAATGTTAGGGTCACTCTGATATGGCTTTCCCTCGAATAAATAATATAAGATTTTGATTATTAATTCCTTCGTATTTATTAATTGTTAGAAGTTTAATAAATTCAGGTGTTGGTACAGGATGAACAGTTTATCCTCCTTTATCTTTAACTTTAGGACATAAGGGGGTTGTTGTAGATTTTGCTATTTTTCCTTTACATTTAGCGGGTATTTCTTCAATTATGGGAGCTATCAATTTTATTAGTACTATTTTTAATATACGATGTTTTAATGTTAAAATAGATCAAATTTCATTATTAATTTGATCTGTATTGAGTACAACGATTTTATTATTATTATCTCTACCAGTATTAGCAGGTGCTATTACTATATTATTAATAACTGATCGTAATTTAAATACTACTTTTTTGATTTTGCAGGAAATTGGTGGTGATCCAATTTTATCAACATTTATTT  1  0  OTU_1
AAAAAAAAAAAAAAAAAAAAAAGACTTCTATTTTTGTTTTCGTTGTAGGCTGCTATAATTGGGTCTTCTATAAATTATAATTATTCAGATTGAACTTTCTCAAGAGGGGTGTATTATTTTATATACAGCCTGGGAAGAAACTATTGATGCAGGGGTATAATGTTATTGTTACTTCTCATGCTGTAAACAGTTTTTTATAGTGATACCTGCAATAATTGGTAGGTTTGGAAATTGGATTTAGTTCCTATTATAATTAGAGCTCCTGATATAGCATTTCCTCTATTATGAATAATATAAGTTTATAGTTGTGCCTCCTTCTCTGATTTTTTGCTGTTTTCTTTAATAAATTATCTCAGACAGGTACAGGTGAACTATTTATCCTCCTTTGTCAAATTTTGTCTATCGTAGGTCTATTAGGTAGATTTAGTTATTTTTGATTTACATTTAGCGGGGATTTCTTCTATTATAGGGGCTATTAACTTTATTAGGACTATTTTAAATATGCGTCCTAAAAATGAAGATAGAGCAGGTTCCGTTGTTTGTTTGATCTGTTTAATTACTGCTATTTTTGCTACTGCTTTCTTTACCTGTTTTAGCTGGGGCTATTTGTCTTCTTAGAGATCAGAAATTTTAATACAAGATTTTTGATCAGGAGGAAGGATCCTATTTGTATCGCATTTATTT  0  1  OTU_2

What is strange is that only a single read per sample comes through the merged and tabled steps.

sample  library  run   r1_file             reads_raw_r1  reads_primers  reads_filtered  reads_merged  reads_tabled  reads_chimera_checked
1       1        run1  barcode01.fastq.gz  NA            60001          3971            1             1             1
2       2        run1  barcode02.fastq.gz  NA            47314          6950            1             1             1
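
To see where the reads are lost, the counts in this table can be turned into per-step retention fractions. A minimal Python sketch using the numbers for library 1 copied from the table; the function name is hypothetical, and the reading that the reads_merged column reflects what is left after denoising in a single-end run is an assumption.

```python
def retention_by_step(counts):
    """Given ordered (step, reads) pairs, return (step, fraction kept from the previous step)."""
    fractions = []
    for (_, previous), (step, current) in zip(counts, counts[1:]):
        fractions.append((step, current / previous if previous else 0.0))
    return fractions


# Counts for library 1, copied from the read-number table.
library_1 = [
    ("reads_primers", 60001),
    ("reads_filtered", 3971),
    ("reads_merged", 1),
    ("reads_tabled", 1),
    ("reads_chimera_checked", 1),
]

for step, fraction in retention_by_step(library_1):
    print(f"{step}: {fraction:.4%} of the previous step kept")
```

Filtering keeps roughly 6.6% of the reads, but the step after filtering keeps only 1 of 3971, which points at the dada settings rather than at the table-building or chimera steps.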

a-h-b commented 2 years ago

So, the sample names are there (X1 and X2), because R does not like to call columns 1 or 2, which is what your samples are called.
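
The X prefix comes from R making column names syntactically valid when a table is built or read with name checking enabled. A minimal Python sketch of how the original sample names could be mapped back after reading the OTU table; r_style_column_name is a simplified, hypothetical emulation of R's make.names (it ignores reserved words, non-ASCII letters and de-duplication), and neither function is part of dadasnake.

```python
import re


def r_style_column_name(name):
    """Simplified emulation of R's make.names for one name (assumption: ASCII only)."""
    # Characters that are not letters, digits, '.' or '_' become '.'.
    cleaned = re.sub(r"[^A-Za-z0-9._]", ".", name)
    # A valid name starts with a letter, or with a dot not followed by a digit.
    if not re.match(r"^([A-Za-z]|\.(?![0-9]))", cleaned):
        cleaned = "X" + cleaned
    return cleaned


def restore_sample_names(columns, samples):
    """Replace R-mangled column names by the original sample names where they match."""
    lookup = {r_style_column_name(sample): sample for sample in samples}
    return [lookup.get(column, column) for column in columns]


print(restore_sample_names(["Row.names", "X1", "X2", "OTU"], ["1", "2"]))
# ['Row.names', '1', '2', 'OTU']
```

Naming the samples with a leading letter in the sample table (for example S1, S2) avoids the renaming altogether.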

the other thing is a bigger problem. You'll have to find good settings for the dada options (see https://rdrr.io/bioc/dada2/man/setDadaOpt.html - most of them are implemented in dadasnake). Unfortunately, I can't really help you there. The data that I worked with when setting this up was from an experiment like https://academic.oup.com/gigascience/article/7/12/giy140/5202451 . I have no clue if you can find dada2 settings that work with raw Nanopore reads.

If you do, please let me know how you configured it.

Best wishes - A

ccastilloi commented 2 years ago

Hello, I have the same problem with the OTU table: it only has one OTU assigned to each sample. Were you able to resolve this issue?

Regards,