Closed Sebastian-Mynott closed 5 years ago
What is the output of head -n4 mysrr_file.fastq (in the shell)?
What command did you use to convert from sra format to fastq? i.e. the fastq-dump arguments.
Aha! I downloaded the files using the SRAdb package, with getSRAfile(SRAccessions, sra_con, fileType = 'fastq'), which gave me a list of .fastq.gz files, so I didn't think I'd need fastq-dump.
The output of head -n4 mysrr_file.fastq gives me this:
@SRR7758019.1 1/1
GCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCGGTTAAAAAGCTCGTAGTTGGATTTCTGCTGAGGACGACCGGTCCGCCCTCTNNNNNNNNNTNNNNCTCGGCNTTGGCATCTTCTTGGGGAACGTNANTGCACTTGACTGTGTGGTGCGGTATCCAGGACTTTTACTTTGAGGNNNNNNNNGTGNNNCAANCNGGCTTACGCCTTGAATACATTAGCATGGAATAATAAGATAGGACCTTGGTTCTATTTNNTTGGNNNNNNNNGCTGAGGTNATGATTACTAGGGATAG
+
CCCCCGGGGGGGGGEGGFGGGGGGGGGGGGGGFGFGGGGFGGGGGGGGGGGGFGDFFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG#########:####::DFGG#:BFGGGGGGGGGGGGGGGFGGG#:#:BFFGGGGGGFGGGGFGGGGGGGGGGGGGG7FGGGGGGGGGGFGG########56>###66=#6#6*;CFCGFGGGGGGFFGGGGGGGGDFG0776CAF7FF?7+??FGG6CC?C5D?GGGG##228*########0--1<CG4#--(4;A>4-5=FF**9*
Do I need to download the files again as SRA then convert to fastq?
I would at least try that on one file to see if that fixes this issue.
Hi, I wanted to re-open this because I am having a similar issue. I'm using paired-end sequence data sequenced on Illumina MiSeq and also downloaded from the NCBI SRA database. I downloaded the files originally as SRA files and then converted them to zipped fastq files (fastq.gz) using fastq-dump with a flag to make sure each sample had separate files for the forward and reverse reads.
I'm getting the same error when I run DADA2 on these files:
Error in (function (fn, fout, maxN = c(0, 0), truncQ = c(2, 2), truncLen = c(0, :
Couldn't automatically detect the sequence identifier field in the fastq id string.
Calls: filterAndTrim ... mclapply -> lapply -> FUN -> .mapply ->
The head of one of the fastq files I'm reading into DADA2 looks like this:
@SRR1191781.12854 12854 length=250
TTATTAATCCTATTGAACTATTTACGACATTAAACACACTGGAACATTTTTCCATTTTACAAATTTTTTTTTCAATATCATTTGCATAATCTAATTGGTCTTTAGGTTTATTAGCAGAGCCAGGTTTTATTCTAACTTGAATACCATTTCCACAAGTTACACTACATGGGGACCATTCAGTTGAAAGAGAATTTTGTATTGTCTTTAAATATTTTTCTATGTGCT
+
HHHHHHHHHHHFHHHHHHHHHHHHHGGFEFFHHHHHHHGHHHHHHHHHHHHHHHHHHHHHH5FGHHHGG>EGHHHHHHHHHHHGHBHHFHHHGDGHHHHHGGHHHHGHHFHHHGHFBFEGHFHH2BFGHGGHHHHHHHGGHHHHHHHHHHHGHHHHG1GHFHHGHHHHHEGGGGHHHHHGGHFHHBGGBCGHHHFHGGHGHFFHHHHHHGHHHHFGGGGGGFFGF
Do you know what might be going on and how I could fix this issue?
This error is because the original fastq id lines have been replaced by these SRA id lines, which filterAndTrim(..., matchIDs=TRUE) doesn't recognize.
Do you need to use the matchIDs=TRUE flag? If you don't, just remove it and everything should work fine.
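To see which kind of id line a file carries before deciding whether matchIDs is an option, a quick check along these lines may help. This is a minimal Python sketch, not DADA2's own detection logic: the two regular expressions are assumptions based on the SRA headers posted in this thread and on the usual colon-separated Illumina layout, the Illumina id in the comment is made up for illustration, and `classify_header` is a hypothetical helper name.

```python
import re

# Assumed pattern for SRA-style id lines, e.g. "@SRR1191781.12854 12854 length=250"
SRA_ID = re.compile(r"^@[SED]RR\d+\.\d+(\s|$)")

# Assumed pattern for Illumina-style id lines with seven colon-separated fields,
# e.g. "@M01234:12:000000000-ABCDE:1:1101:15589:1332 1:N:0:1" (made-up example)
ILLUMINA_ID = re.compile(r"^@[^:\s]+:\d+:[^:\s]+:\d+:\d+:\d+:\d+(\s|$)")


def classify_header(line: str) -> str:
    """Return 'sra', 'illumina' or 'unknown' for one fastq id line."""
    line = line.rstrip("\n")
    if SRA_ID.match(line):
        return "sra"
    if ILLUMINA_ID.match(line):
        return "illumina"
    return "unknown"
```

Running `classify_header` on the first line of each forward and reverse file gives a rough idea of whether the original instrument ids survived the download or conversion step.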
Thank you for the quick reply. It looks like that solved the issue!
I'm having a similar problem with the SRA id lines, except I do require the matchIDs = TRUE flag. What then?
@d-callan Unfortunately I'm not sure there is a solution in that case. The original IDs are required to match the paired reads together if they are now in different orders.
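Whether the forward and reverse files really are out of step can be checked directly. The Python sketch below walks both files together and compares the first whitespace-delimited token of each id line. Treating that token (e.g. SRR1191781.12854) as the shared read id is an assumption that fits the fastq-dump output shown above; it also assumes plain four-line fastq records, and `read_ids` and `pairs_in_sync` are hypothetical names.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a plain or gzip-compressed fastq file for reading as text."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def read_ids(path):
    """Yield the first token of every id line (every 4th line, minus the '@')."""
    with open_fastq(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0 and line.strip():
                yield line[1:].split()[0]


def pairs_in_sync(fwd_path, rev_path):
    """Return (True, None) if both files list the same ids in the same order,
    otherwise (False, index_of_first_mismatching_record)."""
    pairs = zip_longest(read_ids(fwd_path), read_ids(rev_path))
    for n, (fwd_id, rev_id) in enumerate(pairs):
        if fwd_id != rev_id:  # also triggers when one file is shorter
            return False, n
    return True, None
```

A mismatch index near the end of the files would point to a few missing reads rather than a reshuffled file.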
Thanks anyhow. I'm not convinced they are truly ordered differently, but I'm finding that the forward and reverse files definitely have different read counts. Perhaps I can quickly put together a script to remove the reads that don't have a partner before passing them to dada2, and see where that gets me. I was mostly just hoping I might not have to.
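A script of the kind described here could look roughly like the following. This is a minimal Python sketch, not something DADA2 provides: it keys each read on the first whitespace-delimited token of its id line (an assumption that fits the SRA headers in this thread), keeps only reads whose key occurs in both files, and writes them out in the forward file's order. It holds the whole reverse file in memory, which is assumed to be acceptable for MiSeq-sized samples. `read_records` and `keep_paired` are hypothetical names.

```python
import gzip


def open_fastq(path, mode="rt"):
    """Open a plain or gzip-compressed fastq file in text mode."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def read_records(path):
    """Yield (key, four_lines) for each four-line fastq record."""
    with open_fastq(path) as fh:
        while True:
            lines = [fh.readline() for _ in range(4)]
            if not lines[0].strip():
                break  # end of file or trailing blank line
            key = lines[0][1:].split()[0]
            # Normalise line endings so records can be re-ordered safely.
            yield key, [l.rstrip("\n") + "\n" for l in lines]


def keep_paired(fwd_in, rev_in, fwd_out, rev_out):
    """Write only reads present in both inputs; return how many pairs were kept."""
    reverse = dict(read_records(rev_in))  # whole reverse file held in memory
    kept = 0
    with open_fastq(fwd_out, "wt") as fo, open_fastq(rev_out, "wt") as ro:
        for key, record in read_records(fwd_in):
            mate = reverse.get(key)
            if mate is None:
                continue  # forward read without a partner
            fo.writelines(record)
            ro.writelines(mate)
            kept += 1
    return kept
```

For example, `keep_paired("S1_1.fastq.gz", "S1_2.fastq.gz", "S1_1.paired.fastq.gz", "S1_2.paired.fastq.gz")` (file names are placeholders) should leave two files with the same reads in the same order, which is the case where matching by id should no longer be necessary.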
Hi, apologies for resurrecting an old thread. I was just wondering if you managed to find a solution to this, as I've found myself in the same situation.
I am also running into a similar issue with the header of my fastq files. They were obtained from an Illumina MiSeq, not downloaded from the NCBI SRA database. Is there something wrong with the header that means it can't be detected? @HWI-D00433:728:HHHKHBCX2:2:1101:8032:2352.1:N:0--D13a_C25.
Hi,
I'm looking at sequence data downloaded from the NCBI SRA database. When running filterAndTrim I get the following error:
After looking at the source code I tried inserting a dummy identifier, so instead of the identifier reading @SRR9876543.1 1/1, it would read @M012345:SRR9876543.1 1/1, but this didn't work.
Could you give me a suggestion how I can get around this?
Many thanks.