Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings

Error in reading fastq files #104

Closed apcristi closed 10 months ago

apcristi commented 11 months ago

I'm having issues with a dada2 pipeline that I have run successfully in the past, and I believe it has to do with Biostrings.

When I try to compute the number of paired reads using the code below:

df <- data.frame()

# loop through all the R1 files (no need to go through R2, which should be the same)
for (i in seq_along(fns_R1)) {

  # use the function fastq.geometry (exported by Biostrings)
  geom <- fastq.geometry(fns_R1[i])

  # extract the number of sequences and the file name
  df_one_row <- data.frame(n_seq = geom[1], file_name = basename(fns_R1[i]))

  # add one row to the data frame
  df <- bind_rows(df, df_one_row)
}

# Display the number of sequences and write data to a small file
knitr::kable(df)

Error in .Call2("fastq_seqlengths", filexp_list, nrec, skip, seek.first.rec, : reading FASTQ file ./fastq/TAN2101-CTD01-DNA-3_CGAAGG-KGFBK_L001_R1.fastq.gz: "+" expected at beginning of line 26171.

I have reinstalled Biostrings and RStudio, and it keeps happening. I tried the script with old fastq files that had been analysed successfully before and got the same error, so the problem should not be with the files. The pipeline I use was adapted from https://vaulot.github.io/tutorials/R_dada2_tutorial.html
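
For readers who hit the same message, a quick structural check of the file can help separate a package problem from a file problem. The sketch below uses base R only and assumes strict four-line FASTQ records (header, sequence, "+" separator, quality). The helper name first_bad_fastq_line is hypothetical and is not part of Biostrings or dada2.

```r
# Hypothetical helper (not part of Biostrings or dada2): scans a FASTQ file
# and returns the line number of the first structural problem, or NA if none.
# Assumes strict four-line records; reads the whole file into memory.
first_bad_fastq_line <- function(path) {
  con <- gzfile(path, open = "rt")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con, warn = FALSE)
  n <- length(lines)
  if (n == 0) return(NA_integer_)

  # position within each record: 1 = header, 2 = sequence,
  # 3 = separator, 0 = quality
  pos <- seq_len(n) %% 4
  bad_header <- which(pos == 1 & !startsWith(lines, "@"))
  bad_sep    <- which(pos == 3 & !startsWith(lines, "+"))
  bad <- c(bad_header, bad_sep)

  # a line count that is not a multiple of 4 means the last record is
  # truncated; report the last line that is present
  if (n %% 4 != 0) bad <- c(bad, n)

  if (length(bad) == 0) NA_integer_ else min(bad)
}
```

Line 26171 in the error above is the third line of its record (26171 = 4 × 6542 + 3), which is exactly where the "+" separator belongs, so a check like this would be expected to flag the same line on a damaged file.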

hpages commented 11 months ago

Can we have access to the file? It's hard to help you if we can't reproduce the problem.

Also please show your sessionInfo().

Thanks

hpages commented 10 months ago

Are you planning to follow up on this @apcristi?

apcristi commented 10 months ago

Hi, sorry. I tracked the issue back: an update on the server had corrupted the files. I downloaded the fastq files again and now it works!
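
Corruption of this kind can be caught early by comparing checksums after a download or a server change. A minimal sketch follows, assuming a reference MD5 value is available (for example, from the sequencing provider); the helper name checksum_matches is hypothetical. It relies on tools::md5sum, which ships with R.

```r
# Hypothetical helper: compares a file's MD5 checksum with an expected value.
# Returns FALSE if the file is missing or the checksums differ.
checksum_matches <- function(path, expected_md5) {
  actual <- unname(tools::md5sum(path))  # NA if the file cannot be read
  !is.na(actual) && identical(tolower(actual), tolower(expected_md5))
}
```

Running this over every file in the fastq directory before starting the pipeline would point to damaged files without involving Biostrings at all.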

hpages commented 10 months ago

Glad you solved the problem. Thanks for letting me know.