Closed LTLA closed 6 years ago
Hi Aaron, I added support for FASTQ files with variable-length reads in Biostrings 2.47.10. Let me know if you run into problems. H.
Sweet, thanks Herve.
Ah - whoops - now that I actually check it, though, I get:
X <- readDNAStringSet("blah.fq", format="fastq")
X
## A DNAStringSet instance of length 2
## width seq names
## [1] 12 ACTAGCGACACT READ1
## [2] 7 CTAGACT READ2
... so the quality scores haven't been pulled in.
readDNAStringSet()
has been ignoring the qualities so far. I'm currently working on changing this (as requested a long time ago by Erik Wright, the author of the DECIPHER package, but I never really had the time to get to this).
Hi Aaron @LTLA ,
This is now in Biostrings 2.47.11: https://github.com/Bioconductor/Biostrings/commit/9452ca25cd14af6ab8c5b520b74d0f7f6ba6d070
See also ?readDNAStringSet
.
H.
Hm. I'm still getting this with 2.47.11, using the blah.fq
example in the original post:
readDNAStringSet("blah.fq", format="fastq", with.qualities=TRUE)
## A DNAStringSet instance of length 2
## width seq names
## [1] 12 ACTAGCGACACT READ1
## [2] 7 CTAGACT READ2
Am I still missing some arguments?
The show
method for XStringSet objects does not display the metadata columns but they are here. See the examples in ?readDNAStringSet
(section "B. READ/WRITE FASTQ FILES").
Right. Perhaps it would be more intuitive to output a QualityScaledDNAStringSet
, especially as this can be directly used in further Biostrings functions.
I would need a way to know whether the qualities in the file must go in a PhredQuality, SolexaQuality, or IlluminaQuality object. I could add an extra argument (e.g. quality.type
) to readDNAStringSet()
to let the user tell me that. For now my preference is to keep readDNAStringSet()
low-level. It's easy enough for the user to do something like:
reads <- readDNAStringSet(filepath, format="fastq", with.qualities=TRUE)
quals <- PhredQuality(mcols(reads)$qualities) # or SolexaQuality(), or IlluminaQuality()
QualityScaledDNAStringSet(reads, quals)
I added this example to the man page: https://github.com/Bioconductor/Biostrings/commit/9e92e22257dd985710f6bb8cd105c99cd14b1531
Hope that works for you.
Okay, fair enough. Thanks Herve.
After giving it a 2nd thought, I decided to go for this: https://github.com/Bioconductor/Biostrings/commit/4ed20b48b3c64647638e46f0729301d152b39297
It provides the functionality you were looking for but via more specialized functions. readDNAStringSet()
and writeXStringSet()
remain low-level and I don't need to make their interfaces more complicated than they already are.
Sweet, I can see myself using this often, thanks.
It seems as if variable length reads in FASTQ files are not yet supported. For example, if I have a FASTQ file named
blah.fq
with the following contents:... and I run (on Biostrings version 2.47.9):
... I get the error:
Is support for variable-length FASTQ files on the radar? This is the typical case when dealing with Nanopore data. I know I can use the ShortRead package but it seems unnecessary to reach for a different package when there seems to be a perfectly suitable function in Biostrings.