Closed FelixErnst closed 5 years ago
Hi @FelixErnst ,
I think that as.integer()
was intended to be used on XStringQuality objects of length 1 only. What could it return on an XStringQuality object of arbitrary length? Making it return an IntegerList object or ordinary list would be wrong because as.integer()
should always return an integer vector, that is, an atomic vector.
When your XStringQuality object is of arbitrary length, you should coerce to IntegerList instead:
library(Biostrings)
pq <- PhredQuality(c("*+,-./", "0123456789:;"))
as(pq, "IntegerList")
# IntegerList of length 2
# [[1]] 9 10 11 12 13 14
# [[2]] 15 16 17 18 19 20 21 22 23 24 25 26
Starting with Biostrings 2.51.2, calling as.integer()
or as.numeric()
on an XStringQuality of length != 1 is now an error. The error message suggests to coerce to IntegerList or NumericList instead:
as.integer(pq)
# Error in .BStringSet2integer(x, scale) :
# as.integer()/as.numeric() can be used on a PhredQuality object of length 1 only.
# Coerce to IntegerList or NumericList (with 'as( , "IntegerList")'
# or 'as( , "NumericList")') when the object is of arbitrary length.
Hope that works for you. H.
PS: If you're using BioC 3.9 (the current devel version of Bioconductor, requires R 3.6), Biostrings 2.51.2 will become available via BiocManager::install()
in the next 24h or so.
Hi @hpages,
that seams reasonable. I didn't think about the specific return type at all. My first instinct was to use the as.integer
function and I didn't think about coercing it to an IntegerList
, which of course works fine.
However, just to bring it up: I find this a bit inconsistent. The XStringQuality
objects are derived from XStringSet
. Shouldn't this mean, that they behave like a List
as well, since they derive from it? Apart from as.character
this would require a list like object to be returned to be able to hold the data. I would argue, that an XStringSet
of length of 1 does not require such a special way of handling it.
(Am I right to assume, that this is just a shortcut for implementing a XStringQuality
object derived from the XString
class? I don't think as.integer
is used internally either, is it?)
Hi @FelixErnst
Yes XStringQuality derives from List but what do you mean exactly when you say that an XStringQuality object should behave like a List? This is a pretty general statement and I agree with it. But I'm not sure what specific operation on a XStringQuality object do you think violates that statement. Are you arguing that as.integer()
should return a list or IntegerList object? Unfortunately that is not an option. This would violate the very strong and universal expectation that as.integer()
ALWAYS returns an atomic vector.
Regards, H.
Hi @hpages
Now from your comments, it became clear that the result type of as.integer
is the main focus and should behave as expected, which it does now (I learned something :-) ).
With this in mind I noticed after your initial comment , that as.integer
should only work on a QualityString
, but not on a QualityStringSet
. The first class name is of course taken, potentially for historic reasons, and in the context of all the other classes in Biostrings
QualityString
should be named QualityStringSet
.
This is what I meant with inconsistency. Maybe I am wrong, but the implementation of as.integer
is misleading, since it should not be implemented in the first place for a list. as.integer(list(c("1","2")))
does not work, but is basically the equivalent of as.integer(PhredQuality("!!!!"))
.
I am not sure why the "correct" QualityString
class derived from BString
was never implemented and the name was recycled for a class which should actually be named QualityStringSet
. There is probably a good reason, but I currently don't see it, which leaves the inconsistency with the special treatment for length = 1
.
Best, Felix
These are good points.
Note that there are no QualityString or QualityStringSet classes, only an XStringQuality class which extends BStringSet. Yes the name of the XStringQuality class is misleading. It should probably have been QualityStringSet or QStringSet. And yes the original developer of this class should probably have implemented the QString class first and build on top of that. But that's not what he did and we have to live with his early choices. That includes living with the fact that as.integer()
works on an XStringQuality object of length 1 (when it should have worked on QString objects only and for QStringSet objects the user would have had to always use coercion to IntegerList, even when the object has length 1).
There are other inconsistencies/incongruities with these classes but that doesn't mean we should add more by allowing as.integer()
to work on XStringQuality objects of arbitrary length. Rather than trying to fix the XStringQuality classes and subclasses I think that at some point we should rather start fresh by implementing the QString and QStringSet classes, and their subclasses PhredQString/PhredQStringSet, SolexaQString/SolexaQStringSet, and IlluminaQString/IlluminaQStringSet.
Unfortunately I don't have much time to dedicate to the Biostrings package these days. Note that the package actually needs a major overhaul and the addition of long wanted features especially in the pairwiseAlignment()
department but not only (other long wanted features are restriction of the letters used in an AAString object, implementation of vmatchPDict()
, and more...). I already talked with Eric Wright about this (Eric is the maintainer of the DECIPHER package which relies heavily on Biostrings). The good news is that Eric is willing to provide some resources to help this happen but I still need to make a detailed list (possibly with roadmap) of what needs to happen. I'll try to put something on Biostrings' built-in GitHub Wiki soon.
H.
I submitted a package which relies heavily on the Biostrings package and there will probably another contribution in a month or so, which will intersect closely with the Biostring package.
https://github.com/Bioconductor/Contributions/issues/962
So, if I can be of help during the overhaul please let me know.
Edit: the issue probably can be closed
Thx for your contribution (I'll take a look at it) and for offering to help with the overhaul. -- H.
Hi,
there is a bug causing a warning during conversion of a XStringQuality object containing multiple sequences to integer values. This is probaly due to
charToRaw
not being vectorized.If this behavior unintended or am I using it wrongly?
Kind regards Felix
PS: If this is really unintended behavior, a fix for this would concern line 54 of R/XStringQuality-class.R only. A change from
to
or
I didn't do any benchmarking.