Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings

readAxt() and DNAStringSet() are unable to keep lowercase sequences #63

Closed. Abrar2652 closed this issue 2 years ago

Abrar2652 commented 2 years ago

The readAxt() and DNAStringSet() functions automatically convert lowercase (repetitive) sequences to uppercase. This produces wrong results in research, and many papers have already been published without their authors being aware of this internal fault in these functions.

hpages commented 2 years ago

I don't know what readAxt() is (Biostrings has no such function).

DNAStringSet() and DNAString() behave as intended and as documented, which is to return DNA sequences, that is, sequences made of letters from DNA_ALPHABET:

> DNA_ALPHABET
 [1] "A" "C" "G" "T" "M" "R" "W" "S" "Y" "K" "V" "H" "D" "B" "N" "-" "+" "."

No lowercase letters here.

Repetitive sequences in the Biostrings/BSgenome framework are handled via "masks". See the "Efficient genome searching with Biostrings and the BSgenome data packages" vignette in the BSgenome package for more information about masks and masked genome sequences.

H.
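
To illustrate the reply above, here is a minimal sketch (not taken from the thread) of how the case folding can be observed and how the positions of lowercase runs could be recorded before that information is lost. The input string `raw` and the variable names are hypothetical. The sketch assumes Biostrings is installed and relies on `IRanges()` and `Views()` from the Biostrings/IRanges stack. It is not the mask mechanism described in the BSgenome vignette, only a plain record of ranges.

```r
library(Biostrings)  # attaches IRanges as a dependency

# Hypothetical soft-masked input: lowercase letters mark repeats.
raw <- "ACGTacgtaaGGCCtt"

# DNAString() stores letters from DNA_ALPHABET, so case is folded to uppercase.
dna <- DNAString(raw)
as.character(dna)   # "ACGTACGTAAGGCCTT"

# Locate the lowercase runs in the original string with base R.
hits <- gregexpr("[[:lower:]]+", raw)[[1]]

# Keep the runs as ranges (empty if the input has no lowercase letters).
soft_masked <- if (hits[1] == -1L) {
  IRanges()
} else {
  IRanges(start = as.integer(hits), width = attr(hits, "match.length"))
}

# The repeat regions can then be pulled back out of the uppercase sequence.
repeats <- Views(dna, start = start(soft_masked), width = width(soft_masked))
as.character(repeats)
```

If the exact case of every letter must be preserved as-is, `BString(raw)` keeps the string unchanged, at the cost of losing the DNA-specific methods.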