Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings

readDNAStringSet fails with Legacy Mac (CR) line breaks #37

Closed FabianRoger closed 4 months ago

FabianRoger commented 4 years ago

I got a help request from someone running iOS 10.12. The person couldn't figure out why readDNAStringSet didn't load the correctly formatted fasta file. After some troubleshooting I found out that the file contained Legacy Mac (CR) line breaks which apparently aren't recognised as line breaks by readDNAStringSet.

The function runs without error or warning, but the result is meaningless (a single zero-width sequence).

Is it possible to support these line breaks or raise an informative error?

thanks for the great package!

Fabian

hpages commented 4 years ago

I didn't know you could run R/Bioconductor on iOS. Note that this is not a platform that we support or intend to support. In case you meant macOS 10.12, please note that starting with R 4.0, R and R/Bioconductor packages are only supported on macOS 10.13 (High Sierra) and higher.

> the file contained Legacy Mac (CR) line breaks which apparently aren't recognised as line breaks by readDNAStringSet.

Unfortunately, CR line terminators break even the most basic Unix tools, such as cat, more, and wc. They also break calls to standard C library functions like fgets() and to zlib functions like gzgets(), both of which readDNAStringSet() uses internally. Supporting these terminators would therefore complicate readDNAStringSet()'s underlying C code significantly and would very likely introduce a slowdown.
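To illustrate the point about LF-based line readers, here is a minimal Python sketch. It is not the Biostrings C code; `lf_only_lines` is a hypothetical helper that only mimics how an fgets()-style reader delimits lines, and the FASTA content is made up. Under those assumptions, a CR-terminated file collapses into one long "line" that starts with `>`, which would be consistent with the single zero-width sequence reported above.

```python
# Made-up FASTA content using legacy Mac (CR-only) line terminators.
cr_fasta = b">seq1\rACGT\r>seq2\rGGCC\r"


def lf_only_lines(data):
    """Split on LF only, mimicking how an fgets()-style reader finds lines.

    Hypothetical helper for illustration; not part of Biostrings.
    """
    return [line for line in data.split(b"\n") if line]


lines = lf_only_lines(cr_fasta)

# Classify lines the way a simple FASTA parser would:
# lines starting with '>' are headers, everything else is sequence data.
headers = [line for line in lines if line.startswith(b">")]
sequence_lines = [line for line in lines if not line.startswith(b">")]

print(len(lines), len(headers), len(sequence_lines))
```

Because no LF is ever found, the whole file is seen as one header line and no sequence data follows it.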

FabianRoger commented 4 years ago

I meant macOS 10.12. I don't know how frequent the problem is; I just realized that it wasn't easy to troubleshoot, because no error was raised. Is there any option for checking for unsupported line breaks and raising a warning? But I also understand if it's too much trouble for a possibly infrequent problem.
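As a possible workaround outside the package, the check asked about here could be done as a preprocessing step before calling readDNAStringSet. The sketch below is an assumption-laden illustration in Python, not something Biostrings provides: `detect_line_endings` and `normalize_line_endings` are hypothetical helper names, and the whole file is assumed to fit in memory.

```python
import warnings


def detect_line_endings(data):
    """Return "CR", "LF", "CRLF", "mixed", or "none" for a bytes object.

    Hypothetical helper for illustration; not part of Biostrings.
    """
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # bare CR, not part of a CRLF pair
    lf = data.count(b"\n") - crlf  # bare LF, not part of a CRLF pair
    kinds = [name for name, n in (("CRLF", crlf), ("CR", cr), ("LF", lf)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"


def normalize_line_endings(data):
    """Convert CRLF and bare CR terminators to LF, warning about bare CRs."""
    if detect_line_endings(data) in ("CR", "mixed") and b"\r" in data.replace(b"\r\n", b""):
        warnings.warn("bare CR line terminators found; converting to LF")
    # Replace CRLF first so that its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


print(detect_line_endings(b">seq1\rACGT\r"))     # CR-only input
print(normalize_line_endings(b">seq1\rACGT\r"))  # same content with LF terminators
```

In practice one would read the FASTA file in binary mode, write the normalized bytes to a new file, and pass that new file to readDNAStringSet.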

ahl27 commented 4 months ago

I'm going back through old issue reports. This is an issue we're unlikely to fix; as mentioned by Hervé, the current release of Biostrings depends on R 4.0.0, which requires OSX >= 10.13. This issue should only be a problem on machines using legacy OS flavors. Happy to discuss more, but going to close this issue.

hpages commented 4 months ago

Thanks @ahl27. Just to clarify, the current release of Bioconductor/Biostrings now depends on R 4.4 (it was R 4.0 four years ago).