Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings
57 stars 16 forks source link

Moving seqinfo and seqinfo<- and importing GenomeInfoDb #49

Closed LiNk-NY closed 3 years ago

LiNk-NY commented 3 years ago

Hi Hervé @hpages , I've made changes to move these methods from the BSgenome package to here. I also included the getSeq,XStringSet-method. Please review and provide feedback / commentary. Thanks! -Marcel

hpages commented 3 years ago

Thanks Marcel.

I also included the getSeq,XStringSet-method

Did you have to? Can the first PR focus on adding the dep on GenomeInfoDb and defining the seqinfo and seqinfo<- methods like we discussed the other day? Maybe it doesn't sound like a big deal but I'm nervous about the introduction of the GenomeInfoDb dep, especially this close to a release, so I'd rather proceed with 2 or 3 well focused small steps, and let a full build system run after each step before we move to the next one. Thanks again.

LiNk-NY commented 3 years ago

Hi Hervé, @hpages You're right. I've moved the getSeq changes to another branch and updated this one. Please see the Files changed. Thanks, Marcel

hpages commented 3 years ago

Hi Marcel,

GenomeInfoDb needs to be in Depends rather than in Imports, otherwise:

library(Biostrings)
x <- DNAStringSet(c(A="TTTCATAGG", B="AA"))
seqinfo(x) <- Seqinfo(c("a", "b", "c"), 11:13)
# Error in Seqinfo(c("a", "b", "c"), 11:13) : 
#   could not find function "Seqinfo"

Also, the user should not be allowed to set a random Seqinfo object on x (like I'm trying to do above). The seqnames() of the supplied Seqinfo object value should be identical to names(x) (or to as.character(seq_along(x)) if x is unnamed), and unname(seqlengths(value)) should be identical to width(x). No restrictions on isCircular(value) or genome(value): these are allowed to be anything.

Finally the arguments of the seqinfo<- generic are:

> args(`seqinfo<-`)
function (x, new2old = NULL, pruning.mode = c("error", "coarse", 
    "fine", "tidy"), value) 
NULL

and so all these arguments should also appear in the method definition. Right now new2old and pruning.mode are silently ignored:

library(GenomeInfoDb)
seqinfo(x, new2old=3:1, pruning.mode="tidy") <- Seqinfo(c("a", "b", "c"), 11:13)  # this works!

but we don't want that. Since we don't support these arguments at the moment, the seqinfo<- method should raise an error if the user tries to supply them. Please use if (!is.null(new2old)) instead of if (!missing(new2old)) to detect this situation (using the latter has caused me problems in the past in some rare situations). In fact there's no need to check the pruning.mode argument at all because we're requiring that the supplied Seqinfo object contains entries for all the sequences in x so we're not allowing pruning in the first place. In other words, it's ok to ignore the pruning.mode argument for now.

Thanks, H.

LiNk-NY commented 3 years ago

Thanks Hervé, I made the changes as requested and included unit tests. Best, Marcel

hpages commented 3 years ago

Hey Marcel,

Sorry for the delay. Thanks for the changes and for adding unit tests.

Please rename check.replace.seqinfo -> test_replace_seqinfo. Right now the function is ignored. Like most packages using RUnit for unit tests, Biostrings' unit tests follow the convention that test functions must be prefixed with test_. Other functions can be defined but they will not be called directly as part of the test suite. Note that this convention is implemented in BiocGenerics:::testPackage() which Biostrings/tests/run_unitTests.R calls to run the tests.

Right now, the test results are:

RUNIT TEST PROTOCOL -- Tue May 11 22:45:04 2021 
*********************************************** 
Number of test functions: 39 
Number of errors: 0 
Number of failures: 0 

1 Test Suite : 
Biostrings RUnit Tests - 39 test functions, 0 errors, 0 failures
Number of test functions: 39 
Number of errors: 0 
Number of failures: 0 

After you rename check.replace.seqinfo, the number of test functions should be 40.

Thanks!

LiNk-NY commented 3 years ago

Thanks Hervé! I've fixed the name of the function and ran the tests:

RUNIT TEST PROTOCOL -- Thu May 13 15:09:43 2021 
*********************************************** 
Number of test functions: 40 
Number of errors: 0 
Number of failures: 0 

1 Test Suite : 
Biostrings RUnit Tests - 40 test functions, 0 errors, 0 failures
Number of test functions: 40 
Number of errors: 0 
Number of failures: 0 
hpages commented 3 years ago

Excellent, thanks! Let's merge this and see what happens on the builds (won't be reflected before Saturday's report though).

H.