Bioconductor / BSgenome

Software infrastructure for efficient representation of full genomes and their SNPs
Clarify documentation of 'exclude' slot in a BSParams #1

Closed by PeteHaitch 4 years ago

PeteHaitch commented 6 years ago

The exclude slot in a BSParams is documented as:

... a character vector with strings that will be used to filter out chromosomes whose names match these strings.

From that, I thought bsapply() treated each string as a literal to be matched exactly. In fact, bsapply() treats each string as a regular expression, so "chr1" also matches "chr17".

I ended up spending a fair bit of time banging my head against this in the example below.

The examples on ?bsapply suggest that treating it as a regular expression is the intended behaviour, and give a nice demonstration of when this behaviour is useful, so I think this is the correct behaviour. But perhaps the documentation could be updated to make this clearer?


library(BSgenome.Hsapiens.UCSC.hg38)

# I was expecting to just get the matches for chr17 but got nothing!
bsp1 <- new("BSParams", 
            X = BSgenome.Hsapiens.UCSC.hg38, 
            FUN = matchPattern,
            exclude = setdiff(seqlevels(BSgenome.Hsapiens.UCSC.hg38), "chr17"))
bsapply(bsp1, pattern = "CG")
#> named list()

# Making it a regular expression gave me the desired result.
bsp2 <- bsp1
bsp2@exclude <- paste0("^", bsp1@exclude, "$")
bsapply(bsp2, pattern = "CG")
#> $chr17
#>   Views on a 83257441-letter DNAString subject
#> views:
#>              start      end width
#>       [1]    60054    60055     2 [CG]
#>       [2]    60141    60142     2 [CG]
#>       [3]    60168    60169     2 [CG]
#>       [4]    60201    60202     2 [CG]
#>       [5]    60210    60211     2 [CG]
#>       ...      ...      ...   ... ...
#> [1248324] 83245477 83245478     2 [CG]
#> [1248325] 83245632 83245633     2 [CG]
#> [1248326] 83246061 83246062     2 [CG]
#> [1248327] 83246281 83246282     2 [CG]
#> [1248328] 83247017 83247018     2 [CG]

Created on 2018-09-17 by the reprex package (v0.2.1)
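
For anyone who wants to see the effect without loading a genome, here is a minimal base-R sketch of what seems to be happening. It assumes a sequence is dropped whenever any exclude entry matches its name as a regular expression, which is consistent with the output above but is not copied from the bsapply() source. The helper keep() and the four-name seqs vector are hypothetical and only there for illustration.

```r
# Hypothetical subset of sequence names, standing in for seqlevels() of hg38.
seqs <- c("chr1", "chr2", "chr17", "chrX")

# Everything except chr17, as in the reprex above.
exclude <- setdiff(seqs, "chr17")

# Hypothetical helper (not part of BSgenome): keep the names that no
# pattern matches, using regular-expression matching via grepl().
keep <- function(nms, patterns) {
  hits <- vapply(
    nms,
    function(nm) any(vapply(patterns, grepl, logical(1), x = nm)),
    logical(1)
  )
  nms[!hits]
}

# Unanchored patterns: "chr1" matches inside "chr17", so nothing survives.
keep(seqs, exclude)

# Anchored patterns: only exact names are excluded, so chr17 survives.
keep(seqs, paste0("^", exclude, "$"))
```

The first call returns an empty character vector and the second returns "chr17", mirroring the difference between bsp1 and bsp2.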

hpages commented 4 years ago

This is clarified in BSgenome 1.55.4 (see commit 9783e421). Sorry for letting this sit in a corner for so long.