orzechoj closed this 8 years ago
Super cool! Any thoughts on using the genome reference config files to store the location of the blacklist file? Presumably it will always be the same for each species, so it could have its own reference type, e.g.:
@reference blacklist GRCh37 /path/to/genomes/Human/GRCh37/blacklist/ Human GRCh37
@reference blacklist GRCm38 /path/to/genomes/Mouse/GRCm38/blacklist/ Mouse GRCm38
Might be better than having to specify it as a param?
I've had this functionality in the back of my mind for a while now anyway; it could be useful in other pipelines too. Thanks!
Hi,
I haven’t thought much about this.
I guess blacklist files don’t change much for an individual genome. But I could also see cases where you might use this module to remove reads using other files, e.g. from a few manually curated regions, or from everything overlapping lncRNAs or something else.
But as long as it’s still possible to set other “blacklist” files with params, it might be a good idea to have a default option in the config.
cheers, Jakub
Yup, that could definitely work: start by looking for a param file, and if that's not found, look for a genome file instead (should just be a couple of extra lines?). I just prefer to keep species-specific stuff out of pipelines where possible, so that they can be used for any organism.
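A minimal sketch of that lookup order, written in Python for illustration only (the real pipeline code isn't shown in this thread). It assumes the `@reference` lines are whitespace-separated as `@reference <type> <genome_id> <path> <species> <assembly>`, matching the two example lines above; the function names `parse_reference_lines` and `resolve_blacklist` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_reference_lines(lines: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Collect '@reference <type> <genome_id> <path> ...' entries keyed by (type, genome_id).

    Assumed format, based on the example config lines in this thread.
    """
    refs = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments and anything that isn't a reference entry
        if len(fields) < 4 or fields[0] != "@reference":
            continue
        ref_type, genome_id, path = fields[1], fields[2], fields[3]
        refs[(ref_type, genome_id)] = path
    return refs


def resolve_blacklist(param_path: Optional[str], genome_id: str,
                      config_lines: Iterable[str]) -> Optional[str]:
    """Prefer an explicit param; otherwise fall back to the genome config (hypothetical helper)."""
    if param_path:
        return param_path
    return parse_reference_lines(config_lines).get(("blacklist", genome_id))


config = [
    "@reference blacklist GRCh37 /path/to/genomes/Human/GRCh37/blacklist/ Human GRCh37",
    "@reference blacklist GRCm38 /path/to/genomes/Mouse/GRCm38/blacklist/ Mouse GRCm38",
]

print(resolve_blacklist(None, "GRCm38", config))             # falls back to the config entry
print(resolve_blacklist("/my/curated.bed", "GRCm38", config))  # an explicit param wins
```

Checking the param first keeps Jakub's use case (custom region files such as curated regions or lncRNA annotations) working, while the config supplies a species-specific default without hard-coding anything into the pipeline.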
Phil
Thanks again @orzechoj! I'll add the minor suggestions.
Added a module to remove reads in blacklist regions (e.g. from ENCODE), using bedtools intersect. (Also raised the memory for cf_merge.)
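To illustrate what "remove reads in blacklist regions" means, here is a small pure-Python sketch of the filtering that `bedtools intersect -v` performs: a read is kept only if it overlaps no blacklist interval (BED-style half-open coordinates, any 1 bp overlap counts). The exact command used by the module isn't shown in this thread, so `bedtools_command` is an assumption about a typical invocation (`-v` to report entries in A with no overlap in B, `-abam` for BAM input, which writes BAM to stdout by default). The helper names are hypothetical, and the quadratic loop is for clarity only; bedtools itself does this far more efficiently.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open like BED


def overlaps(a: Interval, b: Interval) -> bool:
    # Same chromosome, and each interval starts before the other one ends
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def remove_blacklisted(reads: Iterable[Interval],
                       blacklist: Iterable[Interval]) -> List[Interval]:
    """Keep reads that overlap no blacklist region (the semantics of `intersect -v`)."""
    regions = list(blacklist)
    return [r for r in reads if not any(overlaps(r, b) for b in regions)]


def bedtools_command(bam: str, blacklist_bed: str) -> List[str]:
    # Assumed invocation; redirect stdout to a file to save the filtered BAM
    return ["bedtools", "intersect", "-v", "-abam", bam, "-b", blacklist_bed]


reads = [("chr1", 100, 150), ("chr1", 500, 550), ("chr2", 100, 150)]
blacklist = [("chr1", 120, 130)]
print(remove_blacklisted(reads, blacklist))
print(" ".join(bedtools_command("sample.bam", "blacklist.bed")))
```

Only the first read overlaps the chr1 blacklist interval, so it is the one dropped; the same filter works unchanged with any other region file (curated regions, lncRNA annotations), which is why making the file configurable is useful.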