Closed iromeo closed 5 years ago
It ignores them because of the default GenomeQuery chromosome filtering by name in the get method:
private val MAPPED_CHRS_PATTERN = "chr[0-9a-tv-zA-TV-Z]+[0-9a-zA-Z]*".toRegex()
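To see which names this pattern accepts, here is a minimal sketch. It assumes the filter requires the whole chromosome name to match the pattern; the helper name isMappedChromosome is hypothetical and is not taken from the SPAN sources.

```kotlin
// Pattern copied verbatim from the comment above.
val MAPPED_CHRS_PATTERN = "chr[0-9a-tv-zA-TV-Z]+[0-9a-zA-Z]*".toRegex()

// Hypothetical helper (assumption): the name must match the pattern in full.
fun isMappedChromosome(name: String): Boolean = MAPPED_CHRS_PATTERN.matches(name)

fun main() {
    val names = listOf("chr1", "chrX", "chr5_JH584299_random", "chrUn_GL456379", "chrM")
    for (name in names) {
        // Names containing '_' or starting with "chrU"/"chru" fail the full match.
        println("$name -> ${isMappedChromosome(name)}")
    }
}
```

Under that assumption, chr1 and chrX pass, while chr5_JH584299_random and chrUn_GL456379 are rejected: the underscore is outside both character classes, and the first class excludes "u" and "U". Note that chrM also passes this pattern on its own, so its presence in the ignored list presumably comes from a separate check not shown here.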
As of SPAN-0.8.0.4533, ignored chromosomes are logged in the output:
[Nov 23, 2018 19:46:44] Ignored chromosomes /Users/oleg/work/galaxy/database/jobs_directory/000/25/working/mm10.chrom.sizes: chr5_JH584299_random, chrX_GL456233_random, chrY_JH584301_random, chr1_GL456211_random, chr4_GL456350_random, chr4_JH584293_random, chr1_GL456221_random, chr5_JH584297_random, chr5_JH584296_random, chr5_GL456354_random, chr4_JH584294_random, chr5_JH584298_random, chrY_JH584300_random, chr7_GL456219_random, chr1_GL456210_random, chrY_JH584303_random, chrY_JH584302_random, chr1_GL456212_random, chrUn_JH584304, chrUn_GL456379, chr4_GL456216_random, chrUn_GL456393, chrUn_GL456366, chrUn_GL456367, chrUn_GL456239, chr1_GL456213_random, chrUn_GL456383, chrUn_GL456385, chrUn_GL456360, chrUn_GL456378, chrUn_GL456389, chrUn_GL456372, chrUn_GL456370, chrUn_GL456381, chrUn_GL456387, chrUn_GL456390, chrUn_GL456394, chrUn_GL456392, chrUn_GL456382, chrUn_GL456359, chrUn_GL456396, chrUn_GL456368, chrM, chr4_JH584292_random, chr4_JH584295_random
Fixed as of version 0.10.0
A BAM file can include contigs for different chromosome haplotype groups as well as other contigs, e.g. those listed in the hg19.chrom.sizes file.
As far as I understand, SPAN simply ignores reads aligned to such contigs and doesn't call peaks there. IMHO, a general-purpose peak caller should call peaks for all available contigs mentioned in the BAM file and the chromosome sizes file.
If we are going to fix it, don't forget about https://github.com/JetBrains-Research/epigenome/issues/1168. A user-defined chromosome sizes file could differ slightly from our reference (e.g. some contigs are missing or extra contigs are present), so that difference shouldn't stop a SPAN model from being loaded into JBR.
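One way to tolerate such differences is to report them instead of failing. The sketch below is only an illustration under stated assumptions: parseChromSizes, diffContigs, and ContigDiff are hypothetical names, not SPAN or JBR APIs; the input is assumed to be tab-separated "name size" lines; and the sizes are made-up values.

```kotlin
// Hypothetical result type: how the user file differs from the reference.
data class ContigDiff(
    val missing: Set<String>,      // in the reference, absent from the user file
    val extra: Set<String>,        // in the user file, absent from the reference
    val sizeMismatch: Set<String>  // present in both, but with different lengths
)

// Hypothetical parser: assumes one "name<TAB>size" record per non-empty line.
fun parseChromSizes(text: String): Map<String, Long> =
    text.lineSequence()
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .associate { line ->
            val (name, size) = line.split('\t')
            name to size.toLong()
        }

// Collects the differences so the caller can log a warning rather than abort.
fun diffContigs(reference: Map<String, Long>, user: Map<String, Long>): ContigDiff =
    ContigDiff(
        missing = reference.keys - user.keys,
        extra = user.keys - reference.keys,
        sizeMismatch = reference.keys.intersect(user.keys)
            .filter { reference[it] != user[it] }
            .toSet()
    )

fun main() {
    // Illustrative sizes only, not real genome lengths.
    val reference = parseChromSizes("chr1\t1000\nchr2\t900\nchrM\t16\n")
    val user = parseChromSizes("chr1\t1000\nchr2\t950\nchrUn_GL456379\t70\n")
    val diff = diffContigs(reference, user)
    println("missing=${diff.missing}, extra=${diff.extra}, sizeMismatch=${diff.sizeMismatch}")
}
```

A loader could then treat missing or extra contigs as warnings and reserve hard failures for cases it cannot recover from, such as size mismatches on shared contigs; that policy is a design choice and is not described in this thread.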