Ruitulyu / KAS-Analyzer

New computational framework to process and analyze KAS-seq and spKAS-seq data.
MIT License

Peaks in peaks bed files can extend beyond the length of the chromosome #19

Open ecroot opened 11 months ago

ecroot commented 11 months ago

Describe the bug

The peaks .bed files created by peakscalling can contain regions (peaks) that extend a few base pairs beyond the length of the chromosome as listed in chrom.sizes.bed. This causes errors that prevent the multiBigwigSummary call within KAS-Analyzer diff from running successfully when the peaks .bed files are used as custom region files.

I suspect that this may only affect hs1/t2t, because blacklists for the other genome builds will exclude the telomeres from analysis, meaning that peaks won't be identified at the ends of chromosomes for these other builds.

To Reproduce

Call peaks on a t2t dataset, then use the output as a custom regions file for diff.

Expected behavior

The regions in a bed file should not extend past the ends of a chromosome, because this causes errors when trying to use the file.

It would be useful for KAS-Analyzer to automatically check that the regions in bed files that it creates do not extend past the values given in the relevant chrom.sizes.bed files, and fix the values (with a suitable warning to the user) if this is discovered.
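To illustrate the kind of check I have in mind, here is a minimal Python sketch. It is not KAS-Analyzer code: the function names `read_chrom_sizes` and `clamp_peaks` are hypothetical, and I am assuming the chromosome sizes file is tab- or space-separated with either two columns (name, length) or three BED-style columns (name, 0, length). Peaks whose end exceeds the chromosome length are clipped, peaks that start at or beyond the chromosome end are dropped, and every change is reported as a warning.

```python
from typing import Dict, Iterable, List, Tuple


def read_chrom_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Parse chromosome lengths (assumed layout: 'chr len' or 'chr 0 len')."""
    sizes: Dict[str, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Assumption: with 3+ columns the length is the BED end column.
        sizes[fields[0]] = int(fields[2]) if len(fields) >= 3 else int(fields[1])
    return sizes


def clamp_peaks(
    bed_lines: Iterable[str], sizes: Dict[str, int]
) -> Tuple[List[str], List[str]]:
    """Return (fixed BED lines, warnings) with ends clipped to chromosome length."""
    fixed: List[str] = []
    warnings: List[str] = []
    for line in bed_lines:
        stripped = line.rstrip("\n")
        # Pass blank lines and BED header lines through unchanged.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            fixed.append(stripped)
            continue
        fields = stripped.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        limit = sizes.get(chrom)
        if limit is None:
            warnings.append(f"{chrom}: not in chromosome sizes, left unchanged")
            fixed.append(stripped)
            continue
        if start >= limit:
            warnings.append(f"{chrom}:{start}-{end} starts past {limit}, dropped")
            continue
        if end > limit:
            warnings.append(f"{chrom}:{start}-{end} clipped to end at {limit}")
            fields[2] = str(limit)
        fixed.append("\t".join(fields))
    return fixed, warnings


if __name__ == "__main__":
    # Toy data only; real chromosome lengths and peaks would come from files.
    sizes = read_chrom_sizes(["chr1\t1000", "chr2\t0\t500"])
    peaks = ["chr1\t990\t1004\tpeak1", "chr2\t100\t200\tpeak2"]
    fixed, warnings = clamp_peaks(peaks, sizes)
    print("\n".join(fixed))
    for message in warnings:
        print("WARNING:", message)
```

Running something like this on the peaks file before it is passed to diff would leave in-bounds peaks untouched and only shorten the few regions that overrun the chromosome end, which is the behavior I would expect from peakscalling itself.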