etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

Fix uniqueness check in intersect.by_shared_chroms() #581

Closed tskir closed 3 years ago

tskir commented 3 years ago

Closes #574. Thank you @dajana17 for reporting this!

The intended check was whether table and other each contain only one chromosome (and whether it is the same chromosome in both). However, .is_unique does almost the opposite: it is true when all values in the chromosome column are distinct. For example, it triggers for [chr1, chr2, chr3, chr4, chr5], while the intention was to trigger for [chr1, chr1, chr1, chr1, chr1].
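To make the difference concrete, here is a small sketch using plain pandas Series in place of the real chromosome columns (the variable names are made up for illustration and are not from the cnvkit code):

```python
import pandas as pd

# Illustrative stand-ins for a "chromosome" column; not taken from cnvkit itself.
all_distinct = pd.Series(["chr1", "chr2", "chr3", "chr4", "chr5"])
single_chrom = pd.Series(["chr1", "chr1", "chr1", "chr1", "chr1"])

# .is_unique asks "are all values distinct?"
distinct_is_unique = all_distinct.is_unique  # True: no value repeats
single_is_unique = single_chrom.is_unique    # False: chr1 repeats

# .nunique() == 1 asks "is there exactly one distinct value?"
distinct_has_one = all_distinct.nunique() == 1  # False: five distinct values
single_has_one = single_chrom.nunique() == 1    # True: only chr1

print(distinct_is_unique, single_is_unique, distinct_has_one, single_has_one)
```

So the old condition fired on the first column and skipped the second, which is the reverse of what was intended.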

This appears to be a very rare edge case, because in real-world data the list of chromosomes was probably never, or only very rarely, unique in both tables at once. Actually, the list of FASTA contigs (the first table) is always unique, so only the BED file needs to have non-repeating chromosomes for the bug to trigger.

etal commented 3 years ago

I think I meant to type if table['chromosome'].nunique() == 1 and other['chromosome'].nunique() == 1: .... This will work, too. Thanks!
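For reference, a minimal sketch of that check, assuming plain pandas DataFrames with a 'chromosome' column stand in for the real table objects. The helper name on_same_single_chrom is hypothetical and is not part of cnvkit; the final comparison covers the "same chromosome" part of the intent described above.

```python
import pandas as pd


def on_same_single_chrom(table, other):
    """Hypothetical helper: True if both tables cover exactly one chromosome,
    and it is the same chromosome in both."""
    if table["chromosome"].nunique() != 1 or other["chromosome"].nunique() != 1:
        return False
    # Each column holds a single distinct value, so comparing the first rows suffices.
    return bool(table["chromosome"].iat[0] == other["chromosome"].iat[0])


# Made-up example data, for illustration only.
chr1_a = pd.DataFrame({"chromosome": ["chr1", "chr1"], "start": [0, 100]})
chr1_b = pd.DataFrame({"chromosome": ["chr1"], "start": [50]})
chr2_only = pd.DataFrame({"chromosome": ["chr2", "chr2"], "start": [0, 100]})
mixed = pd.DataFrame({"chromosome": ["chr1", "chr2"], "start": [0, 0]})

same_single = on_same_single_chrom(chr1_a, chr1_b)          # both are chr1 only
different_single = on_same_single_chrom(chr1_a, chr2_only)  # chr1 vs. chr2
one_mixed = on_same_single_chrom(chr1_a, mixed)             # second table has two chromosomes
print(same_single, different_single, one_mixed)
```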