etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

Allow user-provided chromosome ordering #154

Open etal opened 8 years ago

etal commented 8 years ago

To reduce weird behavior and facilitate streaming algorithms, CNVkit automatically sorts chromosomes in a way that makes sense for mammalian, prokaryotic, and plant genomes. The yeast genome is Very Special: its chromosomes are named with Roman numerals, and CNVkit sorts them very strangely indeed, placing X first and ordering the rest lexicographically. This matters for plots in particular.

Options:

  1. Be like bedtools, and accept a genome file (.genome or SAM/BAM header) that specifies the chromosome order.
  2. Minimize sorting; maintain the input file's chromosome order as much as possible.
  3. Double down: attempt to identify & sort Roman numerals in chromosome names. (As a start, don't treat X specially if there are no numeric chromosome names.)
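
A rough sketch of how options 1 and 3 might look, not CNVkit's actual code. The function names (`roman_to_int`, `read_genome_order`, `sort_chromosomes`) are hypothetical, and the sketch assumes an optional `chr` prefix and a bedtools-style genome file with the chromosome name in the first whitespace-separated column:

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
# Assumption: names look like "IV" or "chrIV"; anything else is not treated as Roman.
ROMAN_RE = re.compile(r"^(?:chr)?([IVXLCDM]+)$")


def roman_to_int(numeral):
    """Convert a Roman numeral string such as 'XIV' to an integer."""
    total = 0
    prev = 0
    for char in reversed(numeral):
        value = ROMAN_VALUES[char]
        if value < prev:
            # Subtractive notation: a smaller digit before a larger one
            total -= value
        else:
            total += value
            prev = value
    return total


def read_genome_order(lines):
    """Option 1: map each chromosome name to its rank in a genome file."""
    order = {}
    for line in lines:
        fields = line.split()
        if fields:
            order.setdefault(fields[0], len(order))
    return order


def sort_chromosomes(names, genome_order=None):
    """Sort names by a user-provided order if given, else try Roman numerals."""
    names = list(names)
    if genome_order is not None:
        # Names missing from the genome file go last, in lexicographic order
        last = len(genome_order)
        return sorted(names, key=lambda n: (genome_order.get(n, last), n))
    if names and all(ROMAN_RE.match(n) for n in names):
        # Option 3: no numeric names present, so X is just the numeral ten
        return sorted(names, key=lambda n: roman_to_int(ROMAN_RE.match(n).group(1)))
    # Fallback for this sketch only: plain lexicographic order
    return sorted(names)
```

With this key, a name like `chrM` happens to parse as the numeral 1000 and lands after the numbered yeast chromosomes. As soon as any name is not purely Roman (e.g. `chr1` alongside `chrX`), the Roman branch is skipped entirely.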

etal commented 7 years ago

See also: BioJulia/Bio.jl#291

It may be most sensible to keep tabular files and data structures sorted in lexicographic order by chromosome, and to attempt "intuitive" sorting or accept a user-provided chromosome order only in the visualization commands.
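
A minimal sketch of that split, under assumptions: rows are plain `(chromosome, start, end)` tuples, and `layout_for_plot` is a hypothetical helper, not an existing CNVkit function. Storage stays lexicographic; only the x-axis offsets used for plotting follow the display order.

```python
from itertools import groupby


def layout_for_plot(rows, display_order=None):
    """Return (stored_rows, x_offsets).

    stored_rows is always sorted lexicographically by chromosome name;
    x_offsets maps each chromosome to its starting x position in a
    genome-wide plot, following display_order when one is given.
    """
    stored = sorted(rows)  # chromosome name first, then start, then end
    by_chrom = {
        chrom: list(group)
        for chrom, group in groupby(stored, key=lambda row: row[0])
    }
    if display_order is None:
        order = list(by_chrom)
    else:
        # Listed chromosomes first; any unlisted ones keep their stored order
        order = [c for c in display_order if c in by_chrom]
        order += [c for c in by_chrom if c not in display_order]
    offsets = {}
    cursor = 0
    for chrom in order:
        offsets[chrom] = cursor
        # Use the largest end coordinate seen as the chromosome's plotted width
        cursor += max(end for _, _, end in by_chrom[chrom])
    return stored, offsets
```

For example, with rows on `chrV`, `chrIX`, and `chrI`, the stored order is `chrI`, `chrIX`, `chrV`, while passing `display_order=["chrI", "chrV", "chrIX"]` places `chrV` before `chrIX` on the x-axis without touching the stored rows.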