arq5x / scurgen

A tool for detecting patterns in genomic data with space filling curves

How to interpret chrom="genome" #31

Open daler opened 11 years ago

daler commented 11 years ago

Currently if chrom="genome" is specified when creating a HilbertMatrix, pybedtools.chromsizes() is used to get the chromosomes for the assembly, and all chromosomes are used.

Some options:

  1. Use every chrom in the assembly
    • not ideal, since axes will be created for, e.g., dm3's chrU and chrUextra, and hg19's chrUn_* and chr*_random chroms.
    • the upcoming HilbertGUI separate-axes-for-each-chrom refactor will have extra -- likely empty -- axes cluttering the fig
  2. Parse all input files to determine the set of chromosomes with data
    • also not ideal, since every input file would have to be parsed once to find the chroms and then parsed again for the HC creation.
    • it's possible that really only large BAMs would have a substantial time penalty, and for those we could just parse the header. But large BED/VCF/GFF would still need to be parsed.
  3. Require all chroms to be specified, all the time
    • this would be a pain
  4. Have a pre-determined "reasonable default" set of chroms for each assembly
    • this would solve most of the above problems, but would require deciding on the reasonable defaults, plus a mechanism for a user to add to or subtract from the default set (or else requiring a user to specify all chroms if they want something other than the default)
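
To make option 4 concrete, here is a rough sketch of what the default-plus-override mechanism might look like. Everything in it is hypothetical: `DEFAULT_CHROMS`, `resolve_chroms`, and the `add`/`subtract` parameters are not part of scurgen or pbt, and the choice of which chroms count as "reasonable" for each assembly is only an assumption.

```python
# Hypothetical per-assembly defaults. Which chroms belong here is an
# assumption made for illustration, not an agreed-upon list.
DEFAULT_CHROMS = {
    "hg19": ["chr%d" % i for i in range(1, 23)] + ["chrX", "chrY"],
    "dm3": ["chr2L", "chr2R", "chr3L", "chr3R", "chr4", "chrX"],
}


def resolve_chroms(assembly, chroms=None, add=(), subtract=()):
    """Return the ordered list of chroms to plot (hypothetical helper).

    An explicit `chroms` list wins outright; otherwise start from the
    assembly's default set, drop `subtract`, then append `add`.
    """
    if chroms is not None:
        return list(chroms)
    if assembly not in DEFAULT_CHROMS:
        raise ValueError(
            "no default chroms for %r; specify chroms explicitly" % assembly
        )
    dropped = set(subtract)
    selected = [c for c in DEFAULT_CHROMS[assembly] if c not in dropped]
    for c in add:
        if c not in selected:  # avoid duplicate axes
            selected.append(c)
    return selected
```

With this shape, `resolve_chroms("dm3", subtract=["chr4"], add=["chrU"])` keeps the default order and appends chrU at the end, while passing `chroms=[...]` bypasses the defaults entirely.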

arq5x commented 11 years ago

I prefer option four. This was part of the reasoning for using OrderedDicts in pbt. First, we get the chroms in the order we want them, and second, we could conceivably tag the default (autosomes plus X and Y) and optional chroms in pbt. This way, the HC grids could, by default, plot only the standard chroms, and optionally plot everything. The last option would be to have the user provide a list of chroms in the YAML config file.
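
As a sketch of the tagging idea, assuming the chromsizes come as an ordered chrom -> (start, stop) mapping (which is, I believe, the shape pybedtools.chromsizes() returns): the `STANDARD` pattern and the `select_chroms` helper below are made up for illustration and are not existing pbt API, and the lengths are placeholders rather than real hg19 sizes.

```python
import re
from collections import OrderedDict

# Assumed definition of "standard": numbered autosomes plus X and Y.
# Other assemblies (e.g. dm3's chr2L) would need a different rule.
STANDARD = re.compile(r"^chr(\d+|X|Y)$")


def select_chroms(chromsizes, standard_only=True):
    """Filter an ordered chromsizes mapping, preserving its order."""
    if not standard_only:
        return OrderedDict(chromsizes)
    return OrderedDict(
        (chrom, size)
        for chrom, size in chromsizes.items()
        if STANDARD.match(chrom)
    )


# Toy mapping with placeholder lengths (NOT real chromosome sizes).
sizes = OrderedDict([
    ("chr1", (0, 1000)),
    ("chr2", (0, 900)),
    ("chrX", (0, 800)),
    ("chrY", (0, 300)),
    ("chr1_gl000191_random", (0, 10)),
    ("chrUn_gl000211", (0, 10)),
])

default_view = select_chroms(sizes)                    # standard chroms only
full_view = select_chroms(sizes, standard_only=False)  # everything
```

Because the input is ordered, the filtered result keeps the plotting order for free, and the "plot everything" case is just the unfiltered mapping.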

daler commented 11 years ago

OK, sounds good. I'll add the default info to pbt.