gamcil / clinker

Gene cluster comparison figure generator
MIT License

Feature request: setting sequence window #36

Closed: RvV1979 closed this issue 3 years ago

RvV1979 commented 3 years ago

For large genomes, one would often be interested in analyzing only a specific window of a chromosome while keeping the original coordinates. This would reduce runtime by excluding genes outside that window and simplify post-hoc manual adjustment of the plot.

It would therefore be great if the desired windows could be passed on the command line, for example with something like:

clinker speciesA_chr01.gb:2500000-4000000 speciesB_chr08.gb:1500000-2000000 speciesC_contig003 -p

Cheers

gamcil commented 3 years ago

I've drafted some code (https://github.com/gamcil/clinker/pull/56) to allow this, which should land in the next release. Usage looks like:

clinker speciesA_chr01.gb speciesB_chr08.gb speciesC_contig003 --ranges chr01:2500000-4000000 chr08:1500000-2000000

Where ranges are scaffold_accession:start-end.

Or from a file containing the ranges, ranges.txt:

chr01:2500000-4000000
chr08:1500000-2000000

clinker speciesA_chr01.gb speciesB_chr08.gb speciesC_contig003 --ranges $(cat ranges.txt)

Since I support parsing multi-record files as clusters containing multiple loci, I decided to keep this as a separate argument rather than adding it directly to the genome file arguments.
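To make the scaffold_accession:start-end format concrete, here is a minimal Python sketch of how such a range string could be parsed and applied. It is not clinker's actual implementation: the names Range, parse_range and genes_in_range are hypothetical, genes are represented as plain (name, start, end) tuples, and keeping every gene that overlaps the window (rather than only genes fully inside it) is an assumption.

```python
import re
from typing import List, NamedTuple, Tuple


class Range(NamedTuple):
    """Hypothetical container for one scaffold_accession:start-end range."""
    scaffold: str
    start: int
    end: int


# Matches e.g. "chr01:2500000-4000000"; the scaffold part may itself contain
# colons because the greedy match splits on the last one.
_RANGE_RE = re.compile(r"^(?P<scaffold>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_range(text: str) -> Range:
    """Parse a 'scaffold:start-end' string into a Range."""
    match = _RANGE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"Expected scaffold:start-end, got {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"Range start exceeds end in {text!r}")
    return Range(match["scaffold"], start, end)


Gene = Tuple[str, int, int]  # (name, start, end), illustrative only


def genes_in_range(genes: List[Gene], window: Range) -> List[Gene]:
    """Keep genes overlapping the window; coordinates are left unchanged."""
    return [
        (name, start, end)
        for name, start, end in genes
        if start <= window.end and end >= window.start
    ]


if __name__ == "__main__":
    window = parse_range("chr01:2500000-4000000")
    genes = [
        ("geneA", 100, 900),
        ("geneB", 2499000, 2501000),
        ("geneC", 3000000, 3002000),
    ]
    print(window)
    print(genes_in_range(genes, window))
```

Because each range carries its own scaffold name, it can be matched against whichever record in a multi-record file has that accession, which is consistent with keeping the ranges separate from the file arguments.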

gamcil commented 3 years ago

Included in v0.0.20.

RvV1979 commented 3 years ago

Thanks for your efforts to implement this feature. However, I tried it and cannot get it working in v0.0.20. When I use the example files and specify a range for A. burnettii that excludes the non-syntenic genes on scaffold_377, there is no effect: clinker *.gbk --ranges scaffold_377:7500-27649 -p and clinker *.gbk -p give exactly the same results. Am I doing something wrong? Thanks