RvV1979 closed 3 years ago
I've drafted some code (https://github.com/gamcil/clinker/pull/56) to allow this, which should land in the next release. Usage looks like:
clinker speciesA_chr01.gb speciesB_chr08.gb speciesC_contig003 --ranges chr01:2500000-4000000 chr08:1500000-2000000
Where ranges are scaffold_accession:start-end.
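To make the range format concrete, here is a minimal Python sketch of how a scaffold_accession:start-end string could be split into its parts. It is only an illustration, not clinker's actual parser: the `Range` type and `parse_range` helper are hypothetical names, and rejecting ranges where start exceeds end is an assumption.

```python
from typing import NamedTuple


class Range(NamedTuple):
    """One range: scaffold name plus start and end coordinates (hypothetical type)."""
    scaffold: str
    start: int
    end: int


def parse_range(text: str) -> Range:
    """Parse 'scaffold_accession:start-end' into a Range (hypothetical helper)."""
    # Split on the LAST colon so a scaffold name that itself contains ':' survives.
    scaffold, colon, span = text.rpartition(":")
    if not colon or not scaffold:
        raise ValueError(f"expected scaffold_accession:start-end, got {text!r}")
    start_text, dash, end_text = span.partition("-")
    if not dash:
        raise ValueError(f"expected start-end after ':', got {span!r}")
    # int() raises ValueError on its own for empty or non-numeric coordinates.
    start, end = int(start_text), int(end_text)
    # Assumption: a range whose start lies beyond its end is treated as an error.
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    return Range(scaffold, start, end)


if __name__ == "__main__":
    print(parse_range("chr01:2500000-4000000"))
```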
Or from a file containing the ranges, ranges.txt:
chr01:2500000-4000000
chr08:1500000-2000000
clinker speciesA_chr01.gb speciesB_chr08.gb speciesC_contig003 --ranges $(cat ranges.txt)
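The `$(cat ranges.txt)` form works because the shell splits the file contents on whitespace, so each line becomes its own argument after `--ranges`. The Python sketch below mirrors that splitting when the command is assembled from a script. The helper names `ranges_from_text` and `build_clinker_args` are hypothetical; only the `clinker ... --ranges ...` layout comes from the usage shown above.

```python
from typing import List


def ranges_from_text(text: str) -> List[str]:
    """Split the contents of a ranges file into one string per range."""
    # str.split() with no arguments splits on any whitespace (spaces, tabs,
    # newlines) and drops empty pieces, much like unquoted $(cat ranges.txt).
    return text.split()


def build_clinker_args(genbank_files: List[str], ranges: List[str]) -> List[str]:
    """Assemble the argument list: files first, then --ranges and each range."""
    return ["clinker", *genbank_files, "--ranges", *ranges]


if __name__ == "__main__":
    from pathlib import Path

    # Assumes a ranges.txt like the one above exists in the working directory.
    ranges = ranges_from_text(Path("ranges.txt").read_text())
    print(build_clinker_args(["speciesA_chr01.gb", "speciesB_chr08.gb"], ranges))
```

The resulting list can be handed to `subprocess.run` as-is, which avoids any shell quoting concerns.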
Since I support parsing multi-record files for clusters containing multiple loci, I decided to keep the ranges as a separate argument instead of building them straight into the genome file parsing.
Included in v0.0.20.
Thanks for your efforts to implement this feature. However, I tried it and cannot get it working in v0.0.20. When I use the example files and specify a range for A. burnettii that excludes the non-syntenic genes on scaffold_377, it has no effect.
clinker *.gbk --ranges scaffold_377:7500-27649 -p
and
clinker *.gbk -p
give the exact same results.
Am I doing something wrong?
Thanks
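The thread does not say why `--ranges` had no effect here, so the following is not a diagnosis. One check that might help, assuming the scaffold name in a range has to match a record identifier in the GenBank file, is to list the identifiers each record actually carries. The sketch below reads the standard `LOCUS`, `ACCESSION` and `VERSION` header lines from GenBank text. The helper `genbank_record_names` is hypothetical, and which of these fields clinker compares against is not stated in this thread.

```python
from typing import List

# Header keywords whose first value can serve as a record identifier.
ID_KEYWORDS = ("LOCUS", "ACCESSION", "VERSION")


def genbank_record_names(genbank_text: str) -> List[str]:
    """Collect unique names from LOCUS/ACCESSION/VERSION lines, in file order."""
    names: List[str] = []
    for line in genbank_text.splitlines():
        # Top-level GenBank keywords start in the first column of the line.
        if not line.startswith(ID_KEYWORDS):
            continue
        fields = line.split()
        # Skip keyword lines without a value, e.g. an empty ACCESSION line.
        if len(fields) < 2 or fields[0] not in ID_KEYWORDS:
            continue
        if fields[1] not in names:
            names.append(fields[1])
    return names


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python list_records.py *.gbk
    for path in sys.argv[1:]:
        print(path, genbank_record_names(Path(path).read_text()))
```

If none of the printed names equals the scaffold part of the range (for example scaffold_377), that mismatch would be worth reporting along with the command used.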
For large genomes, one would often be interested in analyzing only a specific window of a chromosome while keeping the original coordinates. This would reduce runtime by excluding genes outside that window and simplify post-hoc manual adjustment of the plot.
It would therefore be great if it were possible to pass the desired windows to be considered for analysis, for example by calling something like:
clinker speciesA_chr01.gb:2500000-4000000 speciesB_chr08.gb:1500000-2000000 speciesC_contig003 -p
Cheers
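The request above amounts to a coordinate filter: keep only the genes that fall inside a window and leave their positions untouched. A minimal Python sketch of that idea follows. It is not how clinker implements it; the `(name, start, end)` tuple layout and the `genes_in_window` helper are hypothetical, and keeping only genes that lie entirely inside the window (rather than any that overlap it) is an assumption.

```python
from typing import List, Tuple

# Hypothetical gene record: (name, start, end) in original scaffold coordinates.
Gene = Tuple[str, int, int]


def genes_in_window(genes: List[Gene], start: int, end: int) -> List[Gene]:
    """Return genes lying entirely within [start, end], coordinates unchanged."""
    # Assumption: a gene crossing a window boundary is dropped, not clipped.
    return [gene for gene in genes if gene[1] >= start and gene[2] <= end]


if __name__ == "__main__":
    example = [
        ("geneA", 100, 900),              # before the window
        ("geneB", 2_600_000, 2_601_500),  # inside the window
        ("geneC", 3_999_500, 4_000_500),  # crosses the window end
    ]
    print(genes_in_window(example, 2_500_000, 4_000_000))
```

Because the surviving genes keep their original start and end values, anything plotted from them can still be labelled with chromosome coordinates.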