gamcil / clinker

Gene cluster comparison figure generator
MIT License

Issue with multi-exon genes #35

Open RvV1979 opened 3 years ago

RvV1979 commented 3 years ago

It seems clinker does not work as intended when multi-exon genes are annotated as separate CDS features instead of as a single CDS with a join() location. In that case, each CDS exon is treated as a separate gene, which leads to spurious cluster groups and to missing links to genes that are annotated using join(). The example output below comes from an analysis of three test files, each of which contains a 14-exon uncharacterized gene and a 2-exon glycosyltransferase gene. In test1 and test2 the exons are annotated separately; in test3 they are annotated using join() (see attached files).

In the standard output, which shows genes to scale (below), the number of cluster groups is 15 instead of the expected two, and there is no link to the first gene in test3.

[Image: test_scaled]

When genes are not shown to scale, it becomes clear that the spurious cluster groups correspond to individual exons rather than to entire genes.

[Image: test_notscaled]

Gene annotations with separate CDS features per exon are quite common in GFF3 (.gff3) files. It would therefore be great if clinker concatenated such annotations into the full-length CDS before analysis, so that each multi-exon gene is compared as a single unit.
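To illustrate the kind of concatenation I mean, here is a minimal standalone sketch in plain Python. It is not clinker's code and does not use clinker's API. The `CDSPart` class and both function names are hypothetical, and it assumes that the exons of one gene share an identifier (for example the GFF3 `Parent` attribute, or a shared `gene`/`locus_tag` qualifier in GenBank files), which may not hold for every annotation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class CDSPart:
    """One per-exon CDS feature (hypothetical minimal model, not a clinker class)."""
    parent: str  # identifier shared by all exons of a gene (assumed grouping key)
    start: int   # 1-based inclusive start
    end: int     # 1-based inclusive end
    strand: int  # +1 or -1


def merge_cds_parts(
    parts: Iterable[CDSPart],
) -> Dict[str, Tuple[int, List[Tuple[int, int]]]]:
    """Group per-exon CDS features by parent.

    Returns {parent: (strand, [(start, end), ...])} with the intervals
    sorted by ascending genomic coordinate.
    """
    grouped: Dict[str, List[CDSPart]] = defaultdict(list)
    for part in parts:
        grouped[part.parent].append(part)

    merged = {}
    for parent, group in grouped.items():
        strands = {p.strand for p in group}
        if len(strands) != 1:
            # Mixed strands (e.g. trans-splicing) are out of scope for this sketch.
            raise ValueError(f"CDS parts of {parent!r} are on different strands")
        ordered = sorted(group, key=lambda p: p.start)
        merged[parent] = (strands.pop(), [(p.start, p.end) for p in ordered])
    return merged


def to_genbank_location(strand: int, intervals: List[Tuple[int, int]]) -> str:
    """Format merged intervals as a GenBank-style location string."""
    spans = ",".join(f"{start}..{end}" for start, end in intervals)
    location = f"join({spans})" if len(intervals) > 1 else spans
    return f"complement({location})" if strand == -1 else location


# Made-up coordinates: two 2-exon genes, one on each strand.
parts = [
    CDSPart("geneA", 300, 400, 1),
    CDSPart("geneA", 100, 200, 1),
    CDSPart("geneB", 500, 650, -1),
    CDSPart("geneB", 700, 800, -1),
]
merged = merge_cds_parts(parts)
for parent, (strand, intervals) in merged.items():
    print(parent, to_genbank_location(strand, intervals))
```

With a step like this applied before translation and alignment, each multi-exon gene would be represented once, so test1 and test2 would be treated the same way as test3.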

Thanks

test1.gb.txt test2.gb.txt test3.gb.txt
