berman-lab / ymap

YMAP - Yeast Mapping Analysis Pipeline : An online pipeline for the analysis of yeast genomic datasets.
MIT License

Visualization too crowded for too many chromosomes #38

Closed by vladimirg 7 years ago

vladimirg commented 8 years ago

Cyberlindnera fabianii has 50 scaffolds at the moment. Even after filtering out those which are of length <= 100,000, we still have 29 scaffolds. Visualization for them doesn't work well - see attached files (for both horizontal and vertical stacking).

[Attached figures: fig cnv-snp-map 1, fig cnv-snp-map 2]

darrenabbey commented 8 years ago

The position of each chromosome subfigure in the second figure is defined in a configuration file in the genome directory, which is generated automatically. A cleverer algorithm could arrange the chromosomes so that less white space is wasted. This would leave more room for the chromosome name labels, perhaps enough to avoid the annoying overlaps in the main figure.
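As a rough illustration of what a tighter arrangement could look like, here is a minimal sketch of first-fit-decreasing row packing. It is not YMAP's layout code: the function name `pack_rows`, the `widths` mapping, and the `row_width` parameter are all hypothetical, and the format of the real configuration file is not assumed here.

```python
def pack_rows(widths, row_width):
    """Group chromosome cartoons into rows using first-fit decreasing.

    widths    -- hypothetical mapping of chromosome name -> drawn width
    row_width -- hypothetical maximum width available per row
    Returns a list of rows, each a list of chromosome names.
    """
    rows = []  # each entry is [remaining_width, [names]]
    # Place the widest cartoons first; small ones then fill the leftover gaps.
    for name, width in sorted(widths.items(), key=lambda kv: kv[1], reverse=True):
        for row in rows:
            if row[0] >= width:
                row[1].append(name)
                row[0] -= width
                break
        else:
            # No existing row has room, so start a new one.
            # A cartoon wider than row_width simply gets a row to itself.
            rows.append([row_width - width, [name]])
    return [names for _, names in rows]


example = pack_rows({"a": 6, "b": 5, "c": 4, "d": 3, "e": 2}, row_width=10)
print(example)
```

Sorting by width before placing is what lets the short scaffolds slot in beside the long ones instead of each taking its own row.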

This would clean up some of the issues in the main figure, but wouldn't do much for the linear versions of the figures. For those, shortening the label strings of the smaller chromosomes so that they take up less space would help. In this case, something like ['Chr1', 'Chr2', 'Chr3', 'Chr4', 'Chr5', 'Chr6', 'Chr7', 'Chr8', 'Chr9', '10', '11', '12', '13', '14', etc.], with font sizes reduced as needed. At some point, the pattern of the labels should be clear enough even if the smallest supercontigs have no labels at all.
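The labelling scheme described above could be sketched roughly as follows. This is only an illustration under assumptions: `compact_labels`, `full_count`, and `min_fraction` are hypothetical names, the cutoff of 1% of the genome for dropping a label is arbitrary, and font-size reduction is left out.

```python
def compact_labels(lengths, full_count=9, min_fraction=0.01):
    """Build shortened chromosome labels for a linear figure.

    lengths      -- chromosome lengths, in display order
    full_count   -- how many leading chromosomes keep the 'Chr' prefix (assumed)
    min_fraction -- chromosomes below this share of the genome get no label (assumed)
    """
    total = sum(lengths)
    labels = []
    for index, length in enumerate(lengths, start=1):
        if total > 0 and length / total < min_fraction:
            labels.append("")             # too small to label legibly
        elif index <= full_count:
            labels.append(f"Chr{index}")  # full label for the first few
        else:
            labels.append(str(index))     # bare number for the rest
    return labels


print(compact_labels([100] * 9 + [50, 50, 1]))
```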

vladimirg commented 8 years ago

Why not just increase the size of the figure? In the case of linear figures, we can just make another line (as many as needed), and in the case of the vertical stack, simply increasing the size should work.

darrenabbey commented 8 years ago

Increasing the image size would reduce the visual size of all the labels, so it would be a quick solution. I'm not sure it would be the ideal solution, but it is definitely the first one to try.

darrenabbey commented 8 years ago

Moving smaller chromosome cartoons to a second line in the linear figure format would interfere with one of the main values of the figure - that of comparing different datasets by placing one figure right after the other on screen.

A work-around for such high-supercontig genomes might be to generate two linear figures. The first would hold the set of larger chromosomes that comprises half the genome, and the second would hold the set of smaller chromosomes that comprises the other half. This would de-clutter the main linear figures, while still letting users view figures from different datasets side by side for comparison.
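One way the two-figure split might be computed is sketched below, assuming chromosome lengths are available as a name-to-length mapping. The function name `split_by_half_genome` is hypothetical, and the split point (largest chromosomes first, until their cumulative length reaches half the total) is one interpretation of "half the genome".

```python
def split_by_half_genome(lengths):
    """Split chromosomes into a 'large' set and a 'small' set.

    lengths -- hypothetical mapping of chromosome name -> length
    Chromosomes are taken largest first and go into the first set until
    that set's cumulative length reaches half of the total; the rest go
    into the second set.
    """
    ordered = sorted(lengths.items(), key=lambda kv: kv[1], reverse=True)
    half = sum(lengths.values()) / 2
    large, small = [], []
    running = 0
    for name, length in ordered:
        if running < half:
            large.append(name)
            running += length
        else:
            small.append(name)
    return large, small


large, small = split_by_half_genome({"A": 50, "B": 30, "C": 10, "D": 10})
print(large, small)
```

Each returned list could then be drawn as its own linear figure, so the stacking comparison across datasets still works within each half.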