AliTVTeam / AliTV

Visualize whole genome alignments as linear maps
https://alitvteam.github.io/AliTV/d3/AliTV.html
MIT License

Support for gff file upload for annotation #135

Open Anupmas opened 7 years ago

Anupmas commented 7 years ago

AliTV has excellent interactive visualization capabilities. However, I am not sure how to generate the .yml file. It would be great if I could just upload my gff files to show the annotations. Later on, it would also be great to have multiple tracks within the chromosome so that overlapping data could be visualized, for example the intron-exon structure of a gene or conserved domains in a gene.

iimog commented 7 years ago

Hi @Anupmas, thanks for the feedback. Just to fully understand what you would like to do: Do you want an easier method to add annotations to a visualization via the Perl script? Or do you want to upload the gff files to the final visualization directly (in the web browser)? In principle, multiple tracks are already supported, although overlapping tracks are not handled perfectly.

Anupmas commented 7 years ago

Hi @iimog, Thank you so much for your attention. AliTV is really great, interactive, beautiful, and easy to use. I loved it. Currently I am comparing some BACs against the sorghum genome. Below I have provided more detail on what I am doing and why some additional features would help me.

  1. It would be great to be able to visualize part of the sequence, e.g., coordinates 500,000 to 600,000 of Chr1 of an organism. I saw that someone already requested this feature so I did not mention it.
  2. It would be great to have stackable separate tracks to see overlapping features. For example, a gene may have a transposable element insertion within its intron. Also, both the gene and the transposable element may have conserved domains.
  3. The default colors for a feature class (e.g., gene, CDS, etc.) are great. However, I would like to be able to easily customize the visualization of genomic features. The easiest for me would be to append the color, shape and track information in the last column of gff files (e.g., “color=red;shape=rectangle;track=1”) and then upload the gff file directly to the web browser.
  4. Also, I am not sure how the LASTZ alignments are generated by AliTV internally. But it would be great if the user could supply a soft-masked sequence and choose to use lowercase soft-masking when generating alignments. For eukaryotic genomes, transposable elements generate a lot of noise, and lowercase soft-masking of repeats is incredibly useful.

Please let me know if I was not clear.
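
The attribute-based styling proposed in point 3 could be read roughly as follows. This is only a minimal sketch: the `color`, `shape` and `track` keys are the ones suggested in this comment, not an existing AliTV feature, and the function names and defaults are hypothetical.

```python
from urllib.parse import unquote

# Hypothetical defaults for the keys proposed in this thread.
# color=None stands for "use the default color of the feature class".
DEFAULT_STYLE = {"color": None, "shape": "rectangle", "track": 1}


def parse_attributes(column9):
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attributes = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        key, sep, value = field.partition("=")
        if sep:
            # GFF3 percent-encodes reserved characters in attribute values.
            attributes[key] = unquote(value)
    return attributes


def parse_feature(line):
    """Parse one GFF3 feature line; return None for comments and blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(columns)}")
    seqid, _source, ftype, start, end, _score, strand, _phase, attrs = columns
    attributes = parse_attributes(attrs)

    # Start from the defaults and override with whatever the file provides.
    style = dict(DEFAULT_STYLE)
    for key in ("color", "shape"):
        if key in attributes:
            style[key] = attributes[key]
    if "track" in attributes:
        style["track"] = int(attributes["track"])

    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "style": style,
    }
```

Features without any of the three keys fall back to the defaults, so an unmodified gff file would still load under this scheme.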

iimog commented 7 years ago

Cool, I'm happy that you like AliTV. Regarding your feature requests: they all sound like good and reasonable extensions to AliTV's capabilities.

  1. I added your +1 to that issue. If you just want to zoom in on the existing alignment, you can do this using the zoom feature (just drag a box around the region of interest using the mouse, then wait a little). However, this does not re-calculate the alignment for that specific region.
  2. Excellent point. We thought about this as well. What should be relatively easy to implement is adding a z-value for features to define which ones are drawn on top of other ones. Another possibility is to define the offset from the bottom of the chromosome and a height. Both would need to be implemented but should be possible without too much trouble. Which one would be more important to you?
  3. This is also a really nice idea. This would require quite a bit of coding but could be added more or less independently from the rest of the code. One challenge here is that the original IDs of the sequences might be changed by AliTV (in order to make them unique and valid for lastz). So the gff would need to use the new IDs or we would have to do the mapping in js (@greatfireball what do you think?)
  4. By default, lastz uses lowercase masking except when the [unmask] modifier is added to the input file name. @greatfireball do we add this modifier by default or do we transform to uppercase ourselves? I agree that this should be configurable.

Anupmas commented 7 years ago

Thank you @iimog,

  1. Thanks for adding the +1 on my behalf.
  2. I prefer the second option, i.e., "to define the offset from the bottom of the chromosome and a height". My reason for not preferring the z-value (please let me know if I have interpreted the z-value all wrong!!) is that it will generate a stacked graph where similar features may not be at the same height when the stack depth varies.
  3. I don't mind replacing the IDs in the gff myself, so long as the program provides me with a file containing the old names and new names. Some other users might prefer a more hands-off approach. So, it's your call.
  4. Thanks for checking. It would be great if lowercase masking is used in AliTV for the LASTZ run, and if the handling of lowercase sequence is described in the manual.
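
For point 3, replacing the IDs by hand could look roughly like the sketch below. It assumes a two-column, tab-separated file of old and new names; that format is an assumption made for illustration, since the thread does not say whether or how AliTV would export such a mapping. The function names are hypothetical as well.

```python
def load_id_map(lines):
    """Read "old_id<TAB>new_id" pairs into a dict (assumed file format)."""
    id_map = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        old_id, new_id = line.split("\t")
        id_map[old_id] = new_id
    return id_map


def remap_gff_ids(gff_lines, id_map):
    """Yield gff lines with the sequence ID (column 1) replaced by its new name.

    Comment and directive lines (starting with "#") and blank lines are passed
    through unchanged, so any IDs inside directives are NOT rewritten here.
    """
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        seqid, sep, rest = line.partition("\t")
        if seqid not in id_map:
            # Fail loudly rather than silently keeping an ID that will not match.
            raise KeyError(f"no new ID known for sequence {seqid!r}")
        yield id_map[seqid] + sep + rest
```

A possible use, with placeholder file names:

```python
with open("id_map.tsv") as mapping, open("annotation.gff") as gff_in, \
        open("annotation.renamed.gff", "w") as gff_out:
    gff_out.writelines(remap_gff_ids(gff_in, load_id_map(mapping)))
```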