TeselaGen / tg-oss

Teselagen Open Source modules
https://teselagen.github.io/tg-oss/
MIT License

OVE - display correct amino acid sequence in CDS with introns #69

Closed: manulera closed this issue 4 months ago

manulera commented 5 months ago

Hi @tnrich, it would be great if the amino acid sequences shown for CDSs accounted for introns. For example, the following GenBank sequence (ase1.txt) contains a CDS with an intron:

     CDS             join(1174..1597,1645..3416)
                     /label="ase1"
                     /locus_tag="SPOM_SPAPB1A10.09"
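In GenBank notation, join(1174..1597,1645..3416) means the CDS is the concatenation of 1174..1597 and 1645..3416 (1-based, inclusive), so 1598..1644 is an intron that should not be translated. Below is a minimal TypeScript sketch of reading such a location string. It assumes forward strand only, with no complement(), order() or fuzzy < / > ends. SeqRange, parseJoin, splicedLength and introns are hypothetical helpers, not the OVE or tg-oss API.

```typescript
// Hypothetical range shape: 1-based, inclusive, as written in GenBank.
interface SeqRange {
  start: number;
  end: number;
}

// Parse a simple "join(a..b,c..d,...)" string (or a bare "a..b") into ranges.
// Assumption: forward strand only; complement(), order() and fuzzy ends are rejected.
function parseJoin(spec: string): SeqRange[] {
  const match = /^join\((.*)\)$/.exec(spec.trim());
  const body = match ? match[1] : spec.trim();
  return body.split(",").map((part) => {
    const [start, end] = part.split("..").map(Number);
    if (!Number.isInteger(start) || !Number.isInteger(end) || end < start) {
      throw new Error(`Unsupported location: ${part}`);
    }
    return { start, end };
  });
}

// Total length of the spliced CDS (exons only).
function splicedLength(ranges: SeqRange[]): number {
  return ranges.reduce((sum, r) => sum + (r.end - r.start + 1), 0);
}

// Gaps between consecutive ranges, i.e. the introns that must be skipped.
function introns(ranges: SeqRange[]): SeqRange[] {
  const gaps: SeqRange[] = [];
  for (let i = 1; i < ranges.length; i++) {
    gaps.push({ start: ranges[i - 1].end + 1, end: ranges[i].start - 1 });
  }
  return gaps;
}

const ase1 = parseJoin("join(1174..1597,1645..3416)");
console.log(splicedLength(ase1)); // 2196 (= 732 codons)
console.log(introns(ase1)); // [ { start: 1598, end: 1644 } ]
```

The spliced length (424 + 1772 = 2196) is a multiple of 3, which is consistent with the two exons forming one continuous reading frame.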

It would be nice if this were accounted for in the displayed amino acid sequence:

[Screenshot, 2024-05-10 at 12.21.59]

For reference, this is how it looks in SnapGene and Benchling.

[Screenshots: SnapGene and Benchling, 2024-05-10 at 12.23.28 and 12.22.15]

Happy to propose a fix via PR if you point me to the right place!

manulera commented 4 months ago

Hi @tnrich, any thoughts on this and #70? As I said, I would be happy to make a PR if you point me to where to start.

tnrich commented 4 months ago

Hi @manulera, sorry for the delay. Yes, I think this could be a good one for you to try to tackle if you'd like! You'll basically need to look into the Translations/index.js file and see if you can update the rendering to leave out the gaps between the joined locations.

[Screenshot]

The feature appears to be drawn based on the "locations" array passed to it. I think similar logic will need to be added so that the AA are drawn based on the locations array rather than on the start/end of the AA range.
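One way to read "drawn based on the locations array" is that each location needs to know which slice of the protein it owns. A minimal sketch, assuming ranges are plain {start, end} objects listed in CDS order on the forward strand; SeqRange, AaSegment and aaSegments are hypothetical names, not OVE's actual code. It keeps a running offset through the spliced CDS, so the amino acid indices for a location come from how many exon bases precede it, not from the location's own start/end.

```typescript
// Hypothetical range shape: inclusive start/end. Only lengths are used below,
// so it does not matter whether coordinates are 0- or 1-based.
interface SeqRange {
  start: number;
  end: number;
}

interface AaSegment {
  range: SeqRange;
  cdsOffset: number; // spliced-CDS bases that come before this range
  firstAa: number; // index (into the full protein) of the first amino acid touching this range
  lastAa: number; // index of the last amino acid touching this range
  leadingBases: number; // bases at the start of this range that finish a codon begun upstream
}

// Work out which amino acids each range should draw, using a running offset
// through the spliced CDS instead of each range's own start/end.
function aaSegments(ranges: SeqRange[]): AaSegment[] {
  let cdsOffset = 0;
  return ranges.map((range) => {
    const length = range.end - range.start + 1;
    const segment: AaSegment = {
      range,
      cdsOffset,
      firstAa: Math.floor(cdsOffset / 3),
      lastAa: Math.floor((cdsOffset + length - 1) / 3),
      leadingBases: (3 - (cdsOffset % 3)) % 3,
    };
    cdsOffset += length;
    return segment;
  });
}

// The ase1 CDS from this issue: join(1174..1597,1645..3416).
const segments = aaSegments([
  { start: 1174, end: 1597 },
  { start: 1645, end: 3416 },
]);
for (const s of segments) {
  console.log(s.range.start, s.cdsOffset, s.firstAa, s.lastAa, s.leadingBases);
}
// 1174 0 0 141 0
// 1645 424 141 731 2   (amino acid 141 straddles the intron)
```

Amino acid 141 shows up in both segments because its codon has one base at the end of the first exon and two at the start of the second; a renderer would need to decide how to draw that split codon.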

Might also need to do some logic around here:

[Screenshot]

I think just splitting the translation into multiple pieces for the different locations shouldn't be too tough; the tricky part will be keeping the translation continuous across the joins so that it doesn't "restart" for each location.
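To illustrate the "restart" problem, here is a minimal sketch that translates the spliced exons once and only then splits the protein back into per-location pieces. It assumes a forward-strand CDS, 1-based inclusive locations, and the standard genetic code (NCBI table 1). SeqRange, codonToAa, splicedTranslation and piecesPerLocation are hypothetical names, not part of OVE or tg-oss, and the toy sequence is made up for the example.

```typescript
// Hypothetical location shape: 1-based, inclusive, forward strand only.
interface SeqRange {
  start: number;
  end: number;
}

// NCBI standard genetic code (translation table 1), codons ordered T, C, A, G.
const AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
const BASE_INDEX = new Map<string, number>([["T", 0], ["C", 1], ["A", 2], ["G", 3]]);

function codonToAa(codon: string): string {
  let index = 0;
  for (const base of codon.toUpperCase()) {
    const value = BASE_INDEX.get(base);
    if (value === undefined) return "X"; // ambiguous base such as N
    index = index * 4 + value;
  }
  return AMINO_ACIDS[index];
}

// Join the exons first and translate once, so a codon that straddles a junction
// is read across it instead of the reading frame restarting at each location.
function splicedTranslation(seq: string, locations: SeqRange[]): string {
  const cds = locations.map((loc) => seq.slice(loc.start - 1, loc.end)).join("");
  let protein = "";
  for (let i = 0; i + 3 <= cds.length; i += 3) {
    protein += codonToAa(cds.slice(i, i + 3));
  }
  return protein;
}

// Split the continuous translation back into one piece per location for drawing.
// Each amino acid goes to the location that holds the first base of its codon.
function piecesPerLocation(protein: string, locations: SeqRange[]): string[] {
  let cdsOffset = 0; // bases of spliced CDS before the current location
  return locations.map((loc) => {
    const length = loc.end - loc.start + 1;
    const first = Math.ceil(cdsOffset / 3);
    const last = Math.floor((cdsOffset + length - 1) / 3);
    cdsOffset += length;
    return protein.slice(first, last + 1);
  });
}

// Toy sequence: exon 1..5, intron 6..10, exon 11..14; codon GCA spans the junction.
const seq = "ATGGCGTAAGATAA";
const exons: SeqRange[] = [{ start: 1, end: 5 }, { start: 11, end: 14 }];
const protein = splicedTranslation(seq, exons);
console.log(protein); // MA*
console.log(piecesPerLocation(protein, exons)); // [ 'MA', '*' ]
console.log(exons.map((loc) => splicedTranslation(seq, [loc]))); // [ 'M', 'I' ] (frame restarts: wrong)
```

Translating each location on its own (the last line) drops the leftover "GC" from the first exon and reads "ATA" (I) from the second, whereas splicing first gives the intended ATG GCA TAA.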

Good luck and let me know how it goes! 😄

tnrich commented 4 months ago

Closing as this has been merged and published! Lemme know if it works for you, @manulera!

manulera commented 4 months ago

Amazing! Thank you so much. This is quite important when displaying genomic sequences.

There were a couple of questions in the "Unrelated to the implementation" section of https://github.com/TeselaGen/tg-oss/pull/76#issue-2333200855, in case you want to have a look. They are not important for my use case, but I think they could lead to bugs in the future.