TeselaGen / tg-oss

Teselagen Open Source modules
https://teselagen.github.io/tg-oss/
MIT License

Displaying origin-wrapping features #35

Closed: manulera closed this issue 10 months ago

manulera commented 1 year ago

Hello @tnrich

Happy to have a go at this one myself if you agree, and if you give me some guidelines on where to start.

Apparently, for origin-spanning features in circular DNA, the syntax from NCBI is as follows (this is from an NCBI genome):

     gene            complement(join(490883..490885,1..879))
                     /locus_tag="NEQ001"
     CDS             complement(join(490883..490885,1..879))
                     /locus_tag="NEQ001"
                     /note="conserved hypothetical [Methanococcus jannaschii];
                     COG1583:Uncharacterized ACR; IPR001472:Bipartite nuclear
                     localization signal; IPR002743: Protein of unknown
                     function DUF57"
                     /codon_start=1
                     /transl_table=11
                     /product="NEQ001"
                     /protein_id="AAR38856.1"
                     /translation="MRLLLELKALNSIDKKQLSNYLIQGFIYNILKNTEYSWLHNWKK
                     EKYFNFTLIPKKDIIENKRYYLIISSPDKRFIEVLHNKIKDLDIITIGLAQFQLRKTK
                     KFDPKLRFPWVTITPIVLREGKIVILKGDKYYKVFVKRLEELKKYNLIKKKEPILEEP
                     IEISLNQIKDGWKIIDVKDRYYDFRNKSFSAFSNWLRDLKEQSLRKYNNFCGKNFYFE
                     EAIFEGFTFYKTVSIRIRINRGEAVYIGTLWKELNVYRKLDKEEREFYKFLYDCGLGS
                     LNSMGFGFVNTKKNSAR"

Basically, NCBI writes complement(join(490883..490885,1..879)) rather than complement(490883..879). The latter is what you get if you create this feature in OVE, and it is also common in SnapGene files and files from Addgene. Biopython, the Python library, adheres to the NCBI convention; see this issue.
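
To make the two notations concrete, here is a minimal Python sketch that turns a wrapped range (start greater than end, 1-based and inclusive) into the join form shown in the NCBI record. The function names are hypothetical and are not part of OVE or Biopython, and the sequence length of 490885 is an assumption read off the first join segment.

```python
def format_segment(start: int, end: int) -> str:
    """Format one 1-based inclusive segment; a single base is written as '14', not '14..14'."""
    return str(start) if start == end else f"{start}..{end}"


def to_ncbi_location(start: int, end: int, seq_len: int, complement: bool = False) -> str:
    """Hypothetical helper: write a range in the join style used by the NCBI record.

    A range with start > end is assumed to wrap around the origin of a
    circular sequence of length seq_len.
    """
    if start <= end:
        location = format_segment(start, end)
    else:
        # Split at the origin: start..seq_len, then 1..end.
        location = f"join({format_segment(start, seq_len)},{format_segment(1, end)})"
    return f"complement({location})" if complement else location


print(to_ncbi_location(490883, 879, 490885, complement=True))
print(to_ncbi_location(14, 2, 14))
```

The first call reproduces the location from the NCBI record above; the second gives the join form for a feature running from base 14 to base 2 on a 14 bp circular sequence.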

Maybe the OVE library should interpret the file below as if the location were 14..2, by checking whether the segments of the join are consecutive. That's what you get when you open the file in either SnapGene or Benchling. Let me know what you think.

LOCUS       pj5_00001                 14 bp    DNA     circular SYN 10-OCT-2023
DEFINITION  .
ACCESSION   pj5_00001
VERSION     pj5_00001
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
COMMENT     teselagen_unique_id: 5adf735aa1811801e17d8aac
FEATURES             Location/Qualifiers
     misc_feature    join(14,1..2)
                     /label="hello"
ORIGIN
        1 aaaaaaaaaa aaaa
//
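
For reference, the location string in this file can be broken into its segments with a few lines of Python. This is only a sketch under simplifying assumptions: `parse_location` is a hypothetical name, not the tg-oss parser, and it handles only plain ranges, a single join(...), and an outer complement(...). Partial markers such as < and >, order(...), and a complement nested inside a join are not supported.

```python
import re

# One segment: a single base ("14") or an inclusive range ("1..2").
SEGMENT = re.compile(r"^(\d+)(?:\.\.(\d+))?$")


def parse_location(text: str):
    """Hypothetical helper: return ([(start, end), ...], is_complement)."""
    text = text.strip()
    is_complement = False
    if text.startswith("complement(") and text.endswith(")"):
        is_complement = True
        text = text[len("complement("):-1]
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    segments = []
    for part in text.split(","):
        match = SEGMENT.match(part.strip())
        if match is None:
            raise ValueError(f"unsupported location segment: {part!r}")
        start = int(match.group(1))
        # A single base has no "..end" part, so it ends where it starts.
        end = int(match.group(2)) if match.group(2) else start
        segments.append((start, end))
    return segments, is_complement


print(parse_location("join(14,1..2)"))
```

For the file above this yields two segments, (14, 14) and (1, 2), which is the two-location reading of the feature.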

tnrich commented 1 year ago

Hi @manulera sorry for the delayed response!

Just to make sure I understand, you're asking to have the following genbank feature misc_feature join(14,1..2) come in as a single feature with {start: 14, end:2 } instead of as a single feature with 2 distinct locations of {start: 14, end:14} and {start: 1, end: 2}.

Is that right?

Another question I have - is that how all join()'s work if they aren't separated by at least one base pair?

Thanks!

manulera commented 1 year ago

Just to make sure I understand, you're asking to have the following genbank feature misc_feature join(14,1..2) come in as a single feature with {start: 14, end:2 } instead of as a single feature with 2 distinct locations of {start: 14, end:14} and {start: 1, end: 2}. Is that right?

Yes, that's it.

is that how all join()'s work if they aren't separated by at least one base pair?

I am not sure. The origin wrap is the only use case I can think of that would create such a join location. I tried a feature join(10..11,12..14) in Benchling and SnapGene to see what they do:

In summary, I think it should only merge the fragments of a join when the join happens exactly at the origin, i.e. one fragment ends at the last base and the next starts at the first base.
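
A minimal Python sketch of that rule, assuming segments are 1-based inclusive (start, end) tuples in the order they appear in the join. `merge_origin_join` is a hypothetical name for illustration and is not the tg-oss implementation.

```python
def merge_origin_join(segments, seq_len):
    """Merge two adjacent join segments only when they meet at the origin.

    A pair is merged when the first segment ends on the last base (seq_len)
    and the next one starts on base 1. Any other adjacent pair, including
    consecutive ones such as 10..11 and 12..14, is left untouched.
    """
    merged = []
    for start, end in segments:
        if merged and merged[-1][1] == seq_len and start == 1:
            # Origin-spanning pair: keep the earlier start and the later end,
            # giving a wrapped range whose start is greater than its end.
            previous_start, _ = merged.pop()
            merged.append((previous_start, end))
        else:
            merged.append((start, end))
    return merged


print(merge_origin_join([(14, 14), (1, 2)], 14))    # join(14,1..2) on a 14 bp sequence
print(merge_origin_join([(10, 11), (12, 14)], 14))  # join(10..11,12..14) stays as two locations
```

With this rule, join(14,1..2) on the 14 bp example collapses to one location from 14 to 2, while join(10..11,12..14) keeps its two locations. The sketch does not cover joins whose segments are listed in reverse order, nor the ambiguous case where the merged range would cover the whole sequence.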

manulera commented 1 year ago

As I said, I am happy to have a go at this one, if you give me some guidance.

manulera commented 11 months ago

Hi @tnrich just following up on this. Would you still be happy to accept a contribution on this?

tnrich commented 11 months ago

Hi @manulera, yep, still happy to accept a PR on this one. I think this would be in the feature-location parsing code or thereabouts.