The-Sequence-Ontology / SO-Ontologies

Collection of SO Ontologies
Creative Commons Attribution 4.0 International

Terms for transposons #488

Closed oushujun closed 4 years ago

oushujun commented 4 years ago

What is this request referring to?

#487

| Parent | labels | Synonyms | about |
|---|---|---|---|
| SO:0000186 | LTR/Copia | RLC, Ty1, Copia, LTR-Copia | LTR retrotransposon in the Copia superfamily |
| SO:0000186 | LTR/Gypsy | RLG, Ty3, Gypsy, LTR-Gypsy | LTR retrotransposon in the Gypsy superfamily |
| SO:0000186 | LTR/Bel-Pao | RLB, Bel/Pao | LTR retrotransposon in the Bel-Pao superfamily |
| SO:0000186 | LTR/Retrovirus | RLR | Retrovirus with LTR structure |
| SO:0000186 | LTR/ERV | RLE, HERV | Endogenous retrovirus with LTR structure |
| SO:0000189 | DIRS | | Dictyostelium intermediate repeat sequence |
| DIRS | DIRS/DIRS | RYD | Dictyostelium intermediate repeat sequence superfamily DIRS |
| DIRS | DIRS/Ngaro | RYN | Dictyostelium intermediate repeat sequence superfamily Ngaro |
| DIRS | DIRS/VIPER | RYV | Dictyostelium intermediate repeat sequence superfamily VIPER |
| SO:0000189 | Penelope | RPP | Penelope retrotransposon |
| SO:0000194 | LINE/R2 | RIR | LINE superfamily R2 |
| SO:0000194 | LINE/RTE | RIT | LINE superfamily RTE |
| SO:0000194 | LINE/Jockey | RIJ | LINE superfamily Jockey |
| SO:0000194 | LINE/L1 | RIL | LINE superfamily L1 |
| SO:0000194 | LINE/I | RII | LINE superfamily I |
| SO:0000206 | SINE/tRNA | RST | SINE superfamily tRNA |
| SO:0000206 | SINE/7SL | RSL | SINE superfamily 7SL |
| SO:0000206 | SINE/5S | RSS | SINE superfamily 5S |
| SO:0000182 | Crypton | DYC | Crypton transposons contain a tyrosine recombinase, as do some phages, IS and DIRS-like retrotransposons, but lack an RT domain. |
| SO:0000208 | DNA/Tc1-Mariner | DTT, Tc1, Mariner, Stowaway, TcMar-Stowaway | TIR superfamily Tc1–Mariner |
| SO:0000208 | DNA/hAT | DTA, Ac, Ds, Ac/Ds, hAT-Ac | TIR superfamily hAT |
| SO:0000208 | DNA/Mutator | DTM, MuDR, Mu, MULE, MLE | TIR superfamily Mutator |
| SO:0000208 | DNA/Merlin | DTE | TIR superfamily Merlin |
| SO:0000208 | DNA/Transib | DTR | TIR superfamily Transib |
| SO:0000208 | DNA/P | DTP, P-element | TIR superfamily P |
| SO:0000208 | DNA/PiggyBac | DTB | TIR superfamily PiggyBac |
| SO:0000208 | DNA/PIF-Harbinger | DTH, PIF, Harbinger, Tourist | TIR superfamily PIF–Harbinger |
| SO:0000208 | DNA/CACTA | DTC, En, Spm, dSpm, CACTC, En-Spm, EnSpm, CMC-EnSpm | TIR superfamily CACTA |
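
The three-letter synonyms in the table above (RLC, RIR, DTT and so on) share their first two letters within each group of rows that has the same parent. A minimal Python sketch of that pattern follows, assuming the two-letter prefix is enough to pick the parent listed in the table. The names `PARENT_BY_PREFIX` and `parent_for_code` are hypothetical and not part of SO.

```python
# Parent IDs copied from the "Parent" column of the table above.
# The two-letter prefixes are inferred from the synonyms in that table.
PARENT_BY_PREFIX = {
    "RL": "SO:0000186",  # LTR/* rows
    "RY": "SO:0000189",  # DIRS/* rows (reached through the proposed DIRS term)
    "RP": "SO:0000189",  # Penelope
    "RI": "SO:0000194",  # LINE/* rows
    "RS": "SO:0000206",  # SINE/* rows
    "DY": "SO:0000182",  # Crypton
    "DT": "SO:0000208",  # DNA/* rows
}


def parent_for_code(code):
    """Return the parent SO ID for a three-letter code such as 'RLC'."""
    code = code.strip().upper()
    if len(code) != 3:
        raise ValueError(f"expected a three-letter code, got {code!r}")
    prefix = code[:2]
    if prefix not in PARENT_BY_PREFIX:
        raise ValueError(f"no parent listed for prefix {prefix!r}")
    return PARENT_BY_PREFIX[prefix]


print(parent_for_code("RLC"))
print(parent_for_code("DTH"))
```

A lookup like this only groups codes by parent; it says nothing about how the superfamilies within a group differ.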

Relevant Publications https://www.nature.com/articles/nrg2165 https://pubmed.ncbi.nlm.nih.gov/26709091/

Thanks, Shujun Ou

davidwsant commented 4 years ago

Hi Shujun,

Thank you for writing these out. Unfortunately, the definitions are going to need more information to be able to distinguish between terms. For instance, other than the name there is no true difference between LTR/Copia and LTR/Gypsy. Could you please update your table with full definitions? I tried going through the first paper listed, but I am not an expert in this field and do not trust that my definitions will be sufficient. Here is an example of the first three definitions I made:

| Parent | labels | Synonyms | Def |
|---|---|---|---|
| SO:0000186 (LTR_retrotransposon) | LTR/Copia | RLC, Ty1, Copia, LTR-Copia | LTR retrotransposons in the Copia superfamily contain elements coding for specific proteins in this order: GAG, AP, INT, RT, RH. GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. |
| SO:0000186 (LTR_retrotransposon) | LTR/Gypsy | RLG, Ty3, Gypsy, LTR-Gypsy | LTR retrotransposons in the Gypsy superfamily contain elements coding for specific proteins in this order: GAG, AP, RT, RH, INT. GAG is a structural protein for virus-like particles, but in Gypsy this contains only the matrix protein. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. |
| SO:0000186 (LTR_retrotransposon) | LTR/Bel-Pao | RLB, Bel/Pao | LTR retrotransposons in the Bel-Pao superfamily contain elements coding for specific proteins in this order: GAG, AP, RT, RH, INT. GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. |

Of note, even with reading the paper, I do not see any clear distinction between the following pairs: Bel-Pao and Gypsy; Retrovirus and ERV; any of the DIRS; Jockey and L1; and most of the TIRs. Could you please make sure the definitions include enough information to distinguish between the terms ASIDE from using the names in the definitions.
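
To illustrate the point about distinguishing terms, here is a small Python sketch that uses only the domain orders from the three example definitions above. It is not a classifier; the `DOMAIN_ORDER` dictionary and the `candidates` function are hypothetical names introduced for this example.

```python
# Domain orders copied from the three example definitions above.
DOMAIN_ORDER = {
    "LTR/Copia": ("GAG", "AP", "INT", "RT", "RH"),
    "LTR/Gypsy": ("GAG", "AP", "RT", "RH", "INT"),
    "LTR/Bel-Pao": ("GAG", "AP", "RT", "RH", "INT"),
}


def candidates(domains):
    """Return every label whose listed domain order equals `domains`."""
    observed = tuple(d.upper() for d in domains)
    return sorted(
        label for label, order in DOMAIN_ORDER.items() if order == observed
    )


# INT before RT matches Copia only.
print(candidates(["GAG", "AP", "INT", "RT", "RH"]))
# INT after RH matches both Gypsy and Bel-Pao.
print(candidates(["GAG", "AP", "RT", "RH", "INT"]))
```

The second call returns two labels, which is the problem: domain order separates Copia from the other two, but it cannot separate Gypsy from Bel-Pao.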

Thank you,

Dave Sant

oushujun commented 4 years ago

Hi Dave,

Below is the updated table. Please let me know if anything is missing.

Parent labels Synonyms def Ref
SO:0000186 LTR/Copia RLC, Ty1, Copia, LTR-Copia LTR retrotransposons in the Copia superfamily contain elements coding for specific proteins in this order: GAG, AP, INT, RT, RH. GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. https://www.nature.com/articles/nrg2165
SO:0000186 LTR/Gypsy RLG, Ty3, Gypsy, LTR-Gypsy LTR retrotransposons in the Gypsy superfamily contain elements coding for specific proteins in this order: GAG, AP, RT, RH, INT. GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. https://www.nature.com/articles/nrg2165
SO:0000186 LTR/Bel-Pao RLB, Bel/Pao LTR retrotransposons in the Bel-Pao superfamily are similar to LTR/Gypsy and Retroviridae and are mainly described in metazoan genomes. This superfamily contains elements coding for specific proteins in this order: GAG, AP, RT, RH, INT, and env (in the case of retroviruses). GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. env is envelope protein. https://www.nature.com/articles/nrg2165
SO:0000186 LTR/Retrovirus RLR LTR retrotransposons in the retrovirus superfamily are similar to LTR/Gypsy and Bel-Pao and are mainly described in vertebrate animals. This superfamily contains elements coding for specific proteins in this order: GAG, AP, RT, RH, INT, and env (in the case of retroviruses). GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. env is envelope protein. https://www.nature.com/articles/nrg2165
SO:0000186 LTR/ERV RLE, HERV Endogenous retroviruses are abundant in the genomes of jawed vertebrates. Human ERVs (HERVs) are classified based on their homologies to animal retroviruses. Class I families are similar in sequence to mammalian Gammaretroviruses (Type C) and Epsilonretroviruses (Type E). Class II families show homology to mammalian Betaretroviruses (Type B) and Deltaretroviruses (Type D). Class III families are similar to foamy viruses. https://www.nature.com/articles/nrg2165
SO:0000186 LTR/unknown RLX LTR retrotransposon with uncertain classifications. It may contain coding elements including: GAG, AP, INT, RT, RH. GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H.
SO:0000194 LINE/R2 RIR R2 elements are non-long terminal repeat (non-LTR) retrotransposons that insert site-specifically into the host organism's 28S ribosomal RNA (rRNA) genes. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3256348/
SO:0000194 LINE/RTE RIT RTE-1 elements contain a domain with homology to the apurinic-apyrimidinic (AP) endonucleases in addition to the previously identified RT domain. https://pubmed.ncbi.nlm.nih.gov/9729877/
SO:0000194 LINE/Jockey RIJ Jockey is a superfamily of non-LTR retrotransposons found only in Arthropoda. The full-length element is ~ 5 kb and contains two ORFs (open reading frames), ORF1 (568 aa) and ORF2 (916 aa), which encodes an apurinic endonuclease (APE) and a reverse transcriptase (RT). https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-019-0184-1
SO:0000194 LINE/L1 RIL Long interspersed element-1 (LINE-1) is found in the human genome, which contains ORF1 (open reading frame1, including CC, coiled coil; RRM, RNA recognition motif; CTD, carboxyl-terminal domain) and ORF2 (including EN, endonuclease; RT, reverse transcriptase; C, cysteine-rich domain). The L1-encoded proteins (ORF1p and ORF2p) can mobilize nonautonomous retrotransposons, other noncoding RNAs, and messenger RNAs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4124830/
SO:0000194 LINE/I RII The LINE I superfamily is similar to the Jockey and L1 superfamilies, and encodes ORF1 (open reading frame 1) and ORF2 (including APE, apurinic endonuclease; RT, reverse transcriptase). The I superfamily encodes an RH (RNase H) domain downstream of the RT domain. https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/long-interspersed-nuclear-element
SO:0000194 LINE/unknown RIX Long interspersed element with uncertain classifications.
SO:0000206 SINE/tRNA RST Short interspersed elements originated from tRNAs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3242629/
SO:0000206 SINE/7SL RSL Short interspersed elements originated from 7SL RNAs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3242629/
SO:0000206 SINE/5S RSS Short interspersed elements originated from 5S rRNAs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3242629/
SO:0000206 SINE/unknown RSX Short interspersed element with uncertain classifications.
SO:0000182 Crypton DYC Crypton is a superfamily of DNA transposons using tyrosine recombinase (YR) to cut and rejoin the recombining DNA molecules. https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-2-12
SO:0000208 DNA/Tc1-Mariner DTT, Tc1, Mariner, Stowaway, TcMar-Stowaway Terminal inverted repeat transposon superfamily with the Tc1 transposase. Its activity creates a 2-bp (TA) target-site duplication (TSD). Stowaway is the non-autonomous element in this superfamily, usually shorter than 600 bp. https://link.springer.com/chapter/10.1007%2F978-3-642-79795-8_6
SO:0000208 DNA/hAT DTA, Ac, Ds, Ac/Ds, hAT-Ac Terminal inverted repeat transposon superfamily first found in maize (the Ac/Ds elements). Members of the hAT superfamily have TSDs of 8 bp, relatively short TIRs of 5–27 bp and overall lengths of less than 4 kb. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1461711/
SO:0000208 DNA/Mutator DTM, MuDR, Mu, MULE, MLE Mutator is a superfamily of terminal inverted repeat (TIR) transposons. Mutator TIRs are usually long but are also highly divergent (sharing only terminal G…C nucleotides) or are absent. The length of the TSD (7-11 bp, usually 9 bp) remains probably the most useful criterion for identification. https://www.nature.com/articles/nrg2165
SO:0000208 DNA/Merlin DTE Terminal inverted repeat transposon superfamily with a DDE transposase. Elements in this superfamily create 8-9 bp target-site duplications (TSDs). https://www.nature.com/articles/nrg2165
SO:0000208 DNA/Transib DTR Terminal inverted repeat (TIR) transposons of the superfamily Transib contain the DDE motif, which is related to the RAG1 protein involved in V(D)J recombination. https://www.nature.com/articles/nrg2165
SO:0000208 DNA/P DTP, P-element P elements in this terminal inverted repeat (TIR) transposon superfamily have 31 bp perfect TIRs and upon insertion duplicate an 8 bp sequence. They contain a transposase that may lack the DDE domain. https://www.cell.com/fulltext/0092-8674(83)90133-2
SO:0000208 DNA/PiggyBac DTB The terminal inverted repeat (TIR) transposon superfamily piggyBac, which is primarily found in animals, favours insertion adjacent to TTAA. https://www.nature.com/articles/nrg2165
SO:0000208 DNA/PIF-Harbinger DTH, PIF, Harbinger, Tourist Terminal inverted repeat transposon superfamily PIF-Harbinger creates 3-bp target site duplications that are mainly “TAA” or “TTA”. The autonomous PIF-Harbinger elements are relatively small in size, usually a few kb in length. Tourist is the non-autonomous element in this superfamily, usually shorter than 600 bp. The terminal sequences for PIF/Harbinger/Tourist elements are “GGG/CCC…GGC/GCC” or “GA/GGCA…TGCC/TC”. https://pubmed.ncbi.nlm.nih.gov/26709091/
SO:0000208 DNA/CACTA DTC, En, Spm, dSpm, CACTC, En-Spm, EnSpm, CMC-EnSpm This terminal inverted repeat (TIR) transposon superfamily is named CACTA because its terminal sequences are “CACTA/G…C/TAGTG”. CACTA elements generate a 3-bp target site duplication (TSD) upon insertion. CACTA elements do not have a significant preference for genic region insertions. https://pubmed.ncbi.nlm.nih.gov/26709091/
SO:0000189 DIRS   Dictyostelium intermediate repeat sequence (DIRS) is a subclass of non-LTR retrotransposons. These YR-encoding elements consist of central gag, pol and tyrosine recombinase (YR) open reading frames (ORFs) flanked with terminal repeats. The pol ORF includes a reverse transcriptase (RT), an RNase H (RH) and, in case of DIRS, a domain similar to bacterial and phage DNA N-6-adenine-methyltransferase (MT). Compared to the retroviral pol (LTR retrotransposons, non-LTR retrotransposons and Penelope elements), both aspartic protease and DDE integrase are absent from YR retrotransposons. DIRS retrotransposons have inverted terminal repeats (ITRs). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783388/
DIRS DIRS/DIRS RYD DIRS is a superfamily in Dictyostelium intermediate repeat sequence (DIRS) non-LTR retrotransposons carrying Tyrosine recombinase (YR) retrotransposon protein domains: RT, RH, YR, and MT. RT is a reverse transcriptase. RH is RNAse H. MT is DNA N-6-adenine-methyltransferase. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783388/
DIRS DIRS/Ngaro RYN Ngaro is a superfamily in Dictyostelium intermediate repeat sequence (DIRS) non-LTR retrotransposons with protein domains: RT, RH, YR. RT is a reverse transcriptase. RH is RNAse H. YR is Tyrosine recombinase. Inverted terminal repeats (ITRs) in Ngaro are arranged in A-pol-B-A-B order where A and B represent ITRs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783388/
DIRS DIRS/VIPER RYV VIPER is a superfamily in Dictyostelium intermediate repeat sequence (DIRS) non-LTR retrotransposons with protein domains: RT, RH, YR. RT is a reverse transcriptase. RH is RNAse H. YR is Tyrosine recombinase. Inverted terminal repeats (ITRs) in VIPER are arranged in A-pol-B-A-B order where A and B represent ITRs. VIPER is only found in kinetoplastida genomes. https://www.sciencedirect.com/science/article/pii/S0166685105002987
SO:0000189 Penelope RPP Penelope is a subclass of non-LTR retrotransposons. Penelope contains structural features of TR, RT, EN. TR, terminal repeats which can be in tandem or inverse orientation in different Penelope copies. RT is a reverse transcriptase. EN, endonuclease. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3681739/
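
Several of the DNA/* definitions above rest mainly on target-site duplication (TSD) length. The Python sketch below collects those lengths and shows how far TSD length alone can separate the superfamilies. The ranges are taken from the definitions in this table. The names `TSD_RANGE` and `superfamilies_for_tsd` are hypothetical, and Transib and PiggyBac are left out because no TSD length is stated for them here.

```python
# (min, max) TSD length in bp, as stated in the definitions above.
# Transib and PiggyBac are omitted: the table gives no TSD length for them.
TSD_RANGE = {
    "DNA/Tc1-Mariner": (2, 2),
    "DNA/hAT": (8, 8),
    "DNA/Mutator": (7, 11),
    "DNA/Merlin": (8, 9),
    "DNA/P": (8, 8),
    "DNA/PIF-Harbinger": (3, 3),
    "DNA/CACTA": (3, 3),
}


def superfamilies_for_tsd(length):
    """Return the superfamilies whose stated TSD range includes `length`."""
    return sorted(
        name for name, (low, high) in TSD_RANGE.items() if low <= length <= high
    )


for bp in (2, 3, 8):
    print(bp, superfamilies_for_tsd(bp))
```

Only the 2-bp case is unambiguous. A 3-bp TSD fits both PIF-Harbinger and CACTA, and an 8-bp TSD fits four superfamilies, so the terminal sequences and other features in the definitions are still needed to tell them apart.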

Best, Shujun

davidwsant commented 4 years ago

Dear Shujun,

Thank you for taking the time to create detailed definitions to go along with each of these terms. I am going through them to double-check everything, but at first glance it looks like they are good and will all be added. Thanks.

Dave

oushujun commented 4 years ago

Dear Dave,

Thank you for taking the time to curate these terms and adding them to the system. Looking forward to adding them to my study.

Shujun


davidwsant commented 4 years ago

Hi Shujun,

Thank you for writing up definitions and adding links to publications. Over the last week or so I have been through all of this information very thoroughly. I am not, however, an expert in the field so I suggest that you review my changes thoroughly as well.

Here is a summary of the changes that I have made:

| Parent term that should be used | Suggested name | Synonym | Definition |
|---|---|---|---|
| SO:0000186 (LTR_retrotransposon) | LTR/unknown | RLX | LTR retrotransposon with uncertain classifications. These retrotransposons may contain coding elements including: GAG, AP, INT, RT, RH. GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. |
| SO:0000194 (LINE_element) | LINE/unknown | RIX | Long interspersed element with uncertain classifications. |
| SO:0000206 (SINE_element) | SINE/unknown | RSX | Short interspersed element with uncertain classifications. |

Here are my new suggested names and definitions. If you have any suggested changes, please specify each change in your response so that it will be easy for me to update my notes. Thanks.

Parent labels Synonyms def Ref  
SO:0000186 (LTR_retrotransposon) Copia_LTR_retrotransposon Copia LTR retrotransposon LTR retrotransposons in the Copia superfamily contain elements coding for specific proteins in this order: GAG, AP, INT, RT, RH. GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000186 (LTR_retrotransposon) Gypsy_LTR_retrotransposon Gypsy LTR retrotransposon LTR retrotransposons in the Gypsy superfamily contain elements coding for specific proteins in this order: GAG, AP, RT, RH, INT. GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000186 (LTR_retrotransposon) Bel_Pao_LTR_retrotransposon Bel-Pao LTR retrotransposon LTR retrotransposons in the Bel-Pao superfamily are similar to LTRs in the Gypsy and Retrovirus superfamilies. Mainly described in metazoan genomes, this superfamily contains elements coding for specific proteins in this order: GAG, AP, RT, RH and INT. GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000186 (LTR_retrotransposon) Retrovirus_LTR_retrotransposon Retrovirus LTR retrotransposon LTR retrotransposons in the retrovirus superfamily are similar to LTR retrotransposons in the Gypsy and Bel-Pao superfamilies. Mainly described in vertebrate animals, this superfamily contains elements coding for specific proteins in this order: GAG, AP, RT, RH, INT, and ENV. GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. ENV is envelope protein. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000186 (LTR_retrotransposon) Endogenous_Retrovirus_LTR_retrotransposon Endogenous Retrovirus LTR retrotransposon, ERV LTR retrotransposon Endogenous retrovirus (ERV) retrotransposons are abundant in the genomes of jawed vertebrates. Human ERVs (HERVs) are classified based on their homologies to animal retroviruses. Class I families are similar in sequence to mammalian Gammaretroviruses (Type C) and Epsilonretroviruses (Type E). Class II families show homology to mammalian Betaretroviruses (Type B) and Deltaretroviruses (Type D). Class III families are similar to foamy viruses. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000194 (LINE_element) R2_retrotransposon R2 retrotransposon R2 retrotransposons are LINE elements (SO:0000194) that insert site-specifically into the host organism's 28S ribosomal RNA (rRNA) genes. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3256348/ PMID: 21734471
SO:0000194 (LINE_element) RTE_element RTE retrotransposon RTE retrotransposons are LINE elements (SO:0000194) that contain a domain with homology to the apurinic-apyrimidinic (AP) endonucleases in addition to the previously identified reverse transcriptase domain. https://pubmed.ncbi.nlm.nih.gov/9729877/ PMID: 9729877
SO:0000194 (LINE_element) LINE_Jockey_element LINE Jockey element Jockey retrotransposons are LINE elements (SO:0000194) found only in arthropods. The full-length element is ~ 5 kb and contains two open reading frames (SO:0000236), ORF1 (568 aa) and ORF2 (916 aa), the second of which encodes an apurinic endonuclease (APE) and a reverse transcriptase (RT). https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-019-0184-1 PMID: 31709017
SO:0000194 (LINE_element) L1_element LINE-1 element, LINE 1 element, L1 element Long interspersed element-1 (LINE-1) elements are found in the human genome, which contains ORF1 (open reading frame1, including CC, coiled coil; RRM, RNA recognition motif; CTD, carboxyl-terminal domain) and ORF2 (including EN, endonuclease; RT, reverse transcriptase; C, cysteine-rich domain). The L1-encoded proteins (ORF1p and ORF2p) can mobilize nonautonomous retrotransposons, other noncoding RNAs, and messenger RNAs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4124830/ PMID: 21801021
SO:0000194 (LINE_element) LINE_I_element LINE I element Elements of the LINE I superfamily are similar to the Jockey and L1 superfamilies. They contain two ORFs, the second of which includes apurinic endonuclease (APE) and reverse transcriptase (RT). The I superfamily encodes an RH (RNase H) domain downstream of the RT domain. https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/long-interspersed-nuclear-element ***
SO:0000206 (SINE_element) tRNA_SINE_element tRNA SINE element Short interspersed elements that originated from tRNAs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3242629/ PMID: 21673742
SO:0000206 (SINE_element) 7SL_SINE_element 7SL element, 7SL SINE element Short interspersed elements that originated from 7SL RNAs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3242629/ PMID: 21673742
SO:0000206 (SINE_element) 5S_SINE_element 5S element, 5S SINE element Short interspersed elements that originated from 5S rRNAs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3242629/ PMID: 21673742
SO:0000182 (DNA_transposon) Crypton_transposon Crypton transposon Crypton is a superfamily of DNA transposons that use tyrosine recombinase (YR) to cut and rejoin the recombining DNA molecules. https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-2-12 PMID: 22011512
SO:0000208 (terminal_inverted_repeat_element) Tc1_transposon Tc1 transposon, mariner transposon, mariner_transposon Elements of the Tc1 terminal inverted repeat transposon superfamily (also called mariner transposons) are named after the Transposon of C. elegans number 1 transposase. Their activity creates a 2-bp (TA) target-site duplication (TSD). Stowaway is the non-autonomous element in this superfamily, usually shorter than 600 bp. https://link.springer.com/chapter/10.1007%2F978-3-642-79795-8_6 PMID: 8556864, PMID: 17984973
SO:0000208 (terminal_inverted_repeat_element) hAT_transposon hAT transposon The hAT terminal inverted repeat transposon superfamily elements were first found in maize (the Ac/Ds elements). Members of the hAT superfamily have TSDs of 8 bp, relatively short TIRs of 5–27 bp and overall lengths of less than 4 kb. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1461711/ PMID: 11454746
SO:0000208 (terminal_inverted_repeat_element) Mutator_transposon Mutator transposon The TIRs of members of the Mutator family of terminal inverted repeat (TIR) transposons are usually long but are also highly divergent (sharing only terminal G…C nucleotides) or are absent. The length of the TSD (7-11 bp, usually 9 bp) remains probably the most useful criterion for identification. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000208 (terminal_inverted_repeat_element) Merlin_transposon Merlin transposon Terminal inverted repeat transposon superfamily Merlin elements create 8-9 bp target-site duplications (TSD). https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000208 (terminal_inverted_repeat_element) Transib_transposon Transib, transib transposon Terminal inverted repeat (TIR) transposons of the superfamily Transib contain the DDE motif, which is related to the RAG1 protein involved in V(D)J recombination. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000208 (terminal_inverted_repeat_element) P_transposable_element P element, P-element, P transposable element P elements in this terminal inverted repeat (TIR) transposon superfamily have 31 bp perfect TIRs and upon insertion duplicate an 8 bp sequence. They contain a transposase that may lack the DDE domain. https://www.cell.com/fulltext/0092-8674(83)90133-2 PMID: 6309410
SO:0000208 (terminal_inverted_repeat_element) piggyBac_element PiggyBac transposable element Primarily found in animals, the terminal inverted repeat (TIR) transposon superfamily piggyBac elements favour insertion adjacent to TTAA. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000208 (terminal_inverted_repeat_element) PIF_transposable_element PIF transposon element, Harbinger transposon element, Tourist transposon element, DTH transposon element Terminal inverted repeat transposons in the PIF/Harbinger/Tourist superfamily create 3-bp target site duplications that are mainly “TAA” or “TTA”. The autonomous PIF-Harbinger elements are relatively small in size, usually a few kb in length. Non-autonomous elements in this superfamily, usually shorter than 600 bp, are referred to as Tourist elements. The terminal sequences for PIF/Harbinger/Tourist elements are “GGG/CCC…GGC/GCC” or “GA/GGCA…TGCC/TC”. https://pubmed.ncbi.nlm.nih.gov/26709091/ PMID: 26709091
SO:0000208 (terminal_inverted_repeat_element) CACTA_transposable_element CACTA transposon element, DTC transposon element, En transposon element, Spm transposon element, dSPM transposon element Terminal inverted repeat transposons of the CACTA family generate a 3-bp target site duplication (TSD) upon insertion. CACTA elements do not have a significant preference for genic region insertions. This terminal inverted repeat (TIR) transposon superfamily is named CACTA because its terminal sequences are “CACTA/G…C/TAGTG”. https://pubmed.ncbi.nlm.nih.gov/26709091/, https://www.sciencedirect.com/science/article/pii/S1874939915002692?via%3Dihub PMID: 26709091
SO:0000189 (non_LTR_retrotransposon) YR_retrotransposon YR retrotransposon, tyrosine recombinase retrotransposon Tyrosine recombinase (YR) retrotransposons are a subclass of non-LTR retrotransposons. These YR-encoding elements consist of central gag, pol and tyrosine recombinase (YR) open reading frames (ORFs) flanked with terminal repeats. The pol ORF includes a reverse transcriptase (RT), an RNase H (RH) and, in case of DIRS, a domain similar to bacterial and phage DNA N-6-adenine-methyltransferase (MT). Compared to the retroviral pol (LTR retrotransposons, non-LTR retrotransposons and Penelope elements), both aspartic protease and DDE integrase are absent from YR retrotransposons. YR retrotransposons have inverted terminal repeats (ITRs). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783388/ PMID: 24086727
YR_retrotransposon DIRS_retrotransposon DIRS retrotransposon Dictyostelium intermediate repeat sequence (DIRS) retrotransposons are members of the YR_retrotransposon (SO:add ID) superfamily with the following protein domains: RT, RH, YR, and MT. RT is a reverse transcriptase. RH is RNAse H. YR is tyrosine recombinase. MT is DNA N-6-adenine-methyltransferase. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783388/ PMID: 24086727
YR_retrotransposon Ngaro_retrotransposon Ngaro retrotransposon Ngaro retrotransposons are members of the YR_retrotransposon (SO: add ID) superfamily with the following protein domains: RT, RH, YR. RT is a reverse transcriptase. RH is RNAse H. YR is Tyrosine recombinase. Inverted terminal repeats (ITRs) in Ngaro are arranged in A-pol-B-A-B order where A and B represent ITRs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783388/ PMID: 24086727
YR_retrotransposon Viper_retrotransposon Viper retrotransposon VIPER retrotransposons are members of the YR_retrotransposon (SO:add ID) superfamily with protein domains: RT, RH, YR. RT is a reverse transcriptase. RH is RNAse H. YR is Tyrosine recombinase. Inverted terminal repeats (ITRs) in VIPER are arranged in A-pol-B-A-B order where A and B represent ITRs. VIPER is only found in kinetoplastida genomes. https://www.sciencedirect.com/science/article/pii/S0166685105002987 PMID: 16297462
SO:0000189 (non_LTR_retrotransposon) Penelope_retrotransposon Penelope retrotransposon Penelope is a subclass of non_LTR_retrotransposons (SO:0000189). Penelope retrotransposons contain structural features of TR, RT, EN. TR is terminal repeats, which can be in tandem or inverse orientation in different Penelope copies. RT is reverse transcriptase. EN is endonuclease. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3681739/ PMID: 23914310
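
For anyone checking how a row of this table might look once it is added, here is a rough Python sketch that renders one row as an OBO-format `[Term]` stanza. It assumes the terms end up serialized in OBO format, and it assumes `EXACT` synonym scope, which this thread does not specify. The function `obo_stanza` is hypothetical, and `SO:XXXXXXX` is a placeholder because no ID has been assigned here.

```python
def obo_stanza(term_id, name, definition, pmids, synonyms, parent_id, parent_name):
    """Render one table row as an OBO-format [Term] stanza (illustrative only)."""

    def esc(text):
        # Escape backslashes and double quotes inside OBO quoted strings.
        return text.replace("\\", "\\\\").replace('"', '\\"')

    xrefs = ", ".join(f"PMID:{p}" for p in pmids)
    lines = [
        "[Term]",
        f"id: {term_id}",
        f"name: {name}",
        f'def: "{esc(definition)}" [{xrefs}]',
    ]
    # EXACT scope is an assumption; the thread does not state a synonym scope.
    lines += [f'synonym: "{esc(s)}" EXACT []' for s in synonyms]
    lines.append(f"is_a: {parent_id} ! {parent_name}")
    return "\n".join(lines)


print(obo_stanza(
    term_id="SO:XXXXXXX",  # placeholder; no ID is assigned in this thread
    name="Merlin_transposon",
    definition=(
        "Terminal inverted repeat transposon superfamily Merlin elements "
        "create 8-9 bp target-site duplications (TSD)."
    ),
    pmids=["17984973"],
    synonyms=["Merlin transposon"],
    parent_id="SO:0000208",
    parent_name="terminal_inverted_repeat_element",
))
```

Rendering each row this way makes it easy to spot a missing parent, an empty definition, or a definition with no reference before the terms are committed.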

I hope that this covers everything. This is getting close, but I would just like you to review the changes I suggested to ensure I did not make mistakes.

Thank you,

Dave Sant

oushujun commented 4 years ago

Dear Dave,

Thank you for curating these terms. I like most of them which are well-formatted and consistent. Here is the summary of changes I made:

This is the revised table. Thanks again for working on this issue. Please let me know if you want to change anything.

Parent labels Synonyms def Ref  
SO:0000186 (LTR_retrotransposon) Copia_LTR_retrotransposon RLC, Ty1, Copia LTR retrotransposon LTR retrotransposons in the Copia superfamily contain elements coding for specific proteins in this order: GAG, AP, INT, RT, RH. GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000186 (LTR_retrotransposon) Gypsy_LTR_retrotransposon RLG, Ty3, Gypsy LTR retrotransposon LTR retrotransposons in the Gypsy superfamily contain elements coding for specific proteins in this order: GAG, AP, RT, RH, INT. GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000186 (LTR_retrotransposon) Bel_Pao_LTR_retrotransposon RLB, Bel-Pao LTR retrotransposon LTR retrotransposons in the Bel-Pao superfamily are similar to LTRs in the Gypsy and Retrovirus superfamilies. Mainly described in metazoan genomes, this superfamily contains elements coding for specific proteins in this order: GAG, AP, RT, RH and INT. GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000186 (LTR_retrotransposon) Retrovirus_LTR_retrotransposon RLR, retrovirus LTR retrotransposon LTR retrotransposons in the retrovirus superfamily are similar to LTR retrotransposons in the Gypsy and Bel-Pao superfamilies. Mainly described in vertebrate animals, this superfamily contains elements coding for specific proteins in this order: GAG, AP, RT, RH, INT, and ENV. GAG is a structural protein for virus-like particles. AP is aspartic proteinase. INT is a DDE integrase. RT is a reverse transcriptase. RH is RNAse H. ENV is envelope protein. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000186 (LTR_retrotransposon) Endogenous_Retrovirus_LTR_retrotransposon RLE, HERV, Endogenous Retrovirus LTR retrotransposon Endogenous retrovirus (ERV) retrotransposons are abundant in the genomes of jawed vertebrates. Human ERVs (HERVs) are classified based on their homologies to animal retroviruses. Class I families are similar in sequence to mammalian Gammaretroviruses (Type C) and Epsilonretroviruses (Type E). Class II families show homology to mammalian Betaretroviruses (Type B) and Deltaretroviruses (Type D). Class III families are similar to foamy viruses. https://www.nature.com/articles/nrg2165 PMID: 17984973
           
SO:0000194 (LINE_element) R2_LINE_retrotransposon RIR, R2 retrotransposon R2 retrotransposons are LINE elements (SO:0000194) that insert site-specifically into the host organism's 28S ribosomal RNA (rRNA) genes. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3256348/ PMID: 21734471
SO:0000194 (LINE_element) RTE_LINE_retrotransposon RIT, RTE retrotransposon RTE retrotransposons are LINE elements (SO:0000194) that contain a domain with homology to the apurinic-apyrimidinic (AP) endonucleases in addition to the previously identified reverse transcriptase domain. https://pubmed.ncbi.nlm.nih.gov/9729877/ PMID: 9729877
SO:0000194 (LINE_element) Jockey_LINE_retrotransposon RIJ, LINE Jockey element Jockey retrotransposons are LINE elements (SO:0000194) found only in arthropods. The full-length element is ~5 kb and contains two open reading frames (SO:0000236), ORF1 (568 aa) and ORF2 (916 aa), the second of which encodes an apurinic endonuclease (APE) and a reverse transcriptase (RT). https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-019-0184-1 PMID: 31709017
SO:0000194 (LINE_element) L1_LINE_retrotransposon RIL, LINE-1 element, LINE 1 element, L1 element Long interspersed element-1 (LINE-1) elements are found in the human genome. They contain ORF1 (open reading frame 1, including CC, coiled coil; RRM, RNA recognition motif; CTD, carboxyl-terminal domain) and ORF2 (including EN, endonuclease; RT, reverse transcriptase; C, cysteine-rich domain). The L1-encoded proteins (ORF1p and ORF2p) can mobilize nonautonomous retrotransposons, other noncoding RNAs, and messenger RNAs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4124830/ PMID: 21801021
SO:0000194 (LINE_element) I_LINE_retrotransposon RII, LINE I element Elements of the LINE I superfamily are similar to those of the Jockey and L1 superfamilies. They contain two ORFs, the second of which includes an apurinic endonuclease (APE) and a reverse transcriptase (RT). The I superfamily encodes an RH (RNase H) domain downstream of the RT domain. https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/long-interspersed-nuclear-element ***
           
SO:0000206 (SINE_element) tRNA_SINE_retrotransposon RST, tRNA SINE element Short interspersed elements that originated from tRNAs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3242629/ PMID: 21673742
SO:0000206 (SINE_element) 7SL_SINE_retrotransposon RSL, 7SL SINE element Short interspersed elements that originated from 7SL RNAs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3242629/ PMID: 21673742
SO:0000206 (SINE_element) 5S_SINE_retrotransposon RSS, 5S SINE element Short interspersed elements that originated from 5S rRNAs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3242629/ PMID: 21673742
           
SO:0000182 (DNA_transposon) Crypton_YR_transposon DYC, Crypton transposon Crypton is a superfamily of DNA transposons that use tyrosine recombinase (YR) to cut and rejoin the recombining DNA molecules. https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-2-12 PMID: 22011512
SO:0000208 (terminal_inverted_repeat_element) Tc1_Mariner_TIR_transposon DTT, Tc1, Mariner, Stowaway, TcMar-Stowaway transposon Elements of the Tc1-Mariner terminal inverted repeat transposon superfamily (also called mariner transposons) are named after the transposon of C. elegans number 1 (Tc1) transposase. Their activity creates a 2-bp (TA) target-site duplication (TSD). Stowaway elements are the non-autonomous elements of this superfamily and are usually shorter than 600 bp. https://link.springer.com/chapter/10.1007%2F978-3-642-79795-8_6 PMID: 8556864, PMID: 17984973
SO:0000208 (terminal_inverted_repeat_element) hAT_TIR_transposon DTA, Ac, Ds, Ac/Ds, hAT-Ac, hAT transposon Elements of the hAT terminal inverted repeat transposon superfamily were first found in maize (the Ac/Ds elements). Members of the hAT superfamily have TSDs of 8 bp, relatively short TIRs of 5–27 bp and overall lengths of less than 4 kb. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1461711/ PMID: 11454746
SO:0000208 (terminal_inverted_repeat_element) Mutator_TIR_transposon DTM, MuDR, Mu, MULE, MLE, Mutator transposon The TIRs of members of the Mutator family of terminal inverted repeat (TIR) transposons are usually long but are also highly divergent, sharing only terminal G…C nucleotides, or are absent. The length of the TSD (7-11 bp, usually 9 bp) is probably the most useful criterion for identification. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000208 (terminal_inverted_repeat_element) Merlin_TIR_transposon DTE, Merlin transposon Terminal inverted repeat transposon superfamily Merlin elements create 8-9 bp target-site duplications (TSD). https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000208 (terminal_inverted_repeat_element) Transib_TIR_transposon DTR, transib transposon Terminal inverted repeat (TIR) transposons of the superfamily Transib contain the DDE motif, which is related to the RAG1 protein involved in V(D)J recombination. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000208 (terminal_inverted_repeat_element) P_TIR_transposon DTP, P-element, P element, P transposable element P elements in this terminal inverted repeat (TIR) transposon superfamily have 31-bp perfect TIRs and upon insertion duplicate an 8-bp sequence. They contain a transposase that may lack the DDE domain. https://www.cell.com/fulltext/0092-8674(83)90133-2 PMID: 6309410
SO:0000208 (terminal_inverted_repeat_element) piggyBac_TIR_transposon DTB, PiggyBac transposable element Primarily found in animals, the terminal inverted repeat (TIR) transposon superfamily piggyBac elements favour insertion adjacent to TTAA. https://www.nature.com/articles/nrg2165 PMID: 17984973
SO:0000208 (terminal_inverted_repeat_element) PIF_Harbinger_TIR_transposon DTH, PIF, Harbinger, Tourist transposon element Terminal inverted repeat transposons in the PIF/Harbinger/Tourist superfamily create 3-bp target site duplications that are mainly “TAA” or “TTA”. The autonomous PIF-Harbinger elements are relatively small in size, usually a few kb in length. Non-autonomous elements in this superfamily, usually shorter than 600 bp, are referred to as Tourist elements. The terminal sequences for PIF/Harbinger/Tourist elements are “GGG/CCC…GGC/GCC” or “GA/GGCA…TGCC/TC”. https://pubmed.ncbi.nlm.nih.gov/26709091/ PMID: 26709091
SO:0000208 (terminal_inverted_repeat_element) CACTA_TIR_transposon DTC, En, Spm, dSpm, CACTC, En-Spm, EnSpm, CMC-EnSpm, CACTA transposon element Terminal inverted repeat transposons of the CACTA family generate 3-bp target site duplications (TSD) upon insertion. CACTA elements do not have a significant preference for genic region insertions. This terminal inverted repeat (TIR) transposon superfamily is named CACTA because the terminal sequences of its elements are “CACTA/G…C/TAGTG”. https://pubmed.ncbi.nlm.nih.gov/26709091/ https://www.sciencedirect.com/science/article/pii/S1874939915002692?via%3Dihub PMID: 26709091
SO:0000189 (non_LTR_retrotransposon) YR_retrotransposon YR retrotransposon, tyrosine kinase retrotransposon Tyrosine recombinase (YR) retrotransposons are a subclass of non-LTR retrotransposons. These YR-encoding elements consist of central gag, pol and tyrosine recombinase (YR) open reading frames (ORFs) flanked by terminal repeats. The pol ORF includes a reverse transcriptase (RT), an RNase H (RH) and, in the case of DIRS, a domain similar to bacterial and phage DNA N-6-adenine-methyltransferase (MT). Compared to the retroviral pol (LTR retrotransposons, non-LTR retrotransposons and Penelope elements), both aspartic protease and DDE integrase are absent from YR retrotransposons. YR retrotransposons have inverted terminal repeats (ITRs). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783388/ PMID: 24086727
YR_retrotransposon DIRS_YR_retrotransposon RYD, DIRS retrotransposon Dictyostelium intermediate repeat sequence (DIRS) retrotransposons are members of the YR_retrotransposon (SO:add ID) superfamily with the following protein domains: RT, RH, YR, and MT. RT is a reverse transcriptase. RH is RNase H. YR is tyrosine recombinase. MT is DNA N-6-adenine-methyltransferase. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783388/ PMID: 24086727
YR_retrotransposon Ngaro_YR_retrotransposon RYN, Ngaro retrotransposon Ngaro retrotransposons are members of the YR_retrotransposon (SO: add ID) superfamily with the following protein domains: RT, RH, YR. RT is a reverse transcriptase. RH is RNase H. YR is tyrosine recombinase. Inverted terminal repeats (ITRs) in Ngaro are arranged in A-pol-B-A-B order where A and B represent ITRs. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783388/ PMID: 24086727
YR_retrotransposon Viper_YR_retrotransposon RYV, Viper retrotransposon VIPER retrotransposons are members of the YR_retrotransposon (SO:add ID) superfamily with the following protein domains: RT, RH, YR. RT is a reverse transcriptase. RH is RNase H. YR is tyrosine recombinase. Inverted terminal repeats (ITRs) in VIPER are arranged in A-pol-B-A-B order where A and B represent ITRs. VIPER is only found in Kinetoplastida genomes. https://www.sciencedirect.com/science/article/pii/S0166685105002987 PMID: 16297462
SO:0000189 (non_LTR_retrotransposon) Penelope_retrotransposon RPP, Penelope retrotransposon Penelope is a subclass of non_LTR_retrotransposons (SO:0000189). Penelope retrotransposons contain the structural features TR, RT, EN, TR, where TR denotes terminal repeats, which can be in tandem or inverse orientation in different Penelope copies. RT is reverse transcriptase. EN is endonuclease. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3681739/ PMID: 23914310

Best, Shujun
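
For anyone reading the TIR definitions above, here is a minimal Python sketch of how the quoted target-site duplication (TSD) lengths could be used to narrow down candidate superfamilies. The names `TSD_LENGTHS` and `candidates_for_tsd` are hypothetical and not part of SO; the lengths are copied from the definitions in this comment. Transib and piggyBac are left out because no TSD length is given for them here (only the TTAA insertion preference for piggyBac).

```python
# TSD length ranges in bp (min, max), taken from the definitions in this thread.
# Illustrative only: real classification also relies on TIRs and protein domains.
TSD_LENGTHS = {
    "Tc1_Mariner_TIR_transposon": (2, 2),
    "hAT_TIR_transposon": (8, 8),
    "Mutator_TIR_transposon": (7, 11),
    "Merlin_TIR_transposon": (8, 9),
    "P_TIR_transposon": (8, 8),
    "PIF_Harbinger_TIR_transposon": (3, 3),
    "CACTA_TIR_transposon": (3, 3),
}


def candidates_for_tsd(length):
    """Return the labels whose stated TSD range includes `length` (in bp)."""
    return sorted(
        label
        for label, (shortest, longest) in TSD_LENGTHS.items()
        if shortest <= length <= longest
    )


if __name__ == "__main__":
    for tsd in (2, 3, 8, 9):
        print(tsd, candidates_for_tsd(tsd))
```

Several lengths map to more than one superfamily (8 bp alone matches four of them), so the TSD is a filter and not a classifier by itself.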

davidwsant commented 4 years ago

Hi Shujun,

Thank you for looking through this thoroughly. I am glad that you were able to catch the things that needed to be changed. I noticed that you changed some terms from "element" to "retrotransposon", and I am glad you are more familiar with these than I am so that you know which ones to change. The publications tend to mention the class and then leave the individual superfamilies with just the short name, but I would prefer to include the longer name because it is more specific yet I want to be as consistent with the literature as possible. I think keeping the "I Line" is fine. I will add the link, but no PubMed ID. I think this looks good and I am just about ready to add the terms.

I only have one more thing to ask. As for the three-letter names, if these are well known then they should be included as you have suggested. However, any three-letter acronym has likely been used elsewhere and some of them are likely to match up with other scientific terms. The two-letter names are even more likely to be an acronym for something else. Do you think I could put the type of transposon with the synonym just to make it easier? For instance, it could be "RLC retrotransposon" instead of RLC. Would that be acceptable?

Here is a list of some well-known acronyms that happen to match synonyms included here:

  • RLE: Right Lower Extremity
  • RST: Rapid Strep Test
  • DTT: dithiothreitol, a chemical commonly used as a reducing reagent in labs
  • DPT: Diphtheria-pertussis-tetanus vaccine
  • Ds: Double-stranded. If you type this one into the SO Browser it will give you ds-DNA.
  • hAT: histone acetyltransferase
  • Ac: Acetylation, as in histone acetylation. The most commonly known one is H3K27ac. These have not been added to SO, but it has been discussed.
  • Mu: The Greek letter. This is used for many things including 'micro', so I especially don't want this one used alone
  • En: English, as in using the English version of something

Aside from these, which have other meanings known among scientists, there are many other meanings in everyday use, especially in the text shorthand of younger generations. I think it would be best if we added something to the synonym for all of the two- or three-letter symbols to avoid confusion. Would that be ok with you?

Below are the term names and the suggested changes I have for the synonyms:

labels Synonyms
Copia_LTR_retrotransposon RLC retrotransposon, Ty1 retrotransposon, Copia LTR retrotransposon
Gypsy_LTR_retrotransposon RLG retrotransposon, Ty3 retrotransposon, Gypsy LTR retrotransposon
Bel_Pao_LTR_retrotransposon RLB retrotransposon, Bel-Pao LTR retrotransposon
Retrovirus_LTR_retrotransposon RLR retrotransposon, retrovirus LTR retrotransposon
Endogenous_Retrovirus_LTR_retrotransposon RLE retrotransposon, HERV, Endogenous Retrovirus LTR retrotransposon

R2_LINE_retrotransposon RIR retrotransposon, R2 retrotransposon
RTE_LINE_retrotransposon RIT retrotransposon, RTE retrotransposon
Jockey_LINE_retrotransposon RIJ retrotransposon, LINE Jockey element
L1_LINE_retrotransposon RIL retrotransposon, LINE-1 element, LINE 1 element, L1 element
I_LINE_retrotransposon RII retrotransposon, LINE I element

tRNA_SINE_retrotransposon RST retrotransposon, tRNA SINE element
7SL_SINE_retrotransposon RSL retrotransposon, 7SL SINE element
5S_SINE_retrotransposon RSS retrotransposon, 5S SINE element

Crypton_YR_transposon DYC transposon, Crypton transposon
Tc1_Mariner_TIR_transposon DTT transposon, Tc1 transposon, Mariner, Stowaway, TcMar-Stowaway transposon
hAT_TIR_transposon DTA transposon, Ac transposon, Ds transposon, Ac/Ds, hAT-Ac, hAT transposon
Mutator_TIR_transposon DTM transposon, MuDR, Mu transposon, MULE, MLE transposon, Mutator transposon
Merlin_TIR_transposon DTE transposon, Merlin transposon
Transib_TIR_transposon DTR transposon, transib transposon
P_TIR_transposon DTP transposon, P-element, P element, P transposable element
piggyBac_TIR_transposon DTB transposon, PiggyBac transposable element
PIF_Harbinger_TIR_transposon DTH transposon, PIF transposon, Harbinger, Tourist transposon element
CACTA_TIR_transposon DTC transposon, En transposon, Spm transposon, dSpm, CACTC, En-Spm, EnSpm, CMC-EnSpm, CACTA transposon element

YR_retrotransposon YR retrotransposon, tyrosine kinase retrotransposon
DIRS_YR_retrotransposon RYD retrotransposon, DIRS retrotransposon
Ngaro_YR_retrotransposon RYN retrotransposon, Ngaro retrotransposon
Viper_YR_retrotransposon RYV retrotransposon, Viper retrotransposon
Penelope_retrotransposon RPP retrotransposon, Penelope retrotransposon

Do these changes look ok?

Thanks,

Dave

oushujun commented 4 years ago

That's appropriate. Thanks for taking care to avoid confusing them with other acronyms.

Best, Shujun
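
The synonym convention agreed in this exchange (append the transposon type to any bare two- or three-letter synonym) can be sketched in Python as follows. This is only an illustration under stated assumptions: `qualify_synonym` and `qualify_all` are hypothetical helper names, the "three characters or fewer" cut-off is inferred from the examples in the table, and the suffix is guessed from whether the label ends in "retrotransposon".

```python
def qualify_synonym(synonym, label):
    """Append the transposon type to a bare synonym of three characters or fewer.

    Assumption: the suffix is "retrotransposon" when the term label ends with
    that word, and "transposon" otherwise.
    """
    suffix = "retrotransposon" if label.endswith("retrotransposon") else "transposon"
    synonym = synonym.strip()
    if len(synonym) <= 3:
        return f"{synonym} {suffix}"
    return synonym


def qualify_all(synonyms, label):
    """Apply qualify_synonym to a comma-separated synonym string."""
    return [qualify_synonym(s, label) for s in synonyms.split(",")]


if __name__ == "__main__":
    print(qualify_all("DTM, MuDR, Mu, MULE, MLE, Mutator transposon",
                      "Mutator_TIR_transposon"))
    print(qualify_all("RLC, Ty1, Copia LTR retrotransposon",
                      "Copia_LTR_retrotransposon"))
```

Longer synonyms such as MuDR, MULE and HERV are left untouched, which matches how they appear in the suggested list.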

davidwsant commented 4 years ago

Hi Shujun,

After all that discussion and reading through all of those papers, the terms are finally added. The new terms have been assigned IDs SO:0002264-SO:0002290. I should also mention that the term P_element (SO:0001535) was already present, but it has been updated with the info from our discussion. The SO Browser updates once per day, so the changes should be visible on the page within approximately 24 hours. I hope this covers all of the terms that you need. If you find any more, go ahead and create a new issue.

Good luck with your project,

Dave Sant
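
As a quick sanity check on the ID range mentioned above, this small Python sketch expands SO:0002264-SO:0002290 into individual IDs. The helper name `so_id_range` is hypothetical, and the thread does not say which ID was given to which term, so no ID-to-label mapping is attempted.

```python
def so_id_range(first, last):
    """Expand an inclusive range of SO IDs, keeping the zero-padded width."""
    prefix, start = first.split(":")
    _, end = last.split(":")
    width = len(start)
    return [f"{prefix}:{n:0{width}d}" for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    ids = so_id_range("SO:0002264", "SO:0002290")
    print(len(ids), ids[0], ids[-1])
```

The range contains 27 IDs, which appears consistent with the 28 labels in the synonym table minus the P element term that already existed as SO:0001535.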

oushujun commented 4 years ago

Hi Dave,

That's fantastic! Thanks for taking the time and making such a great effort to add these terms.

Best, Shujun