VerisimilitudeX / DNAnalyzer

Revolutionizing DNA analysis and making it accessible to all through innovative ML-powered analysis and interpretive tools.

New analysis ideas #368

Closed i2kmt closed 1 year ago

i2kmt commented 1 year ago

After discussion, we decided to note known critical amino acids, sequences, and codons that are related to a specific protein function.

Let me know how you would like me to note this information so that you can handle it best afterward. We could also use this issue to discuss any questions or ideas.


Here I'm going to note everything that has already been proposed.

  • Serine codons: The presence of several serine codons in the DNA sequence suggests that the genome may contain genes that encode proteins with enzymatic or regulatory functions. Serine is a common amino acid found in many proteins, including enzymes involved in the metabolism of lipids, carbohydrates, and nucleic acids. Its presence in the DNA sequence may indicate that the organism has a complex metabolism that requires many enzymes for different metabolic pathways.
  • Cysteines can form disulfide bonds (S-S bonds). These bonds stabilise a protein's structure and help it survive high temperatures or changing pH levels. They are often found in proteins of the extracellular space, where the pH is less stable, or in organisms that live in high-temperature environments.
  • Tyrosine is important for enzyme activity, for example as a phosphorylation site for tyrosine kinases.
  • Lysine is the residue to which ubiquitin (a small protein) is attached. Enzymes called ubiquitin ligases attach ubiquitin molecules to lysines on other proteins to influence their function. The exact lysine position matters: within ubiquitin itself, chains linked through the lysine at position 48 (K48) target the protein for degradation, while chains linked through the lysine at position 63 (K63) are instead involved in signaling and activation. Proteins that are heavily ubiquitinated are often involved in cell cycle pathways. So this parameter is useful in my opinion, because it requires detecting an exact codon at an exact position.
  • GC-content (genome): The GC-content of the genome is close to the average for most organisms, which suggests that the genome is evolving, and its GC-content may change over time due to mutations, genetic drift, or natural selection. However, a more detailed analysis of the genome would be necessary to determine if the GC-content is evenly distributed across the genome or if certain regions have a higher or lower GC-content.
  • Nucleotide count: The nucleotide count provides information on the relative abundance of each nucleotide in the genome. The fact that each nucleotide has a similar abundance suggests that the genome is relatively stable and is not subject to strong selective pressures that favor one nucleotide over another. However, a more detailed analysis of the genome would be necessary to determine if certain regions of the genome have a biased nucleotide composition.
  • High coverage regions: The high coverage regions indicate that these DNA sequences are highly conserved across the genome, and they may play important roles in regulating gene expression or in coding for functional proteins. A more detailed analysis of these regions could reveal their function and provide insights into the biology of the organism.
  • Codons in reading frame 1: The analysis of codon usage provides information on the frequency of each codon in the genome. The fact that there is no strong bias towards any particular codon suggests that the genome is evolving neutrally, without strong selective pressures that favor certain codons over others. However, a more detailed analysis of the codon usage in different genes could reveal patterns of codon usage that are specific to different functional classes of proteins.
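
The items above could be computed roughly as in the following sketch. It is written in Python for illustration only and is not DNAnalyzer's implementation; the function names (`nucleotide_counts`, `gc_content`, `codons_in_frame`, `amino_acid_codon_counts`, `has_residue_at`) and the example sequence are hypothetical. It assumes a plain DNA string, uses the standard genetic code for the four amino acids mentioned, and counts positions from the first codon of the chosen reading frame, which is not necessarily the residue numbering of the mature protein.

```python
from collections import Counter

# DNA codons (standard genetic code) for the amino acids named in the list above.
CODONS = {
    "Ser": {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"},
    "Cys": {"TGT", "TGC"},
    "Tyr": {"TAT", "TAC"},
    "Lys": {"AAA", "AAG"},
}


def nucleotide_counts(seq):
    """Count each of A, C, G and T in the sequence (case-insensitive)."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def gc_content(seq):
    """Fraction of G and C among the A/C/G/T bases; 0.0 for an empty sequence."""
    counts = nucleotide_counts(seq)
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0


def codons_in_frame(seq, frame=1):
    """Split the sequence into complete codons for reading frame 1, 2 or 3."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    s = seq.upper()
    # Stop at len(s) - 2 so that a trailing partial codon is dropped.
    return [s[i:i + 3] for i in range(frame - 1, len(s) - 2, 3)]


def amino_acid_codon_counts(seq, frame=1):
    """Count how many codons in the frame encode each amino acid in CODONS."""
    seen = Counter(codons_in_frame(seq, frame))
    return {aa: sum(seen[c] for c in codons) for aa, codons in CODONS.items()}


def has_residue_at(seq, amino_acid, position, frame=1):
    """True if the codon at the 1-based codon position encodes amino_acid."""
    codons = codons_in_frame(seq, frame)
    if position < 1 or position > len(codons):
        return False
    return codons[position - 1] in CODONS[amino_acid]


# Made-up example: ATG AAA TGT TCT TAC AAG TAA
example = "ATGAAATGTTCTTACAAGTAA"
print(nucleotide_counts(example))
print(round(gc_content(example), 3))
print(amino_acid_codon_counts(example))
print(has_residue_at(example, "Lys", 2))
```

A position check such as `has_residue_at(seq, "Lys", 48)` only says that a lysine codon sits at that codon index in the given frame; deciding whether that lysine is actually a ubiquitination-relevant site would need further biological context.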

i2kmt commented 1 year ago

@VerisimilitudeX @LimesKey

VerisimilitudeX commented 1 year ago

@i2kmt https://github.com/VerisimilitudeX/DNAnalyzer/discussions/377#discussioncomment-5962630
