TeselaGen / openVectorEditor

DEPRECATED - Teselagen's Open Source Vector/Plasmid Editor Component
https://teselagen.github.io/tg-oss/ove/#/Editor
MIT License

Calculation of molecular mass of proteins #916

Closed: jobabi closed this issue 1 year ago

jobabi commented 1 year ago

Dear OVE Team,

I noticed that the molecular mass OVE reports for a translated protein differs from what other software tools calculate. In my case, a 769 aa protein translated from a DNA sequence was calculated by OVE as 97.5 kDa, whereas other software calculated 83.7 kDa. Maybe the mass of one or more amino acids is wrong?

I would appreciate it if this could be corrected.

Thank you & best wishes

Jochen

@tnrich

tnrich commented 1 year ago

@jobabi thanks for pointing this out. It is possible that the mass of one of the amino acids is off. Here is the list of the current AA masses we use:

export const protein_weights = {
  A: 89.0932, 
  C: 121.1582, 
  D: 133.1027, 
  E: 147.1293, 
  F: 165.1891, 
  G: 75.0666, 
  H: 155.1546, 
  I: 131.1729, 
  K: 146.1876, 
  L: 131.1729, 
  M: 149.2113, 
  N: 132.1179, 
  O: 255.3134, 
  P: 115.1305, 
  Q: 146.1445, 
  R: 174.201, 
  S: 105.0926, 
  T: 119.1192, 
  U: 168.0532, 
  V: 117.1463, 
  W: 204.2252, 
  Y: 181.1885 
};

Do any of those numbers look off to you?

jobabi commented 1 year ago

Dear Thomas,

I was puzzled that all of your masses are too high according to these references: https://proteomicsresource.washington.edu/protocols06/masses.php and https://education.expasy.org/student_projects/isotopident/htdocs/aa-list.html

After a few minutes I found the reason:

Alanine as a free amino acid has the formula C3H7NO2 and a mass of 89.09. In a protein chain, however, each peptide bond is formed with the loss of one H2O, so alanine contributes only C3H5NO. As a result, all of your masses are too high by roughly 18 (one H2O).

The correct mass is obtained by summing the residue masses below and adding one H2O (18.0152) per peptide chain to account for the N- and C-termini.

I put the masses into the same format that you sent. Hopefully you can copy and paste them into the code?

A: 71.0788, C: 103.1388, D: 115.0886, E: 129.1155, F: 147.1766, G: 57.0519, H: 137.1411, I: 113.1594, K: 128.1741, L: 113.1594, M: 131.1926, N: 114.1038, O: 237.29816, P: 97.1167, Q: 128.1307, R: 156.1875, S: 87.0782, T: 101.1051, U: 150.3079, V: 99.1326, W: 186.2132, Y: 163.1760
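
The calculation described above (sum the residue masses, then add one H2O per chain) can be sketched in JavaScript roughly as follows. This is only an illustration under stated assumptions: `RESIDUE_MASSES`, `WATER_MASS` and `peptideAverageMass` are hypothetical names, not the actual code in OVE or @teselagen/sequence-utils, and the table is copied as posted.

```javascript
// Average residue masses (free amino acid minus one H2O), copied as posted above.
// Caution: the U entry is reproduced verbatim; subtracting 18.0153 from the
// free amino acid value 168.0532 quoted earlier in the thread gives 150.0379.
const RESIDUE_MASSES = {
  A: 71.0788, C: 103.1388, D: 115.0886, E: 129.1155, F: 147.1766,
  G: 57.0519, H: 137.1411, I: 113.1594, K: 128.1741, L: 113.1594,
  M: 131.1926, N: 114.1038, O: 237.29816, P: 97.1167, Q: 128.1307,
  R: 156.1875, S: 87.0782, T: 101.1051, U: 150.3079, V: 99.1326,
  W: 186.2132, Y: 163.1760
};

// One water per chain, for the free N- and C-termini (value quoted above).
const WATER_MASS = 18.0152;

// Hypothetical helper: average mass of a single peptide chain, in Da.
function peptideAverageMass(sequence) {
  const residues = [...sequence.toUpperCase()];
  if (residues.length === 0) return 0;

  let total = WATER_MASS;
  for (const aa of residues) {
    const mass = RESIDUE_MASSES[aa];
    if (mass === undefined) {
      throw new Error(`Unknown amino acid code: ${aa}`);
    }
    total += mass;
  }
  return total;
}
```

For a single glycine, `peptideAverageMass("G")` is the residue mass plus one water, about 75.067, which is close to the free amino acid mass of 75.0666 in the original table.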

With this tool one can check the output of OVE https://proteomicsresource.washington.edu/cgi-bin/fragment.cgi

Thank you for correcting this problem

Jochen

jobabi commented 1 year ago

Alternatively, you can keep the current masses and subtract (n-1)*18.0153 from their sum (n being the number of amino acids in the protein).
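
This alternative can be sketched the same way: keep the free amino acid masses and subtract one water per peptide bond. `massFromFreeAminoAcids` is a hypothetical helper name, and the two-entry table is just a subset of the masses quoted earlier in the thread.

```javascript
// Water lost per peptide bond (value used in the comment above).
const WATER = 18.0153;

// Hypothetical helper: sum free amino acid masses, then remove the water
// released at each of the (n - 1) peptide bonds.
function massFromFreeAminoAcids(sequence, freeMasses) {
  const residues = [...sequence.toUpperCase()];
  const n = residues.length;
  if (n === 0) return 0;

  let sum = 0;
  for (const aa of residues) {
    const mass = freeMasses[aa];
    if (mass === undefined) {
      throw new Error(`Unknown amino acid code: ${aa}`);
    }
    sum += mass;
  }
  return sum - (n - 1) * WATER;
}

// Subset of the free amino acid masses listed earlier in the thread.
const freeMassesSubset = { A: 89.0932, G: 75.0666 };
const dipeptideMass = massFromFreeAminoAcids("GA", freeMassesSubset);

// Water that is wrongly counted when free masses are summed as-is
// for a chain of 769 residues.
const excessDa = (769 - 1) * WATER;

console.log(dipeptideMass.toFixed(4), (excessDa / 1000).toFixed(1));
```

The excess for a 769-residue chain comes out at about 13.8 kDa, which is consistent with the gap between the 97.5 kDa and 83.7 kDa figures reported at the top of this issue.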

tnrich commented 1 year ago

@jobabi

Thanks for figuring this out. I am in the process of updating the AA weights and have noticed what seems to be a bug on the UWPR website. When I run the fragment calculator on a sequence that includes a U, it seems to add a weight that is too large:

[image]

Apart from that, my improved algorithm appears to match the output from UWPR.

I've published a new version of @teselagen/sequence-utils and @teselagen/ove

Please note that all future tickets should be opened here in the new Teselagen OSS mono-repo: https://github.com/TeselaGen/tg-oss

This repo is now deprecated and no future changes will be made to it.