dosorio / Peptides

An R package to calculate indices and theoretical physicochemical properties of peptides and protein sequences.

Follow-up on the expansion of amino acids to include non-naturals. #52

Open francisacquah466 opened 1 year ago

francisacquah466 commented 1 year ago

Hi @dosorio

Thanks for such a wonderful package.

I'm working on generating a lot of peptides, mostly with non-natural amino acids. I was wondering whether the list of amino acids could be expanded to include new amino acids and their SMILES, so that peptides containing non-natural residues can be passed to the aaSMILES function to generate SMILES for them. I can see the one-letter amino acid code becoming a problem here. Is there a way this could be added, maybe by using the three-letter amino acid code rather than the one-letter code?

It would help a lot!

Thanks!

jspaezp commented 1 year ago

Hey there! Sorry for the late reply ... Right now the implementation of the SMILES generator is fairly simple (https://github.com/jspaezp/Peptides/blob/b0aab3765f99a0c4c79dddfecdd12d3ff71c9a20/R/smilesStrings.R), and I think it could be extended to three-letter amino acid codes. But since the three-letter abbreviation is not supported in any other part of the package (that I can recall), supporting it only here would feel very inconsistent ...

Maybe something like this would work for you (I have not tested it but I feel like it would work ...):

three_letter_aaSMILES <- function(seq) {
  aminoacid_smiles <- c(
    "Ala" = "N[C@@]([H])(C)C(=O)O",
    # ... all other amino acids added here
    "Val" = "N[C@@]([H])(C(C)C)C(=O)O")

  # Split each sequence into consecutive 3-letter codes, e.g. "AlaVal" -> c("Ala", "Val")
  split_sequences <- regmatches(seq, gregexpr(".{3}", seq))

  smiles_aa_sequences <- lapply(split_sequences, function(x) aminoacid_smiles[x])

  # This trims the trailing O (the -OH of the carboxyl group) from each amino acid
  concat_aa_smiles <- lapply(
      smiles_aa_sequences,
      function(x) paste(gsub("O$", "", x), collapse = ""))

  concat_aa_smiles <- lapply(concat_aa_smiles, function(x) paste0(x, "O"))
  concat_aa_smiles <- unlist(concat_aa_smiles)

  return(concat_aa_smiles)

}
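
For non-natural residues, fixed-width three-letter chunks may not be enough, since some codes could be longer or shorter. Here is a rough sketch of the same lookup-and-concatenate idea with two changes: residues are separated by a hyphen, and the SMILES table is supplied by the caller. Everything here is an assumption for illustration and not part of the Peptides package: the function name `custom_aaSMILES`, the `smiles_table` and `sep` arguments, and the choice of 2-aminoisobutyric acid ("Aib") as the example non-natural residue. It also assumes every residue SMILES is written N-terminus first and ends in the carboxyl `O`, as in the snippet above.

```r
# Hypothetical helper (not part of the Peptides package): builds a peptide
# SMILES from separator-delimited residue codes using a user-supplied table.
custom_aaSMILES <- function(seq, smiles_table, sep = "-") {
  # Split each sequence on the separator, e.g. "Ala-Aib-Val" -> c("Ala", "Aib", "Val")
  residues <- strsplit(seq, sep, fixed = TRUE)

  vapply(residues, function(x) {
    # Fail loudly on codes that are missing from the table
    unknown <- setdiff(x, names(smiles_table))
    if (length(unknown) > 0) {
      stop("No SMILES defined for: ", paste(unknown, collapse = ", "))
    }
    # Drop the trailing O (carboxyl -OH) of every residue, join the pieces,
    # then restore a single -OH at the C-terminus
    paste0(paste(sub("O$", "", smiles_table[x]), collapse = ""), "O")
  }, character(1))
}

# Example table: the two natural residues from the snippet above plus
# 2-aminoisobutyric acid ("Aib") as an illustrative non-natural residue
smiles_table <- c(
  "Ala" = "N[C@@]([H])(C)C(=O)O",
  "Val" = "N[C@@]([H])(C(C)C)C(=O)O",
  "Aib" = "NC(C)(C)C(=O)O"
)

custom_aaSMILES("Ala-Aib-Val", smiles_table)
# "N[C@@]([H])(C)C(=O)NC(C)(C)C(=O)N[C@@]([H])(C(C)C)C(=O)O"
```

Passing the table as an argument means new residues can be added without touching the function. Note that this simple concatenation only handles linear backbones; residues with unusual linkages would need their own treatment.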