Error evaluating HELM string

azech-hqs commented 1 month ago

Hi there!

First of all, great work, and thanks for this neat tool!

I recently encountered some issues using pyPept with HELM strings. I installed the pyPept package as recommended in a fresh conda environment with Python 3.9. When trying run_pyPept with the first HELM string example from the README via

run_pyPept --helm "PEPTIDE1{P.E.P.T.I.D.E}$$$$V2.0"

I get the following error:

/Users/alex/software/micromamba/envs/pypept/lib/python3.9/site-packages/pyPept/converter.py:217: UserWarning: problem with HELM string - not enough sections: PEPTIDE1{P.E.P.T.I.D.E}1731717317V2.0
  warnings.warn(f'problem with HELM string - not enough sections: {helm}')
Traceback (most recent call last):
  File "/Users/alex/software/micromamba/envs/pypept/lib/python3.9/site-packages/pyPept/converter.py", line 214, in eval_helm
    version = self.__split_helm(helm)
  File "/Users/alex/software/micromamba/envs/pypept/lib/python3.9/site-packages/pyPept/converter.py", line 112, in __split_helm
    list_of_connections = helm_parts[1]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/alex/software/micromamba/envs/pypept/bin/run_pyPept", line 8, in <module>
    sys.exit(main())
  File "/Users/alex/software/micromamba/envs/pypept/lib/python3.9/site-packages/pyPept/interfaces/run_pyPept.py", line 141, in main
    b = Converter(helm=args.helm)
  File "/Users/alex/software/micromamba/envs/pypept/lib/python3.9/site-packages/pyPept/converter.py", line 54, in __init__
    self.eval_helm(helm=helm)
  File "/Users/alex/software/micromamba/envs/pypept/lib/python3.9/site-packages/pyPept/converter.py", line 218, in eval_helm
    warnings.warn(f'need 5, have {len(self.__split_helm(helm))}')
  File "/Users/alex/software/micromamba/envs/pypept/lib/python3.9/site-packages/pyPept/converter.py", line 112, in __split_helm
    list_of_connections = helm_parts[1]
IndexError: list index out of range

Unfortunately, none of the HELM strings listed in the README.md seem to work.

In another attempt, I created a HELM string (oxytocin) myself using the HELM web editor. However, the resulting HELM string, PEPTIDE1{C.Y.I.Q.N.C.P.L.G.[am]}$PEPTIDE1,PEPTIDE1,1:R3-6:R3$$$V2.0, leads to the same "not enough sections" error as shown above.

As a workaround, I could successfully run pyPept with the oxytocin example by converting the HELM string to BILN format. This was achieved with the BILN-converter script.

The problem seems to occur in the __split_helm function in converter.py. For the case of oxytocin, the helm_parts list has only one item after the splitting loop: ['PEPTIDE1{C.Y.I.Q.N.C.P.L.G.[am]},PEPTIDE1,1:R3-6:R317317.0']. Hence, helm_parts[1] raises an IndexError. I can try to work on a bug fix in the coming weeks, but I would first have to familiarize myself with the HELM format and its possible variations.
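
For illustration, here is a minimal sketch of a defensive split that reports a malformed string up front instead of failing later with an IndexError. It is not pyPept's actual __split_helm; the function name split_helm and the section labels are my own, and it assumes a HELM V2.0 string consists of exactly five '$'-separated sections with no literal '$' inside any of them.

```python
# Hypothetical helper, not part of pyPept: split a HELM V2.0 string into
# its five '$'-separated sections and fail early if the count is wrong.
SECTION_NAMES = (
    "simple_polymers",
    "connections",
    "polymer_groups",
    "extended_annotations",
    "version",
)


def split_helm(helm: str) -> dict:
    """Return the five HELM sections keyed by an illustrative label.

    Assumption: no section contains a literal '$', so a plain split is enough.
    """
    parts = helm.split("$")
    if len(parts) != len(SECTION_NAMES):
        raise ValueError(
            f"expected {len(SECTION_NAMES)} '$'-separated sections, "
            f"found {len(parts)} in {helm!r}"
        )
    return dict(zip(SECTION_NAMES, parts))


if __name__ == "__main__":
    oxytocin = "PEPTIDE1{C.Y.I.Q.N.C.P.L.G.[am]}$PEPTIDE1,PEPTIDE1,1:R3-6:R3$$$V2.0"
    for name, value in split_helm(oxytocin).items():
        print(f"{name}: {value!r}")
```

With a check like this, the single-item string quoted above would be rejected with a message naming the section count, which makes it easier to see that the string reaching the converter no longer contains any '$' separators.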

jbbrownlsi commented 3 weeks ago

Thanks for your reports of these problems with the direct handling of HELM. I can reproduce your problems, and you are correct that there is an assumption in the HELM handling code that there will be more than one "part".

We will address it as time permits, unless the community can provide a solution more rapidly; I cannot guarantee a timeline for posting an update to the repository. Your approach of using the HELM-to-BILN conversion is a good interim solution, and I would expect the fix to be not much more than a small patch based on the differences between the BILN-converter code and what is currently in pyPept.

jbbrownlsi commented 3 weeks ago

About Oxytocin, if I do this:

>>> from pyPept import converter
>>> c = converter.Converter()
>>> c.eval_biln("C(1,3)-Y-I-Q-N-C(1,3)-P-L-G")
>>> c.get_helm()
'PEPTIDE1{C.Y.I.Q.N.C.P.L.G}$PEPTIDE1,PEPTIDE1,1:R3-6:R3$$$V2.0'

and then use the result to separately execute the standard pipeline, everything is fine:

`--> ./run_pyPept --helm 'PEPTIDE1{C.Y.I.Q.N.C.P.L.G}$PEPTIDE1,PEPTIDE1,1:R3-6:R3$$$V2.0' 
11:44:51   INFO:1. Processing the HELM->BILN sequence C(1,3)-Y-I-Q-N-C(1,3)-P-L-G
11:44:51   INFO:2. Creating the RDKit object
The SMILES of the peptide is: CC[C@H](C)[C@@H]1NC(=O)[C@H](Cc2ccc(O)cc2)NC(=O)[C@@H](N)CSSC[C@@H](C(=O)N2CCC[C@H]2C(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CCC(N)=O)NC1=O
11:44:51   INFO:3. Predicting the peptide conformer
Predicted Secondary Structure: --GGG-S-- for main chain: CYIQNCPLG
11:45:13   INFO:File generated: peptide.png.
11:45:13   INFO:File generated: peptide.pdb.

The only difference between our strings is that the capping group is not included explicitly in mine.
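
One more difference between the two invocations quoted in this thread is the shell quoting: the failing command wraps the HELM string in double quotes, while the working one uses single quotes. In POSIX shells, $$ inside double quotes expands to the shell's process ID, which would be consistent with the 1731717317 that appears in place of $$$$ in the warning. The sketch below only illustrates that behaviour under this assumption and is not a confirmed diagnosis; expanded_by_shell is a hypothetical helper, and the demonstration is skipped when no POSIX sh is available.

```python
import shlex
import shutil
import subprocess

HELM = "PEPTIDE1{P.E.P.T.I.D.E}$$$$V2.0"


def expanded_by_shell(text: str) -> str:
    """Return what a POSIX shell hands over when `text` sits in double quotes.

    Hypothetical helper for illustration; it assumes `text` contains no
    double quotes, backticks, or backslashes.
    """
    result = subprocess.run(
        ["sh", "-c", f'printf %s "{text}"'],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# shlex.quote wraps the string in single quotes, so the shell leaves '$' alone.
safe_command = "run_pyPept --helm " + shlex.quote(HELM)
print(safe_command)

if shutil.which("sh"):
    mangled = expanded_by_shell(HELM)
    # Each '$$' pair is replaced by a process ID, so no '$' separators survive.
    print(mangled)
    print("'$' characters left:", mangled.count("$"))
```

If this is what happened, quoting the argument with single quotes (or passing it as a list element to subprocess.run without a shell) keeps the five sections intact before pyPept ever sees the string.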
