boopthesnoot closed this pull request 1 month ago.
I would suggest integrating the data with modification.tsv and amino_acid.yaml: https://github.com/MannLabs/alphabase/blob/737f25c79c34de5182140543bf887ab61d7e53d5/alphabase/constants/const_files/amino_acid.yaml
That way we don't have to add a new constants folder or keep double bookkeeping of modification names. I think we can add two keys for each entry: sum/composition and smiles. In modification.tsv you can just add a smiles column.
@boopthesnoot Have a look at #200. I created an optional install extra and a decorator for rdkit.
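The code in #200 isn't reproduced here, so the following is only a minimal sketch of the general pattern of guarding an optional dependency with a decorator. The names requires_optional, requires_rdkit and smiles_to_formula, and the extra name "smiles", are hypothetical and not the API from #200.

```python
import functools
import importlib.util


def requires_optional(module_name, extra):
    """Hypothetical decorator factory: defer the check for an optional
    dependency until the decorated function is actually called."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # find_spec returns None if the top-level module is not installed
            if importlib.util.find_spec(module_name) is None:
                raise ImportError(
                    f"{func.__name__}() needs the optional dependency "
                    f"'{module_name}'; install the '{extra}' extra to use it."
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical rdkit guard; the extra name "smiles" is an assumption.
requires_rdkit = requires_optional("rdkit", extra="smiles")


@requires_rdkit
def smiles_to_formula(smiles):
    # rdkit is imported lazily, so importing this module never requires it
    from rdkit import Chem
    from rdkit.Chem.rdMolDescriptors import CalcMolFormula

    return CalcMolFormula(Chem.MolFromSmiles(smiles))
```

Because the check runs at call time, the rest of the package can still be imported without rdkit; only calling a guarded function such as smiles_to_formula fails, and it fails with a clear message.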
@GeorgWa But the SMILES in modification.tsv will be a mess: some of them will be AAs with PTMs, some will be terminal modifications only, without the AA, and we would still have to store which is which somewhere. And if we add a key for each of the AAs in amino_acid.yaml, we still have double bookkeeping of the atomic composition, because we can infer it from SMILES. Of course, that would mean an rdkit dependency for the whole package x)
We could resolve this by looking up the localizer, e.g. @Any N-Term. Alternatively, we can also introduce a second column, location = {'N','C','AA'}, which would determine whether dynamic or fixed SMILES are used, depending on its value.
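To make the localizer idea concrete, here is a minimal sketch that derives a location value ('N', 'C' or 'AA') from the site part of a modification name. The function name mod_location is hypothetical, and the rule that terminal sites contain "N-term" or "C-term" (matched case-insensitively) is an assumption based on the @Any N-Term example, not a confirmed alphabase convention.

```python
def mod_location(mod_name: str) -> str:
    """Hypothetical helper: classify a modification by its localizer.

    Assumes names of the form '<name>@<site>', e.g. 'Dimethyl@K' or
    'Acetyl@Any N-Term'. Returns 'N', 'C' or 'AA'.
    """
    if "@" not in mod_name:
        raise ValueError(f"not a '<name>@<site>' modification name: {mod_name!r}")
    # Split on the last '@' so the site is everything after it
    _, site = mod_name.rsplit("@", 1)
    site_lower = site.lower()
    if "n-term" in site_lower:
        return "N"
    if "c-term" in site_lower:
        return "C"
    # Anything else is treated as a residue-specific site such as 'K'
    return "AA"


for name in ["Dimethyl@K", "Acetyl@Any N-Term", "Amidated@Any C-term"]:
    print(name, "->", mod_location(name))
# Dimethyl@K -> AA
# Acetyl@Any N-Term -> N
# Amidated@Any C-term -> C
```

Deriving the location from the existing name avoids storing it a second time; an explicit location column would instead make the rule visible in the table itself.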
In alphabase, modification names like Dimethyl@K are the primary keys across all applications. I think this primary key should be defined only once. Furthermore, the master record in modification.tsv is updated automatically from Unimod when more modifications are added. This way, everything stays in sync.
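One way to keep the modification name as the single primary key is to load the table once into a dict keyed by mod_name and reject duplicate keys. This is a standard-library sketch only: the function load_mod_table and the column names mod_name, composition and smiles are assumptions for illustration, and the smiles cells are deliberately left empty rather than filled with made-up structures.

```python
import csv
import io


def load_mod_table(tsv_text: str) -> dict:
    """Hypothetical loader: map each mod_name to its row, enforcing that
    mod_name is defined exactly once (a primary key)."""
    table = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        key = row["mod_name"]
        if key in table:
            raise ValueError(f"duplicate primary key: {key!r}")
        # An empty or missing smiles cell means "no SMILES recorded yet"
        row["smiles"] = row.get("smiles") or None
        table[key] = row
    return table


# Illustrative rows: a smiles column is added but not filled in yet
EXAMPLE_TSV = (
    "mod_name\tcomposition\tsmiles\n"
    "Dimethyl@K\tH(4)C(2)\t\n"
    "Acetyl@Any N-Term\tH(2)C(2)O(1)\t\n"
)

mods = load_mod_table(EXAMPLE_TSV)
print(mods["Dimethyl@K"]["composition"])  # H(4)C(2)
```

Any other data keyed by modification (SMILES included) then hangs off the same key instead of a second list of names.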
Yes, I think we should use only one PTM and AA definition file to avoid ambiguity in the future.
We should use aa.tsv instead of aa.yaml for AAs, similar to modification.tsv.
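Moving from YAML to TSV is mostly a one-off conversion. The sketch below assumes the YAML content has already been loaded into a plain {aa: composition} dict (the exact layout of the real file is an assumption) and writes it as TSV with the standard csv module. The helper name aa_mapping_to_tsv and the column names are hypothetical; the two compositions are the residue formulas of alanine (C3H5NO) and glycine (C2H3NO).

```python
import csv
import io


def aa_mapping_to_tsv(aa_compositions: dict) -> str:
    """Hypothetical migration helper: turn an {aa: composition} mapping
    (e.g. as loaded from a YAML file) into TSV text with one row per AA
    and an initially empty smiles column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["aa", "composition", "smiles"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for aa in sorted(aa_compositions):  # stable, alphabetical row order
        writer.writerow({"aa": aa, "composition": aa_compositions[aa], "smiles": ""})
    return buffer.getvalue()


# Residue compositions written in the H(n)C(n) style
example = {"G": "C(2)H(3)N(1)O(1)", "A": "C(3)H(5)N(1)O(1)"}
print(aa_mapping_to_tsv(example))
```

A tabular layout lets AAs and modifications be read with the same TSV tooling, which is presumably the point of mirroring modification.tsv.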
I just caught that the dtype of the unimod column in modification.tsv changed to float. Can we change it back?
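The comment doesn't say why the column became float. One common cause in pandas, offered here only as an assumption, is a missing value: an integer column containing NaN is upcast to float64, and writing it back produces values like 36.0. The toy table below (column name unimod_id and the row MyCustomMod@K are hypothetical) shows the effect and how pandas' nullable Int64 dtype keeps the IDs integer.

```python
import io

import pandas as pd

# Toy table: the last row has no Unimod ID (hypothetical custom mod)
TSV = (
    "mod_name\tunimod_id\n"
    "Acetyl@Any N-Term\t1\n"
    "Dimethyl@K\t36\n"
    "MyCustomMod@K\t\n"
)

# Default parsing: the empty cell becomes NaN, which forces float64
default = pd.read_csv(io.StringIO(TSV), sep="\t")
print(default["unimod_id"].dtype)  # float64

# Nullable integer dtype keeps whole numbers as integers and the gap as <NA>
fixed = pd.read_csv(io.StringIO(TSV), sep="\t", dtype={"unimod_id": "Int64"})
print(fixed["unimod_id"].dtype)  # Int64

# Writing back keeps "36" instead of "36.0"; <NA> becomes an empty cell
print(fixed.to_csv(sep="\t", index=False))
```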
You can find the description and examples in docs/nbs/tutorial_smiles.ipynb.