RECETOX / recetox-xMSannotator

This is a custom adaptation of the original version of xMSannotator. It is a complete rewrite of the original functionality, following the same program structure.
GNU General Public License v3.0

main: Add functionality to obtain compound name from its chemical formula #76

Closed: maximskorik closed this issue 2 years ago

maximskorik commented 2 years ago

The October version of the advanced annotation outputs the annotation table without compound names. This should be fixed to make the tool more user-friendly. The names are not used anywhere in the annotation pipeline, so they can be added to the output at the end of the annotation.

- [ ] Add functionality to obtain IUPAC compound names from the empirical chemical formula

hechth commented 2 years ago

@maximskorik This should actually be only loading the Name from the compound table if it is there - getting the name from the formula is not possible due to isomers, so I'd rely on reading it from the database.

@martenson What is your opinion on optionally reading a column if it is there and ignoring it if it isn't? I feel like it adds extra programming effort to keep track of which columns are optional and which aren't, while the benefit is not immediately visible to me, apart from having a smaller, more minimal data frame inside the program.
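For illustration, the "read the column if it is there, ignore it otherwise" idea could look roughly like the sketch below. It is written in plain Python rather than in the package's own code, and the `compound_id` key, the sample rows and the `attach_names` helper are all hypothetical; only the `Name` column comes from the discussion above.

```python
def attach_names(annotations, compounds, id_key="compound_id", name_key="Name"):
    """Return copies of the annotation rows, with names added when available.

    `id_key` and the row layout are assumptions made for this sketch.
    """
    # Build an ID -> name lookup only from compound rows that carry the column.
    names = {row[id_key]: row[name_key] for row in compounds if name_key in row}

    result = []
    for row in annotations:
        out = dict(row)  # copy, so the caller's rows are not modified
        if names:
            # The optional column exists: add it (None when the ID is unknown).
            out[name_key] = names.get(row[id_key])
        result.append(out)
    return result


# Hypothetical sample data.
annotations = [
    {"compound_id": "C1", "mz": 181.07},
    {"compound_id": "C2", "mz": 90.03},
]
compounds_with_names = [{"compound_id": "C1", "Name": "glucose"}]
compounds_without_names = [{"compound_id": "C1"}]

named = attach_names(annotations, compounds_with_names)
unnamed = attach_names(annotations, compounds_without_names)
```

Because the names are only joined onto the output at the end, the rest of the pipeline never needs to know whether the column was present, which keeps the "optional" handling in one place.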

martenson commented 2 years ago

@hechth Seems like a harmless approach to me, unless it makes the file significantly larger. I think the decision is not technical but rather based on the scientific/user benefits.

hechth commented 2 years ago

Having the names in there right from the start is beneficial for debugging, and having them in the output is crucial, especially since our IDs aren't canonical (they come from HMDB, KEGG, PubChem, etc.), so relying on them is dangerous. Relying on the name is likely not ideal either, but for now it is better than the ID IMHO.

hechth commented 2 years ago

Work in progress is available here

maximskorik commented 2 years ago

> @maximskorik This should actually be only loading the Name from the compound table if it is there - getting the name from the formula is not possible due to isomers, so I'd rely on reading it from the database.

That's a good point. I forgot to consider isomers.
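As a small illustration of the isomer problem, the sketch below groups a hypothetical compound table by formula. The table contents and the `names_by_formula` helper are made up for this example; the chemistry is real, since ethanol and dimethyl ether share the formula C2H6O.

```python
from collections import defaultdict

# Hypothetical compound table; the row layout is an assumption for this sketch.
compounds = [
    {"formula": "C2H6O", "name": "ethanol"},
    {"formula": "C2H6O", "name": "dimethyl ether"},
    {"formula": "CH4", "name": "methane"},
]


def names_by_formula(rows):
    """Group compound names by formula to expose one-to-many mappings."""
    index = defaultdict(list)
    for row in rows:
        index[row["formula"]].append(row["name"])
    return dict(index)


index = names_by_formula(compounds)
# Formulas that map to more than one name cannot be resolved from the formula alone.
ambiguous = {formula: names for formula, names in index.items() if len(names) > 1}
```

Any formula that ends up in `ambiguous` has no single correct name, which is why the name has to come from the database entry rather than from the formula.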