ElucidataInc / ElMaven

LC-MS data processing tool for large-scale metabolomics experiments.
https://resources.elucidata.io/elmaven/
GNU General Public License v2.0

compound name suffixes #1335

Closed chubukov closed 3 years ago

chubukov commented 4 years ago

At some point, compound names from user-supplied compound databases started getting a suffix like "(1)" appended to them. This causes issues with some of our workflows that rely on matching by compound name.

What causes this? Is there a way to disable it?

saifulbkhan commented 4 years ago

@chubukov This happens when El-MAVEN finds two or more compounds that have the same combination of name, id and database-name but have some difference in other attributes (e.g. category, collision-energy, etc.).

We had to do this to preserve the exact pairing between a peak-group and a compound when restoring from emDB sessions. It also lets users distinguish between peak-groups of compounds that share a name and ID, which matters with MS/MS spectral libraries that hold many different fragmentation spectra for the same compound at different collision energies (CEs).

As of now, there is no way to disable it.
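
To check in advance which entries of a compound database would meet the condition described above (same name, id and database name, but a difference in some other attribute), something like the following could be used. This is only a sketch and not El-MAVEN's own code: the record layout (dicts with `name`, `id` and `db` keys) and the function name are assumptions made for illustration, and it makes no claim about how the suffix numbers are assigned.

```python
from collections import defaultdict

def find_suffix_candidates(compounds):
    """Return groups of non-identical records sharing (name, id, db).

    `compounds` is an iterable of dicts; the keys "name", "id" and "db"
    are hypothetical field names chosen for this sketch.
    """
    groups = defaultdict(list)
    for record in compounds:
        key = (record["name"], record["id"], record["db"])
        # Exact duplicates do not differ in any attribute, so skip them.
        if record not in groups[key]:
            groups[key].append(record)
    # Only keys with two or more distinct records are candidates.
    return {key: recs for key, recs in groups.items() if len(recs) > 1}
```

Any key returned here identifies a set of compounds that differ only outside the name, id and database name, which is the situation in which the suffix is reported to appear.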

chubukov commented 4 years ago

@saifulbkhan thanks. Is the original name stored somewhere, or can it be re-generated in a well-defined way? Could we do this during export?

Of course we could do s/\s*\(\d+\)$// at the tail end, but perhaps there's something more systematic.
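
For reference, the substitution above could be written in Python roughly as follows. The function name is made up for this sketch. Note that the pattern cannot tell an appended suffix from a name that legitimately ends in a parenthesised number in the supplied database, so both are stripped.

```python
import re

# Same pattern as the sed-style expression above:
# optional whitespace, "(digits)", end of string.
_SUFFIX = re.compile(r"\s*\(\d+\)$")

def strip_numeric_suffix(name):
    """Remove one trailing "(N)" suffix from a compound name, if present."""
    return _SUFFIX.sub("", name)
```

Names without a trailing "(N)" pass through unchanged, so the function can be applied to a whole name column.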

saifulbkhan commented 3 years ago

@chubukov Yes, the original name is still stored in the Compound object (in a property called originalName). It is also saved in the emDB, at least for the recent versions. So if the export uses either of these as its base, it can extract the compound name exactly as it was in the originally supplied database.

Do you want me to make this change in the custom export script?

chubukov commented 3 years ago

@saifulbkhan ok, that sounds good. Assuming it's really just pulling out the originalName property, I should be able to make the changes.

Thanks

saifulbkhan commented 3 years ago

@chubukov Alright. One note: if you are going to extract this value from the emDB, you will need the original_name (and not originalName) column from the compounds table.
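
A minimal sketch of reading that column, assuming the emDB file can be opened as an SQLite database and that the compounds table also has a `name` column holding the (possibly suffixed) name. Only `original_name` and the `compounds` table are confirmed in this thread. The `name` column and the function name are assumptions.

```python
import sqlite3
from typing import Dict

def original_names(conn: sqlite3.Connection) -> Dict[str, str]:
    """Map each stored compound name to its original_name.

    Assumes a `name` column exists next to `original_name`
    (the `name` column is not confirmed by the thread).
    """
    rows = conn.execute("SELECT name, original_name FROM compounds")
    return {name: original for name, original in rows}
```

With a connection opened on a session file, for example `sqlite3.connect("session.emDB")` (a hypothetical path), the returned mapping could be used to swap names back during export.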