biocore / q2-qemistree

Hierarchical orderings for mass spectrometry data. Canonically pronounced "chemis-tree".
BSD 2-Clause "Simplified" License

Updating Qemistree so it runs from SIRIUS workspace and generic input files #145

Open lfnothias opened 3 years ago

lfnothias commented 3 years ago

Hi @anupriyatripathi

This is to initiate a discussion to resolve the current issue users are having running Qemistree.

Context:

A sustainable solution would be to modify the Qemistree workflow by externalizing the SIRIUS computation. The user would provide the SIRIUS workspace as input to qiime qemistree make-hierarchy. The user would be instructed to run a minimal set of steps beforehand: SIRIUS and CSI:FingerID, with ZODIAC and CANOPUS as optional.

For even greater flexibility in the long run, and to support other similarity functions such as a basic cosine score and those still under development (like MS2DeepScore, https://www.biorxiv.org/content/10.1101/2021.04.18.440324v1), it would be best to be able to build the hierarchy from generic input files. They would be:

Actually, @ElDeveloper, with my support, wrote a Python script for the Earth Microbiome Project that generates a tree/hierarchy from a new SIRIUS workspace. The script only uses the scipy, scikit, and qiime2 libraries. Maybe we should release it very soon to help the users who are struggling? Is anyone interested in testing that solution?

anupriyatripathi commented 3 years ago

Hi @lfnothias, thanks for raising these important issues regarding:

a) support for making hierarchies from other similarity/dissimilarity matrices, such as cosine scores, Tanimoto scores, etc.
b) the incompatibility of q2-qemistree with the latest Sirius version

It is good to know that @ElDeveloper has a prototype that generates a chemical hierarchy from a new Sirius workspace, which means that this can be done.
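For illustration, here is a minimal pure-Python sketch of how Tanimoto or cosine similarities could be turned into a dissimilarity matrix. The function names (`tanimoto`, `cosine`, `dissimilarity_matrix`) are hypothetical and not part of q2-qemistree. The cosine here treats each spectrum as an already-binned intensity vector, which is a simplification of the spectral cosine scores used in mass spectrometry.

```python
import math


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two empty fingerprints share nothing; report zero similarity by convention.
    return shared / either if either else 0.0


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def dissimilarity_matrix(vectors, similarity):
    """Symmetric matrix of 1 - similarity, assuming similarities lie in [0, 1]."""
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - similarity(vectors[i], vectors[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

Either function can be passed as the `similarity` argument, so the same downstream clustering step could work regardless of how the scores were produced.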

Would you or @ElDeveloper or someone else be interested in working on adding some of these functionalities to q2-qemistree? I would be able to support the development process by discussing how to best implement this and providing code reviews.

ElDeveloper commented 3 years ago

Yes, this is absolutely a great idea. I think the best way is probably to create a new directory format (SiriusWorkspacev440 or something like that). Then we can write two transformers, one to extract the fingerprints and one to extract the feature metadata. The commands you would run would be something along the lines of:

To get the fingerprints (in a matrix form)

qiime tools import \
--input-path emp-sirius-workspace \
--output-path emp-fingerprints.qza \
--format SiriusWorkspacev440 \
--type FeatureTable[Frequency]

To get the feature metadata (for use with other plugins)

qiime tools import \
--input-path emp-sirius-workspace \
--output-path emp-feature-metadata.qza \
--format SiriusWorkspacev440 \
--type FeatureData[Molecules]

After importing, the user would then use the fingerprints to build the tree (we can add a new action for this).
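A rough sketch of what that tree-building step could look like, using only numpy and scipy. This is not the actual q2-qemistree action: the function name `fingerprints_to_newick`, the choice of Jaccard distance with average linkage, and the Newick rendering are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import pdist


def fingerprints_to_newick(fingerprints, feature_ids, method="average"):
    """Cluster binary fingerprints (one row per feature) into a Newick tree."""
    # Jaccard distance between boolean rows equals 1 - Tanimoto similarity.
    distances = pdist(np.asarray(fingerprints, dtype=bool), metric="jaccard")
    root = to_tree(linkage(distances, method=method))

    def render(node, parent_height):
        # Branch length is the drop in merge height from parent to this node;
        # leaves sit at height 0.
        length = parent_height - node.dist
        if node.is_leaf():
            return f"{feature_ids[node.id]}:{length:.4f}"
        children = ",".join(render(c, node.dist) for c in (node.left, node.right))
        return f"({children}):{length:.4f}"

    left = render(root.left, root.dist)
    right = render(root.right, root.dist)
    return f"({left},{right});"


# Toy input: three features described by four fingerprint positions.
newick = fingerprints_to_newick(
    [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]],
    ["f1", "f2", "f3"],
)
print(newick)
```

In the toy input, f1 and f2 share two of three set positions, so they join first and f3 attaches at the root.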

The biggest change is that we would leave running Sirius up to the end users, and we would mostly handle tree construction, QC, etc. by parsing the Sirius workspaces. I kinda like this idea because when Sirius changes its outputs in the future, we'll only need to write a new directory format, for example SiriusWorkspacev666. The artifact outputs would remain the same (a feature table and the corresponding feature metadata).

lfnothias commented 3 years ago

Nice. That seems like a very practical way to deal with SIRIUS in the long run! If we could also support a similarity matrix as input, that would give maximum flexibility for incorporating other tools and similarity functions.
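As a sketch of how a precomputed similarity matrix could feed the hierarchy, assuming similarities in [0, 1] where 1 means identical, one option is to convert to distances and hand them to scipy. The function name `similarity_to_linkage` is hypothetical, and average linkage is just one possible choice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def similarity_to_linkage(similarity, method="average"):
    """Build a scipy linkage matrix from a square, symmetric similarity matrix."""
    sim = np.asarray(similarity, dtype=float)
    if sim.ndim != 2 or sim.shape[0] != sim.shape[1]:
        raise ValueError("similarity matrix must be square")
    if not np.allclose(sim, sim.T):
        raise ValueError("similarity matrix must be symmetric")
    # Assumption: similarities lie in [0, 1], so 1 - s is a usable dissimilarity.
    dist = 1.0 - sim
    np.fill_diagonal(dist, 0.0)
    # linkage expects the condensed (upper-triangle) form of the distances.
    return linkage(squareform(dist, checks=False), method=method)


# Toy matrix: features 0 and 1 are very similar, feature 2 is distant from both.
Z = similarity_to_linkage([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])
```

Because only the matrix is needed, the scores could come from cosine, Tanimoto, MS2DeepScore, or any other tool that outputs pairwise similarities.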

ElDeveloper commented 3 years ago

Great, thanks @lfnothias. Any thoughts @anupriyatripathi?

amcaraballor commented 2 years ago

Hey all @lfnothias @ElDeveloper @anupriyatripathi, is there an updated way to run this Qemistree workflow?

anupriyatripathi commented 2 years ago

Hi @amcaraballor, we have worked on updating Qemistree with @helenamrusso. Her GitHub branch has the latest version, which is compatible with the latest version of Sirius. It will be merged into the main workflow soon, but you can use the branch in the meantime if needed. @helenamrusso has been using it and helping other users as well.