Open lfnothias opened 3 years ago
Hi @lfnothias, thanks for raising these important issues regarding:
a) support for making hierarchies with other similarity/dissimilarity matrices such as cosine scores, Tanimoto scores, etc.
b) the incompatibility of q2-qemistree with the latest Sirius version

It is good to know that @ElDeveloper has a prototype to generate a chemical hierarchy from a new Sirius workspace, which means that this can be done.
Would you or @ElDeveloper or someone else be interested in working on adding some of these functionalities to q2-qemistree? I would be able to support the development process by discussing how to best implement this and providing code reviews.
Yes, this is absolutely a great idea. I think the best way is probably to create a new directory format (SiriusWorkspacev440 or something like that). Then we can write two transformers, one to extract the fingerprints and one to extract the feature metadata. The commands you would run are something along the lines of:
qiime tools import \
--input-path emp-sirius-workspace \
--output-path emp-fingerprints.qza \
--format SiriusWorkspacev440 \
--type FeatureTable[Frequency]
qiime tools import \
--input-path emp-sirius-workspace \
--output-path emp-feature-metadata.qza \
--format SiriusWorkspacev440 \
--type FeatureData[Molecules]
After this, the user would need to use the fingerprints to build the tree (we can add a new action).
The biggest change from this is that we would leave running Sirius up to the end users, and we would mostly be handling the tree construction, QCing, etc. by parsing the Sirius workspaces. I kinda like this idea because when Sirius changes its outputs in the future, we'll only need to write a new directory format, for example SiriusWorkspacev666. The artifact outputs would remain the same (a feature table and the corresponding feature metadata).
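To make the tree-building step a bit more concrete, here is a minimal, dependency-free sketch of turning fingerprints into pairwise Tanimoto distances. It assumes the fingerprints have already been extracted from the workspace and binarized; the feature IDs, the dict layout, and both function names are hypothetical and are not part of q2-qemistree.

```python
from itertools import combinations


def tanimoto_similarity(a, b):
    """Tanimoto (Jaccard) similarity between two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two all-zero fingerprints are treated as identical.
    return 1.0 if either == 0 else both / either


def distance_matrix(fingerprints):
    """Return (ids, matrix) where matrix[i][j] = 1 - Tanimoto similarity."""
    ids = sorted(fingerprints)
    n = len(ids)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = 1.0 - tanimoto_similarity(fingerprints[ids[i]], fingerprints[ids[j]])
        matrix[i][j] = matrix[j][i] = d
    return ids, matrix


# Hypothetical feature IDs and fingerprints, for illustration only.
fingerprints = {
    "feature_1": [1, 1, 0, 0],
    "feature_2": [1, 0, 0, 0],
    "feature_3": [0, 0, 1, 1],
}
ids, dist = distance_matrix(fingerprints)
```

The resulting symmetric matrix is the kind of input a hierarchical clustering step could consume, whatever action ends up building the tree.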
Nice. That seems a very practical way to deal with SIRIUS in the long run! If we could also support a similarity matrix as input, that would give the maximum flexibility for incorporating other tools/similarity functions.
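As a rough illustration of what accepting a similarity matrix could look like, the sketch below converts similarities in [0, 1] (cosine, Tanimoto, etc.) to distances and agglomerates them with average linkage into a Newick topology without branch lengths. It is plain Python so it runs anywhere; in practice scipy.cluster.hierarchy.linkage with method="average" would likely do the clustering. The function names and the example IDs are hypothetical, and this is not how q2-qemistree currently builds its tree.

```python
def similarity_to_distance(sim):
    """Convert a symmetric similarity matrix with values in [0, 1] to distances."""
    n = len(sim)
    for i in range(n):
        if len(sim[i]) != n:
            raise ValueError("matrix must be square")
        for j in range(n):
            if not 0.0 <= sim[i][j] <= 1.0:
                raise ValueError("similarities must lie in [0, 1]")
            if abs(sim[i][j] - sim[j][i]) > 1e-9:
                raise ValueError("matrix must be symmetric")
    return [[0.0 if i == j else 1.0 - sim[i][j] for j in range(n)] for i in range(n)]


def average_linkage_newick(ids, dist):
    """Agglomerate with average linkage (UPGMA); return a Newick topology string."""
    n = len(ids)
    # Active clusters: key -> (newick fragment, number of leaves)
    clusters = {i: (ids[i], 1) for i in range(n)}
    # d[(a, b)] with a < b is the current distance between clusters a and b
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_key = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of active clusters
        (na, sa), (nb, sb) = clusters.pop(a), clusters.pop(b)
        # Size-weighted average distance from the merged cluster to the rest
        new_d = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            new_d[(c, next_key)] = (sa * dac + sb * dbc) / (sa + sb)
        # Drop every entry that involved either merged cluster
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        clusters[next_key] = ("(%s,%s)" % (na, nb), sa + sb)
        next_key += 1
    (newick, _), = clusters.values()
    return newick + ";"


# Hypothetical similarity scores between three features, for illustration only.
ids = ["A", "B", "C"]
sim = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
tree = average_linkage_newick(ids, similarity_to_distance(sim))
```

Taking 1 - similarity as the distance is only one possible choice; other similarity functions may need a different transform before clustering.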
Great, thanks @lfnothias. Any thoughts @anupriyatripathi?
Hey dear all @lfnothias @ElDeveloper @anupriyatripathi, do you guys have an updated way to run this Qemistree workflow?
Hi @amcaraballor, we have worked on updating Qemistree with @helenamrusso. Her GitHub branch has the latest version, which is compatible with the latest version of Sirius. It will be merged into the main workflow soon, but you can use the branch if needed. @helenamrusso has been using it and helping other users as well.
Hi @anupriyatripathi
This is to initiate a discussion to resolve the current issue users are having running Qemistree.
Context:
A sustainable solution would be to modify the Qemistree workflow by externalizing the SIRIUS computation part. The user would provide the SIRIUS workspace as input to run qiime qemistree make-hierarchy. Of course, the user would be instructed to have computed a minimal set of steps beforehand: SIRIUS/CSI:FingerID, with ZODIAC/CANOPUS as optional.
For even greater flexibility in the long run, and to offer wider support for other similarity functions, such as the basic cosine score and those that are being developed (like MS2DeepScore, https://www.biorxiv.org/content/10.1101/2021.04.18.440324v1), the best would be to have the possibility to
run the hierarchy from the generic input files:
They would be:
Actually, @ElDeveloper, with my support, wrote a Python script for the Earth Microbiome Project that generates a tree/hierarchy from a new SIRIUS workspace. The script only uses the scipy, scikit, and qiime2 libraries. Maybe we should release that very soon to help the users who are struggling? Is anyone interested in testing that solution?