CCMS-UCSD / GNPS_Workflows

Public Workflows at GNPS
https://gnps.ucsd.edu/

Qemistree updates #365

Closed anupriyatripathi closed 4 years ago

anupriyatripathi commented 4 years ago

The following features need to be pushed into the new GNPS release:

  1. Adding the option to select the instrument the data was run on, for FP prediction accuracy
  2. Adding the option to use Euclidean or Jaccard distances to cluster fingerprints
  3. Running ClassyFire on SMILES from MS/MS matches or CSI:FingerID predictions
  4. Figuring out how to render the tree in a user-friendly manner

mwang87 commented 4 years ago

Can you provide the exact commandlines we would be using to run this?

anupriyatripathi commented 4 years ago

I've sent you an email with the commands for this.

anupriyatripathi commented 4 years ago

Test tasks:

| Name | Description | FBMN | Qemistree |
| --- | --- | --- | --- |
| QE QERT C18 benchmarking | QE example | Job | Job |
| Western vs. rural skin samples | QTOF example | Job | Job |
| Mice serum samples | QTOF no metadata example | Job | Job |
| Fungal gardens | QTOF example | Job | Job |

anupriyatripathi commented 4 years ago

@mwang87 The current workflow renders a file that appears to be the output of one of the early steps in the Qemistree workflow, i.e., the output from SIRIUS. We should instead render the feature data file produced by the last processing step, i.e., the output of get-classyfire-taxonomy, which is a .qza file that can be exported to a .tsv as follows:

qiime tools export --input-path classyfire_results.qza --output-path classyfire-results

classyfire-results is a folder that will contain a .tsv file called feature_data.tsv. We should render feature_data.tsv.
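
As a rough illustration of what rendering the exported file could involve, the sketch below reads a tab-separated file into a header list and a list of row dictionaries using only the Python standard library. The function name `read_feature_data` is hypothetical, and the assumption that feature_data.tsv has a single header row with no extra comment lines has not been verified against real Qemistree output.

```python
import csv


def read_feature_data(path):
    """Return (column_names, rows) from a tab-separated file.

    Assumption: the first line is the header row and every later line
    is a data row. Hypothetical helper, not part of Qemistree or GNPS.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to [].
        columns = list(reader.fieldnames or [])
    return columns, rows
```

Called with the path from the export step above, for example `read_feature_data("classyfire-results/feature_data.tsv")`, this would give the column names in file order, which a results view could use as table headers.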

mwang87 commented 4 years ago

I agree they may not be correct, but please check the output files. Are they correct for all the tests you have run? Let's take this one step at a time.

anupriyatripathi commented 4 years ago

@mwang87 The output files look correct; I'm just highlighting the TBDs here. Upon further investigation, it looks like a file called formula.qza is currently being rendered.

mwang87 commented 4 years ago

Sounds good. I'll pull out that file. What are the contents of that feature_data.tsv file? Which columns are meaningful to display?

anupriyatripathi commented 4 years ago

The classified feature data has the following columns. I don't think there's an issue with displaying everything. table_number is not relevant, though, as we will not be supporting meta-analyses for the time being. And the smiles column is just either ms2_smiles or csi_smiles (when ms2_smiles is NaN).

id #featureID csi_smiles ms2_smiles ms2_compound ms2_adduct table_number smiles annotation_type kingdom superclass class subclass direct_parent
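
The relationship between the three SMILES columns, and the suggestion to leave out table_number, can be sketched as follows. This is only an illustration: the helper names are hypothetical, and treating an empty string or the text "nan" as a missing value is an assumption about how NaN appears in the exported file.

```python
import math

# Assumption: how a missing ms2_smiles value may be spelled in the TSV.
MISSING_TEXT = {"", "nan", "none"}


def is_missing(value):
    """True for None, a float NaN, or a string that reads as missing."""
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    return str(value).strip().lower() in MISSING_TEXT


def resolve_smiles(row):
    """Prefer ms2_smiles; fall back to csi_smiles when ms2_smiles is missing."""
    ms2 = row.get("ms2_smiles")
    return row.get("csi_smiles") if is_missing(ms2) else ms2


def display_columns(columns):
    """Keep every column except table_number, preserving order."""
    return [name for name in columns if name != "table_number"]
```

For a row with ms2_smiles set to "CCO", `resolve_smiles` returns that value unchanged; it only consults csi_smiles when the MS/MS match is absent.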

mwang87 commented 4 years ago

Closing as the latest changes seem to be good. Will go out in release 18.