Qemistree updates - Githubissues

CCMS-UCSD / GNPS_Workflows

Public Workflows at GNPS

https://gnps.ucsd.edu/

Other

Qemistree updates #365

Closed anupriyatripathi closed 4 years ago

anupriyatripathi commented 4 years ago

The following features need to be pushed into the new GNPS release:

Adding the option to select the instrument the data was run on, to improve fingerprint (FP) prediction accuracy
Adding the option to use Euclidean or Jaccard distances to cluster fingerprints
Running ClassyFire on SMILES from MS/MS matches or CSI:FingerID predictions
Figuring out how to render the tree in a user-friendly manner
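
For reference, the two distance options in the list above differ as follows. This is only a minimal sketch, not Qemistree's implementation: it assumes fingerprints are equal-length vectors of per-property probabilities, and the 0.5 cutoff used to binarize them for the Jaccard case, the function names, and the example vectors are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard(a, b, threshold=0.5):
    """Jaccard distance after binarizing at `threshold` (an assumed cutoff)."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    # Indices of the properties considered "present" in each fingerprint.
    set_a = {i for i, x in enumerate(a) if x >= threshold}
    set_b = {i for i, y in enumerate(b) if y >= threshold}
    union = set_a | set_b
    if not union:
        # Two empty fingerprints are treated as identical.
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


# Hypothetical fingerprints for two features.
fp_a = [0.9, 0.1, 0.8, 0.0]
fp_b = [0.7, 0.6, 0.2, 0.0]

print(euclidean(fp_a, fp_b))
print(jaccard(fp_a, fp_b))
```

Euclidean distance uses the probability values directly, while the Jaccard variant only compares which properties are present after thresholding, so the two can rank feature pairs differently.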

mwang87 commented 4 years ago

Can you provide the exact command lines we would be using to run this?

anupriyatripathi commented 4 years ago

I've sent you an email with the commands for this.

anupriyatripathi commented 4 years ago

Test tasks:

Name	Description	FBMN	Qemistree
QE QERT C18 benchmarking	QE example	Job	Job
Western vs. rural skin samples	QTOF example	Job	Job
Mice serum samples	QTOF no metadata example	Job	Job
Fungal gardens	QTOF example	Job	Job

anupriyatripathi commented 4 years ago

@mwang87 The current workflow renders a file that appears to be the output of one of the early steps in the Qemistree workflow, i.e., output from SIRIUS. We should instead render the feature data file produced by the last processing step, i.e., the output of get-classyfire-taxonomy, which is a .qza file that can be exported to a .tsv as follows:

qiime tools export --input-path classyfire_results.qza --output-path classyfire-results

classyfire-results is a folder that would contain a .tsv file called feature_data.tsv. We should render feature_data.tsv.
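
Once exported, the table could be loaded for rendering along these lines. This is a rough sketch under stated assumptions: the helper name `read_feature_data` is hypothetical, and the in-memory sample rows below are made up to stand in for the real file.

```python
import csv
import io


def read_feature_data(handle):
    """Parse a tab-separated feature data table into a list of row dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical two-row sample standing in for classyfire-results/feature_data.tsv.
sample = io.StringIO(
    "id\tkingdom\tsuperclass\n"
    "feat1\tOrganic compounds\tLipids and lipid-like molecules\n"
    "feat2\tunclassified\tunclassified\n"
)

rows = read_feature_data(sample)
for row in rows:
    print(row["id"], row["kingdom"], row["superclass"])
```

For the real export, the same function would be called on a file handle, e.g. `open("classyfire-results/feature_data.tsv", newline="")`.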

mwang87 commented 4 years ago

I agree they may not be correct, but please check the output files. Are they correct for all the tests you have run? Let's take this one step at a time.

anupriyatripathi commented 4 years ago

@mwang87 The output files look correct; I'm just highlighting the TBDs here. Upon further investigation, it looks like a file called formula.qza is currently being rendered.

mwang87 commented 4 years ago

Sounds good. I'll pull out that file. What are the contents of that feature_data.tsv file? Which columns are meaningful to display?

anupriyatripathi commented 4 years ago

The classified feature data has the following columns. I don't think there's an issue with displaying everything. table_number is not relevant, though, as we will not be supporting meta-analyses for the time being. The smiles column is simply ms2_smiles, or csi_smiles when ms2_smiles is NaN.

id	#featureID	csi_smiles	ms2_smiles	ms2_compound	ms2_adduct	table_number	smiles	annotation_type	kingdom	superclass	class	subclass	direct_parent
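
The display rules described for these columns could be sketched as follows. This is an illustration only, assuming each row is a dict of strings as produced by a TSV reader and that missing values appear as empty strings or the text "nan"; the helper names and the example row are hypothetical.

```python
def is_missing(value):
    """Treat None, empty strings, and the literal text 'nan' as missing."""
    return value is None or value.strip() == "" or value.strip().lower() == "nan"


def display_row(row):
    """Return a copy of `row` for display: drop table_number, coalesce smiles."""
    shown = {key: val for key, val in row.items() if key != "table_number"}
    ms2 = row.get("ms2_smiles")
    # Prefer the MS/MS library match; fall back to the CSI:FingerID prediction.
    shown["smiles"] = row.get("csi_smiles") if is_missing(ms2) else ms2
    return shown


# Hypothetical row with no MS/MS match.
row = {
    "id": "feat1",
    "csi_smiles": "CCO",
    "ms2_smiles": "nan",
    "table_number": "1",
}

print(display_row(row))
```

Recomputing smiles here is redundant if the column is already populated in the file; it is shown only to make the fallback rule explicit.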

mwang87 commented 4 years ago

Closing, as the latest changes seem to be good. This will go out in release 18.