glasgowcompbio / PALS

Ranking metabolite (and other omics sets) by their activity levels via SVD
https://pals.glasgowcompbio.org
MIT License

GNPS analysis throwing non-finite values error #47

Open mita0000 opened 3 years ago

mita0000 commented 3 years ago

Certain combinations of case/control groups throw the error below:

RuntimeError: The data contains non-finite values.
Traceback:
  File "/home/jw247a/.local/share/virtualenvs/PALS-fesh4CcU/lib/python3.8/site-packages/streamlit/script_runner.py", line 333, in _run_script
    exec(code, module.__dict__)
  File "/home/jw247a/PALS/pals/run_gui.py", line 165, in <module>
    main()
  File "/home/jw247a/PALS/pals/run_gui.py", line 78, in main
    results = run_analysis(params)
  File "/home/jw247a/.local/share/virtualenvs/PALS-fesh4CcU/lib/python3.8/site-packages/streamlit/caching.py", line 603, in wrapped_func
    return get_or_create_cached_value()
  File "/home/jw247a/.local/share/virtualenvs/PALS-fesh4CcU/lib/python3.8/site-packages/streamlit/caching.py", line 587, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
  File "./pals/run_gui_gnps.py", line 78, in run_analysis
    df = PLAGE_decomposition(ds)
  File "/home/jw247a/.local/share/virtualenvs/PALS-fesh4CcU/lib/python3.8/site-packages/streamlit/caching.py", line 603, in wrapped_func
    return get_or_create_cached_value()
  File "/home/jw247a/.local/share/virtualenvs/PALS-fesh4CcU/lib/python3.8/site-packages/streamlit/caching.py", line 585, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
  File "./pals/run_gui_gnps.py", line 126, in PLAGE_decomposition
    df = method.get_results()
  File "./pals/PLAGE.py", line 71, in get_results
    plage_df = self.set_up_resample_plage_p_df(activity_df, streamlit_pbar=streamlit_pbar)
  File "./pals/PLAGE.py", line 125, in set_up_resample_plage_p_df
    pvalues = self._compare_resamples(tvalues, null_max_tvalues, null_min_tvalues)
  File "./pals/PLAGE.py", line 334, in _compare_resamples
    maxparams = genextreme.fit(null_max_tvalues)
  File "/home/jw247a/.local/share/virtualenvs/PALS-fesh4CcU/lib/python3.8/site-packages/scipy/stats/_distn_infrastructure.py", line 2378, in fit
    raise RuntimeError("The data contains non-finite values.")

The GNPS link: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=40e3f11e31ad4f319497f9f4e47c2e47

The metadata file is attached below. It doesn't seem to be a metadata file issue. Is it because of the GNPS data?

Thank you!

metadata_iBAT.csv

joewandy commented 2 years ago

Thanks for reporting the issue above.

I tested with the GNPS link above and managed to reproduce the error with the following groups:

Control: iBat_Wildling_Chow

Case: iBat_Lab_HFCDD

It seems that this is caused by certain molecular families (MFs) whose members have identical MS1 intensity values throughout the case and/or control groups. This makes the within-group variance zero, so the t-statistic computed during the permutation test becomes undefined (NaN), which breaks the code. As a fix, I've excluded those MFs from the calculation. They still appear in the final results, but right at the bottom with p-values of 1.0, and can be ignored. Hope that helps. If you run into any other problems, please let me know.
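The failure mode described above can be illustrated with a minimal sketch. This is not the actual PALS code: the function names, the family names `MF_1` and `MF_2`, and the input layout (one case/control pair of per-sample values per family) are all assumptions made for illustration. It uses a Welch-style t-statistic from the standard library and conservatively excludes a family when either group is constant, following the "case and/or control" wording; the statistic itself only becomes non-finite when both variances are zero.

```python
import math
from statistics import mean, variance


def t_statistic(case, control):
    """Welch-style t-statistic; non-finite when both groups have zero variance."""
    diff = mean(case) - mean(control)
    denom = math.sqrt(variance(case) / len(case) + variance(control) / len(control))
    if denom == 0.0:
        # Mirror IEEE floating-point division: 0/0 is NaN, x/0 is +/-inf.
        return math.nan if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / denom


def is_degenerate(case, control):
    """True if either group is constant (zero within-group variance)."""
    return variance(case) == 0.0 or variance(control) == 0.0


def split_families(families):
    """Separate testable families from degenerate ones.

    `families` maps a (hypothetical) family name to a (case, control)
    pair of per-sample values.
    """
    testable, excluded = {}, {}
    for name, (case, control) in families.items():
        target = excluded if is_degenerate(case, control) else testable
        target[name] = (case, control)
    return testable, excluded


# Hypothetical data: MF_2 has the same value in every sample.
families = {
    "MF_1": ([1.0, 2.0, 4.0], [2.0, 3.0, 7.0]),
    "MF_2": ([5.0, 5.0, 5.0], [5.0, 5.0, 5.0]),
}
testable, excluded = split_families(families)

# Only testable families go through the t-test (and any later distribution fit).
t_values = {name: t_statistic(*groups) for name, groups in testable.items()}

# Excluded families are still reported, with a fixed p-value of 1.0.
p_values_excluded = {name: 1.0 for name in excluded}
```

Filtering before the test, rather than dropping NaN values afterwards, keeps every array passed to a later fitting step finite, which is the condition the `genextreme.fit` error in the traceback complains about.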

Joe