Incomplete hunt for lipids

SysMedOs / lipidhunter

LipidHunter can perform bottom-up identification of lipids from LC-MS/MS and shotgun lipidomics data by mimicking the workflow of manual spectra annotation. LipidHunter generates interactive HTML output with its unique six-panel image, which provides an easy way to review, store, and share the identification results.


Incomplete hunt for lipids #34

Closed ghost closed 4 years ago

ghost commented 4 years ago

I have been trying to use LipidHunter but have not been able to get any results the 5 times I have tried using the software. Below are the parameters that I have used in my hunt for lipids:

vendor = thermo
experiment_mode = LC-MS
lipid_class = PC
charge_mode = [M+HCOO]-
fawhitelist_path_str = C:\Program Files (x86)\LipidHunter\ConfigurationFiles\1-FA_Whitelist.xlsx
score_cfg = C:\Program Files (x86)\LipidHunter\ConfigurationFiles\2-Score_weight_PL.xlsx
mzml_path_str = C:\Users\kunda\Documents\Computational-Lipidomics\RawFiles\Experiment2\PC.mzML
img_output_folder_str = C:\Users\kunda\Documents\Computational-Lipidomics\RawFiles\Experiment2\LipidHunterOutput
xlsx_output_path_str = C:\Users\kunda\Documents\Computational-Lipidomics\RawFiles\Experiment2\LipidHunterOutput\LipidHunterPCOutput.xlsx
rt_start = 0.0
rt_end = 10.0
mz_start = 500.0
mz_end = 1000.0
dda_top = 6
pr_window = 0.75
ms_th = 1000
ms_ppm = 19
ms2_th = 10
ms2_ppm = 49
ms2_infopeak_threshold = 0.001
rank_score_filter = 40.0
score_filter = 40.0
isotope_score_filter = 80.0
lipid_specific_cfg = C:\Program Files (x86)\LipidHunter\ConfigurationFiles\3-Specific_ions.xlsx
core_number = 3
max_ram = 5
img_type = png
img_dpi = 300
hunter_folder = C:\Program Files (x86)\LipidHunter
hunter_start_time = 2020-02-11_14-24-09
rank_score = True
tag_all_sn = True
fast_isotope = False
ms_max = 0

I have tried different iterations of these parameters. I have all the dependencies installed and have managed to use them all separately without any problems (e.g. pymzml). Any ideas on how I can run the program to completion and produce an output xlsx file with identified lipid classes?

ZhixuNi commented 4 years ago

Hi! Thank you for reporting this issue to us.

The usual cause is an obo version issue triggered by the mzML file. You will probably see an 'xxx.obo file is missing' error when you run LipidHunter_debug.exe.

Please download the obo.zip patch for LipidHunter2.

Download obo.zip
- https://github.com/SysMedOs/lipidhunter/releases/download/LipidHunter2_RC/obo.zip
For source code version users:
- Please unzip obo.zip, then copy all .obo files from its obo subfolder into /site-packages/pymzml/obo/, replacing the existing files.
For Windows users:
- Please unzip obo.zip, then copy all .obo files from its obo subfolder into the LipidHunter2 software folder, replacing the existing files.
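For source code users, the copy-and-replace step above could be scripted. This is a minimal sketch, not part of LipidHunter or pymzml: the function name `replace_obo_files` and the example paths are hypothetical, and it assumes obo.zip has already been unzipped.

```python
import shutil
from pathlib import Path


def replace_obo_files(patch_dir, target_dir):
    """Copy every .obo file from patch_dir into target_dir, overwriting files of the same name."""
    patch_dir, target_dir = Path(patch_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for obo_file in sorted(patch_dir.glob("*.obo")):
        shutil.copy2(obo_file, target_dir / obo_file.name)
        copied.append(obo_file.name)
    return copied


# Example usage (placeholder paths, adjust to your installation):
# import pymzml
# target = Path(pymzml.__file__).parent / "obo"
# print(replace_obo_files("obo_patch/obo", target))
```

Returning the list of copied file names makes it easy to confirm that the patch folder actually contained .obo files.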

This issue is currently fixed in our source code version. If you can run the source code version, please change to the latest Master branch committed on Jan 22, 2020.

Please give us feedback on whether the issue is fixed, thanks!

ZhixuNi commented 4 years ago

Please note that the previous LipidHunter release uses pymzml 0.7.8 and PySide. The current Master branch, committed on Jan 22, 2020, uses pymzml 2.4.0+ and PySide2.

For more details about this obo issue, see our pull request to pymzml: Solution for obo file related errors #134 https://github.com/pymzml/pymzML/pull/134

Please use Python 3.7 for the latest LipidHunter and check whether our sample dataset runs. You can download the sample dataset from: https://github.com/SysMedOs/lipidhunter/releases/download/LipidHunter2_RC/TestData.zip

ghost commented 4 years ago

I am using Python 3.7.4 with the latest branch of LipidHunter and have pymzml version 2.4.6 installed. I also copied the obo files you supplied into the pymzml site-packages folder, replacing the existing ones. I ran LipidHunter again, this time with 4 cores and 16 GB of dedicated RAM, and let it run for 7 hours on my file, and it still did not produce any result files. I have used pymzml before as part of pyqms and haven't had any problems there. I have also used the OpenMS mzML input architecture on the same file and haven't faced any problems. At some point in the past, I also forked and made minor changes to the SpectraReader.py file that comes with LipidHunter, and it runs with no issues but still takes quite a long time.

ZhixuNi commented 4 years ago

That's strange to me. How long does it take you to run the sample dataset from the link above? Could you kindly provide the mzML file so I can have a look? Thanks! Please also tell me which instrument you are using and how you convert the raw file to mzML, and provide the conversion parameters if you can. I will try to solve this issue as soon as possible.

ghost commented 4 years ago

I just ran the test files and the program now runs to completion. I ran the G_Pos_Thermo_Orbi.mzML file and hunted for triacylglycerols [M+H]+. The program could not identify any lipids but ran and finished in 101.546 s. Could you please tell me the settings you use when converting Thermo .RAW files to mzML files, presumably with MSConvert? I will then try the same settings, run my file again, and report the result.

ZhixuNi commented 4 years ago

We are now using ProteoWizard 3.0.20027 and above. Please have a look at the following screenshot. (screenshot: mzML_convert_params)

The sample dataset should be used for TG with [M+NH4]+. You can have a closer look at the parameters for this file in our user guide: https://github.com/SysMedOs/lipidhunter/releases/download/LipidHunter2_RC/LipidHunter_UserGuide.pdf

For mzML files converted from Thermo raw files, you should start from 1 min and use an MS1 threshold of 5000 and an MS2 threshold of 100 as defaults. You can try 10000 and 1000 to make it faster. It would be great if you could send us some screenshots from the terminal so that I can see where most of the time is spent. Generally, LipidHunter reads mzML files at a speed of about 2 min of RT per 1 min of processing time. We have never experienced an identification that took longer than 45 min for PLs or 90 min for TGs. I hope you get identifications faster after converting the files again.
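As a rough illustration of the reading speed quoted above (about 2 min of RT per 1 min of processing), the expected read time for a given RT window can be estimated as follows. This is only a back-of-the-envelope sketch; the function name is hypothetical and the rate will vary with hardware and file content.

```python
def estimate_read_minutes(rt_start, rt_end, rt_min_per_processing_min=2.0):
    """Estimate mzML read time in minutes from the RT window, using the quoted rate."""
    if rt_end <= rt_start:
        raise ValueError("rt_end must be larger than rt_start")
    return (rt_end - rt_start) / rt_min_per_processing_min


# An RT window of 1.0 to 25.0 min covers 24 min of RT.
print(estimate_read_minutes(1.0, 25.0))
```

A run that is still going after several hours is therefore far outside the expected range and points to a problem with the input rather than to slow hardware.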

ghost commented 4 years ago

Here is a link to a file that I am using : https://syncandshare.lrz.de/getlink/fiBUsRK8CVwVmZNZG8tjrxr4/PC.mzML

The above file was downloaded from MetaboLights as a Thermo .RAW file and converted using the parameters you showed above. The sample was enriched for phosphatidylcholine and phosphatidylethanolamine. I am currently comparing software and have managed to obtain results (identified lipids) from pipelines using LipidFinder, Lipyd and ALEX123. I selected phosphatidylcholine [M+HCOO]- as the target lipid class, and the program hasn't finished running after 2 hours.

ZhixuNi commented 4 years ago

Hi! I've looked at your mzML. There is something very strange in the file: there are very few peaks in the MS2 spectra, e.g. for m/z 804.5 there are fewer than 4 peaks in the mzML file you shared with me. Please have a look at this screenshot I got from ProteoWizard: (screenshot: KSachi_mmML_PC)

The peaks are few and of low intensity. The ppm errors of the FA [M-H]- fragments are quite large:

| FA | Theo. m/z | Obs. m/z | ppm |
| --- | --- | --- | --- |
| FA16:0 [M-H]- | 255.2324 | 255.20 | -126 |
| FA18:1 [M-H]- | 281.2481 | 281.07 | -633 |
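The ppm values in the table follow from the standard mass error formula, (observed - theoretical) / theoretical × 10^6. A minimal sketch, with a hypothetical function name; note that the reported -126 ppm for FA16:0 corresponds to an observed m/z of about 255.20:

```python
def ppm_error(observed_mz, theoretical_mz):
    """Return the mass error of an observed peak in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


fragments = [
    ("FA16:0 [M-H]-", 255.2324, 255.20),
    ("FA18:1 [M-H]-", 281.2481, 281.07),
]
for name, theo, obs in fragments:
    print(f"{name}: {ppm_error(obs, theo):.1f} ppm")
```

Both errors are well beyond an MS2 tolerance of 50 ppm, which is why no fragment is matched with the settings posted earlier.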

LipidHunter is not yet optimized for this kind of MS2, since we usually identify lipids from high-resolution LC-MS data (MS2 ppm < 100).

Here is an example of PC [M+HCOO]- identified by LipidHunter from another dataset. (screenshot: 804 5755_PC 34-1)

Please send us the raw file if possible. If this is a conversion issue, we can find a working version of ProteoWizard for your data. If it is a spectra quality issue, I will try to see whether we can tune LipidHunter to work in this ppm range.

ZhixuNi commented 4 years ago

It looks like you selected something wrong in the Threshold settings of the conversion, since the Data points column is always 10 in the mzML you just shared. Please choose Absolute intensity in the interface, as shown in the screenshot below. (screenshot: mzML_convert_params) Please check that all parameters are exactly the same as in this screenshot. Your raw file should give an mzML file of at least 100 MB.
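One way to spot this symptom without a viewer is to tally the number of data points per spectrum straight from the mzML XML. The sketch below uses only the Python standard library and reads the `defaultArrayLength` attribute of each `spectrum` element; treating that attribute as the equivalent of the "Data points" column is an assumption, and the inline sample document is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_data_points(mzml_source):
    """Return a Counter mapping data points per spectrum to the number of spectra."""
    counts = Counter()
    for _event, elem in ET.iterparse(mzml_source, events=("end",)):
        # Drop the XML namespace prefix, e.g. "{http://psi.hupo.org/ms/mzml}spectrum".
        if elem.tag.rsplit("}", 1)[-1] == "spectrum":
            counts[int(elem.get("defaultArrayLength", 0))] += 1
            elem.clear()  # release peak data that is no longer needed
    return counts


# Tiny made-up mzML fragment; pass a file path instead for real data.
SAMPLE = b"""<mzML xmlns="http://psi.hupo.org/ms/mzml"><run><spectrumList count="3">
<spectrum index="0" id="scan=1" defaultArrayLength="10"/>
<spectrum index="1" id="scan=2" defaultArrayLength="10"/>
<spectrum index="2" id="scan=3" defaultArrayLength="512"/>
</spectrumList></run></mzML>"""
print(count_data_points(io.BytesIO(SAMPLE)))
```

If nearly every spectrum reports the same small count, the threshold filter in the conversion most likely kept only a fixed number of peaks.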

ghost commented 4 years ago

I have not done any experimental analysis myself and am not attached to any lab, therefore all the files I am using were downloaded from MetaboLights database. I only downloaded files from the repository that used ESI - nanoLC Thermo Orbitrap Fusion in their analysis pipeline. I retried the conversion using the parameters you mentioned and an LC-MS only specific lipidomics Thermo Raw file and LipidHunter did not run to completion. Below are links to both .RAW and .mzML files of both LC-MS/MS and LC-MS files for you to query.

https://syncandshare.lrz.de/getlink/fiKPTSupcsEA2cWFkRfQQER3/LCMS-OF-Neg.mzML

https://syncandshare.lrz.de/getlink/fi7ZvdT9wjPqk2oa51njBKzT/LCMS-OF-Neg.raw

https://syncandshare.lrz.de/getlink/fiBUsRK8CVwVmZNZG8tjrxr4/PC.mzML

https://syncandshare.lrz.de/getlink/fi4UGqcJoCPVxqCu5iXJygK3/PC.raw

This is the output from the console:

>>> Hunter started ... Please wait ...

Parameters used are as following

[parameters]
vendor = thermo
experiment_mode = LC-MS
lipid_class = PC
charge_mode = [M+HCOO]-
fawhitelist_path_str = C:\Program Files (x86)\LipidHunter\ConfigurationFiles\1-FA_Whitelist.xlsx
score_cfg = C:\Program Files (x86)\LipidHunter\ConfigurationFiles\2-Score_weight_PL.xlsx
mzml_path_str = C:\Users\kunda\Documents\Computational-Lipidomics\RawFiles\LCMS-OF-Neg.mzML
img_output_folder_str = C:\Users\kunda\Documents\Computational-Lipidomics\RawFiles\LCMSOFLipidHunterOutput
xlsx_output_path_str = C:\Users\kunda\Documents\Computational-Lipidomics\RawFiles\LCMSOFLipidHunterOutput\LCMSOF.xlsx
rt_start = 1.0
rt_end = 25.0
mz_start = 500.0
mz_end = 1000.0
dda_top = 6
pr_window = 0.75
ms_th = 5000
ms_ppm = 20
ms2_th = 100
ms2_ppm = 50
ms2_infopeak_threshold = 0.001
rank_score_filter = 40.0
score_filter = 40.0
isotope_score_filter = 80.0
lipid_specific_cfg = C:\Program Files (x86)\LipidHunter\ConfigurationFiles\3-Specific_ions.xlsx
core_number = 3
max_ram = 5
img_type = png
img_dpi = 300
hunter_folder = C:\Program Files (x86)\LipidHunter
hunter_start_time = 2020-02-19_10-53-33
rank_score = True
tag_all_sn = True
fast_isotope = False
ms_max = 0

ZhixuNi commented 4 years ago

Hi,

I just downloaded the raw file, converted it to mzML myself, and ran LipidHunter. It took around 15 min in total, from converting the file to obtaining results like the ones below. (screenshot: LH_reults_KSachi)

The main reason that you did not get results is that the conversion to mzML was not correct. This is a screenshot of MSConvert when I converted this file: (screenshot: mzML_convert_params_KSachi) Please check all fields marked in the screenshot. The file size after conversion should be above 8 MB per minute of run time for Thermo files (this file is 270 MB in the screenshot).
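The 8 MB per minute rule of thumb above can be turned into a quick sanity check before starting a long run. This is a hedged sketch: the function names are hypothetical, and the 25 min run length used in the example is an assumption taken from the `rt_end` value in the posted settings, not a measured acquisition time.

```python
from pathlib import Path


def file_size_mb(path):
    """Return the size of a file in megabytes (1 MB = 10**6 bytes)."""
    return Path(path).stat().st_size / 1e6


def looks_fully_converted(size_mb, run_minutes, min_mb_per_min=8.0):
    """Return True if the mzML size meets the rule of thumb for Thermo files."""
    return size_mb >= run_minutes * min_mb_per_min


# A 270 MB file for an assumed 25 min run is above the 200 MB minimum.
print(looks_fully_converted(270.0, 25.0))
# For a real file: looks_fully_converted(file_size_mb("PC.mzML"), 25.0)
```

A file far below the expected size suggests that peaks were dropped during conversion, for example by an unintended threshold filter.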

Once you have converted the mzML correctly, you can use the SeeMS tool from ProteoWizard to have a look. It should be similar to the screenshot below: (screenshot: mzML_SeeMS_KSachi) If the mzML file is fine, I think you will have no problem running LipidHunter.

The MS2 spectra in this file were acquired in the LIT (linear ion trap), so you have to use a higher MS2 ppm, e.g. 900. LipidHunter can still work with this MS2 resolution. See the settings I used: (screenshot: LH_cfg_KSachi) There is some mass shift at the MS1 level when you check some typical PC lipids, so I set the MS1 ppm to 100.

I would also recommend checking the MS2 ppm range you used in your previous data analysis. Correct MS1 and MS2 ppm settings can give you better results.

The full settings are:

[parameters]
vendor = thermo
experiment_mode = LC-MS
lipid_class = PC
charge_mode = [M+HCOO]-
fawhitelist_path_str = /home/ni/sysmedos/lipidhunter/ConfigurationFiles/1-FA_Whitelist.xlsx
score_cfg = /home/ni/sysmedos/lipidhunter/ConfigurationFiles/2-Score_weight_PL.xlsx
mzml_path_str = /home/ni/Documents/KSachi/PC.mzML
img_output_folder_str = /home/ni/Documents/KSachi/Results/PC
xlsx_output_path_str = /home/ni/Documents/KSachi/Results/PC_test.xlsx
rt_start = 3.0
rt_end = 25.0
mz_start = 600.0
mz_end = 1000.0
dda_top = 6
pr_window = 0.85
ms_th = 1000
ms_ppm = 100
ms2_th = 10
ms2_ppm = 900
ms2_infopeak_threshold = 0.001
rank_score_filter = 40.0
score_filter = 40.0
isotope_score_filter = 80.0
lipid_specific_cfg = /home/ni/sysmedos/lipidhunter/ConfigurationFiles/3-Specific_ions.xlsx
core_number = 3
max_ram = 5
img_type = png
img_dpi = 300
hunter_folder = /home/ni/sysmedos/lipidhunter
hunter_start_time = 2020-02-19_14-04-52
rank_score = True
tag_all_sn = True
fast_isotope = False
ms_max = 0
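Because the settings above use an INI-style layout with a `[parameters]` section, they can be read and sanity-checked with Python's standard `configparser`. This is a minimal sketch under that assumption; it is not how LipidHunter itself loads its configuration, the helper names are hypothetical, and only a subset of the keys is shown.

```python
import configparser

CFG_TEXT = """
[parameters]
lipid_class = PC
charge_mode = [M+HCOO]-
rt_start = 3.0
rt_end = 25.0
mz_start = 600.0
mz_end = 1000.0
ms_ppm = 100
ms2_ppm = 900
"""


def load_parameters(text):
    """Parse the [parameters] section into a plain dict of strings."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["parameters"])


def find_range_problems(params):
    """Return human-readable messages for obviously inconsistent settings."""
    problems = []
    for low, high in (("rt_start", "rt_end"), ("mz_start", "mz_end")):
        if float(params[low]) >= float(params[high]):
            problems.append(f"{low} must be smaller than {high}")
    for key in ("ms_ppm", "ms2_ppm"):
        if float(params[key]) <= 0:
            problems.append(f"{key} must be positive")
    return problems


params = load_parameters(CFG_TEXT)
print(params["charge_mode"], find_range_problems(params))
```

All values come back as strings, so numeric settings need an explicit `float()` or `int()` conversion before comparison.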

Please find the complete output in this zip package: PC_test_KSachi.zip

Based on these preliminary results, you can optimize the parameters and run again, e.g. set the following parameters to get a faster run and better result quality.

rt_start = 5.0
rt_end = 15.0
mz_start = 700.0
mz_end = 900.0
dda_top = 6
pr_window = 0.85
ms_th = 1000
ms_ppm = 80
ms2_th = 10
ms2_ppm = 900
rank_score_filter = 50.0
score_filter = 50.0

You can also edit ConfigurationFiles/1-FA_Whitelist.xlsx to add more fatty acids for phospholipids. I hope you can get LipidHunter running this time.

ZhixuNi commented 4 years ago

Hi! I had a look at the mzML files you converted, and they are fine. PC.mzML gives exactly the same results as I posted above. However, something is not correct with LCMS-OF-Neg.raw and LCMS-OF-Neg.mzML. LCMS-OF-Neg.raw was acquired in positive mode; see the screenshot below showing a typical positive mode spectrum with m/z 184: (screenshot: LC_Pos_raw_KSachi)

LCMS-OF-Neg.mzML also reports positive mode spectra: (screenshot: LC_Pos_mzML_KSachi)

According to the precursor list in this file, no TG with the [M+NH4]+ adduct was selected for MS2. Currently LipidHunter identifies phospholipids in negative mode only. I therefore recommend skipping this file for LipidHunter and manually checking the identification results from other software, i.e. whether the identified TGs or phospholipids have the correct polarity and fragmentation pattern. We always recommend manually reviewing at least 5 to 10 lipids from the software reports; this gives you a better idea of the identification quality and more solid results.
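The polarity of an mzML file can also be checked programmatically before choosing a charge mode. The sketch below counts the PSI-MS controlled vocabulary terms MS:1000129 (negative scan) and MS:1000130 (positive scan) using only the standard library; the function name and the inline sample document are made up for illustration, and it assumes the converter wrote a polarity cvParam for each spectrum.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# PSI-MS controlled vocabulary accessions for scan polarity.
POLARITY_TERMS = {"MS:1000129": "negative", "MS:1000130": "positive"}


def count_scan_polarities(mzml_source):
    """Return a Counter of 'positive' and 'negative' polarity annotations."""
    counts = Counter()
    for _event, elem in ET.iterparse(mzml_source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
        if tag == "cvParam":
            polarity = POLARITY_TERMS.get(elem.get("accession"))
            if polarity:
                counts[polarity] += 1
        elif tag == "spectrum":
            elem.clear()  # free peak data once the spectrum has been inspected
    return counts


# Tiny made-up mzML fragment; pass a file path instead for real data.
SAMPLE = b"""<mzML xmlns="http://psi.hupo.org/ms/mzml"><run><spectrumList count="2">
<spectrum index="0" id="scan=1" defaultArrayLength="5">
<cvParam cvRef="MS" accession="MS:1000130" name="positive scan" value=""/></spectrum>
<spectrum index="1" id="scan=2" defaultArrayLength="5">
<cvParam cvRef="MS" accession="MS:1000130" name="positive scan" value=""/></spectrum>
</spectrumList></run></mzML>"""
print(count_scan_polarities(io.BytesIO(SAMPLE)))
```

A file named "Neg" that reports only positive scans, as in this case, should not be searched with a negative mode adduct such as [M+HCOO]-.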

Wish you all the best for your analysis.