CSi-Studio / 3D-MSNet

A point cloud based deep learning model for untargeted feature detection and quantification in profile LC-HRMS data

How to export the MSI-peak-value-table and Mgf? #3

Open YUANMENG-1 opened 1 month ago

YUANMENG-1 commented 1 month ago

How can I output an mz-rt-intensity summary table that combines different samples, as traditional tools such as XCMS do?

Step ① in the tutorial outputs a number of CSV files in each sample folder. I used --target_id=-1 to draw as many figures as possible, but only got one. It seems that ① and ② cannot directly output a combined peak table covering multiple samples? I need such a table, and traditional upstream tools generally also output a merged.mgf. Can 3D-MSNet do that?


①Prepare point clouds:

python workflow/predict/point_cloud_extractor.py --data_dir=PATH_TO_MZML --output_dir=POINT_CLOUD_PATH --window_mz_width=0.8 --window_rt_width=6 --min_intensity=128 --from_mz=0 --to_mz=2000 --from_rt=0 --to_rt=300 --expansion_mz_width=0.1 --expansion_rt_width=1

②Extract features:

python workflow/predict/main_eval.py --data_dir=POINT_CLOUD_PATH --mass_analyzer=orbitrap --mz_resolution=60000 --resolution_mz=400 --rt_fwhm=0.1 --target_id=None

CSi-Ti commented 1 month ago

Thanks for your attention.

  1. For the first problem, 'only got one figure': I used an Ubuntu system with a GUI and ran my code in PyCharm. When target_id is set to -1, the figures pop up as in my demo video. To see the next figure, close the current one or press Q on the keyboard.

  2. As for the aligned peak matrix you need, 3D-MSNet doesn't support feature alignment for now. A complete quantification workflow consists of feature extraction, isotope/adduct grouping, alignment, and identification modules, while 3D-MSNet focuses only on the first part, "feature extraction". We are trying to integrate 3D-MSNet into MetaPro (a metabolomics DDA data analysis and curation platform developed by CSi-Studio) in the following year. Users are also welcome to contribute to the 3D-MSNet project and to connect 3D-MSNet to popular analysis pipelines.

YUANMENG-1 commented 1 month ago

Oh! Thank you so much. Another question: do you have any suggestions for merging the results of 3D-MSNet into a consolidated peak value table?

3D-MSNet produces a folder for each sample, with over ten thousand CSV files in each folder (named something like peakid_mz_rt.csv). Each CSV file contains hundreds to thousands of rows (appearing to be in the format rt_mz_intensity).

Is it feasible to merge this type of data into a complete peak value table?

Currently, the approach seems to be:

  1. Each CSV file appears to be named after the peak's mz and rt values. The idea is to calculate a single peak intensity value from the 800+ rows of rt_mz_intensity in each peak CSV file. (However, how can these rows be consolidated into a single intensity value? Could we select the row whose mz and rt values are closest to those in the CSV filename? 🙋‍♂️)

  2. After processing each sample's CSV files as described in step 1, could we then merge them across samples using an mz-rt tolerance?

CSi-Ti commented 1 month ago

Ahh, the tens of thousands of CSV files are the separated point clouds generated by 0_pc_extraction.py, which can be considered as patches of the raw data and do not contain the results of peak detection.

The peak detection result can be generated by 1_peak_detection.py, with only ONE file for each sample as shown in Zenodo.

For a quick alignment implementation, you can refer to another study of mine, G-Aligner, which performs precise alignment based on basic peak information (mz, rt, intensity). In the G-Aligner GitHub repo, I also provide code for using the alignment functions of popular software tools, such as OpenMS (recommended), MZmine2 and XCMS.
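To make "alignment based on basic peak information (mz, rt, intensity)" concrete, here is a minimal sketch of a naive greedy tolerance-based grouping. It is not the G-Aligner algorithm and not part of 3D-MSNet. The function name `align_peaks`, the input layout, and the default tolerances are all assumptions made for illustration, and the tolerances must be given in the same units as your peak lists.

```python
def align_peaks(samples, mz_tol=0.01, rt_tol=0.2):
    """Naive greedy alignment (illustrative only, NOT G-Aligner).

    samples: dict mapping sample name -> list of (mz, rt, intensity) tuples.
    mz_tol, rt_tol: hypothetical tolerances, in the same units as the input.
    Returns a list of rows, each a dict with the seed peak's 'mz' and 'rt'
    and an 'intensities' dict mapping sample name -> intensity.
    """
    # Flatten all peaks and visit the most intense first, so that strong
    # peaks become the seeds of the aligned groups.
    all_peaks = [
        (mz, rt, inten, name)
        for name, peaks in samples.items()
        for mz, rt, inten in peaks
    ]
    all_peaks.sort(key=lambda p: -p[2])

    groups = []
    for mz, rt, inten, name in all_peaks:
        for g in groups:
            # Join the first group that lies within both tolerances and
            # does not already hold a peak from this sample.
            if (name not in g["intensities"]
                    and abs(g["mz"] - mz) <= mz_tol
                    and abs(g["rt"] - rt) <= rt_tol):
                g["intensities"][name] = inten
                break
        else:
            # No compatible group found: this peak seeds a new one.
            groups.append({"mz": mz, "rt": rt, "intensities": {name: inten}})

    groups.sort(key=lambda g: (g["mz"], g["rt"]))
    return groups
```

Each returned row corresponds to one line of a merged peak table; samples missing from a row's `intensities` had no peak within tolerance. The scan over all groups is quadratic in the worst case, so this is only suitable for small experiments. A dedicated aligner is the better choice for real data.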

FYI, here are some tips that might help:

  1. Adjust the parameters of 3D-MSNet carefully. Preview the results (as in the demo video) before batch analysis. The rt_fwhm parameter is highly sensitive and directly affects accuracy.
  2. The G-Aligner method is more accurate, but it is relatively slow with more than 40 samples. If you are analyzing a large cohort, try one of the other implemented alignment methods, such as OpenMS. We are developing and submitting a new alignment method to solve this problem.
  3. 3D-MSNet does not consider any MS2 information for now; it only produces peak detection results, without isotope and adduct removal. Losing the MS2 information will affect subsequent identification. However, linking MS2 spectra to peaks is quite intuitive, because 3D-MSNet provides the predicted boundary of each detected peak. For each peak, we can search the raw data for the corresponding MS2 spectrum by its precursor m/z and the peak boundary. The identification can then be done with GNPS after file format adaptation.
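The MS2 linking described in tip 3 could be sketched roughly as follows. This is an illustration under assumptions, not code from 3D-MSNet. The function name `link_ms2_to_peaks`, the dictionary keys (`mz`, `rt_start`, `rt_end`, `precursor_mz`, `rt`) and the default tolerance are hypothetical, and the predicted peak boundary is assumed to be available as a retention-time range.

```python
def link_ms2_to_peaks(peaks, ms2_spectra, mz_tol=0.01):
    """Link MS2 spectra to detected peaks (illustrative sketch).

    peaks: list of dicts with hypothetical keys 'mz', 'rt_start', 'rt_end'
           (the peak m/z and its predicted retention-time boundary).
    ms2_spectra: list of dicts with hypothetical keys 'precursor_mz', 'rt'.
    mz_tol: hypothetical precursor m/z tolerance.
    Returns a dict mapping peak index -> list of matching MS2 indices.
    """
    links = {i: [] for i in range(len(peaks))}
    for j, spec in enumerate(ms2_spectra):
        for i, peak in enumerate(peaks):
            # A spectrum matches when its precursor m/z is within tolerance
            # of the peak m/z and it was acquired inside the peak boundary.
            mz_ok = abs(spec["precursor_mz"] - peak["mz"]) <= mz_tol
            rt_ok = peak["rt_start"] <= spec["rt"] <= peak["rt_end"]
            if mz_ok and rt_ok:
                links[i].append(j)
    return links
```

A peak may end up with zero, one, or several MS2 spectra. How to pick or merge them before export (for example, to MGF for GNPS) is left open here.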

You can also contact me on WeChat (Chinese version) at nico-USTC for instant chat.