cremerlab / hplc-py

A Python utility for the processing and quantification of chromatography data
https://cremerlab.github.io/hplc-py/
GNU General Public License v3.0

Testing on GC-MS data: ValueError: `x0` is infeasible #19

Closed liaochen1988 closed 3 months ago

liaochen1988 commented 4 months ago

Hi Dr. Chure, this is a great tool that I'd like to integrate into my GC-MS data pipeline. As you can imagine, overlapping peaks have so far been handled separately by cutting the overall shape at the minimum between them. To try your approach, I installed hplc-py and tested it on the total ion chromatogram (TIC) of a GC-MS dataset my colleague recently generated. However, it first gives me a warning

-------------------------- Hey! Yo! Heads up! ----------------------------------
| This time window (from 8.9085 to 30.114) has 55 candidate peaks.
| This is a complex mixture and may take a long time to properly fit depending
| on how well resolved the peaks are. Reduce buffer if the peaks in this
| window should be separable by eye. Or maybe just go get something to drink.

followed by the error ValueError: `x0` is infeasible.

To help you reproduce, I attached the TIC data gc_ms_tic.csv

See below for the simple code I used:

import pandas as pd
from hplc.quant import Chromatogram

data = pd.read_csv("gc_ms_tic.csv")
chrom = Chromatogram(data)
peaks = chrom.fit_peaks()
scores = chrom.assess_fit()
chrom.show()
peaks

I understand that this might be a complex chromatogram, but is there any way we can make it work? Resolving overlapping peaks in GC-MS is a challenge, and it affects the reconstruction of mass spectra and everything downstream.

Thank you very much and looking forward to your reply!

gchure commented 4 months ago

Hi @liaochen1988, thanks for giving hplc-py a go. I think we will be able to get this to work with your data, though it may take me a few weeks to dig into it (going on holiday for 10 days or so).

In playing around with your provided chromatogram, I was able to get some fits (albeit not very good ones). As evident in #15 and #18, there is an edge case in the code where the initial guesses for the peak fitting violate the prescribed parameter bounds. I think I know what's going on there, but need to dig into it to fix it.
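The edge case described above can be illustrated with a small, library-independent check. The error text matches the message SciPy's `least_squares` raises when an initial guess lies outside its bounds. The sketch below is not hplc-py's code or its eventual fix; the helper names, the parameter ordering (amplitude, location, scale, skew), and all numbers are assumptions made up for illustration.

```python
def find_infeasible(x0, lower, upper):
    """Return the indices where the initial guess falls outside [lower, upper]."""
    return [
        i
        for i, (x, lo, hi) in enumerate(zip(x0, lower, upper))
        if not (lo <= x <= hi)
    ]


def clip_to_bounds(x0, lower, upper):
    """Move each out-of-range guess onto the nearest bound."""
    return [min(max(x, lo), hi) for x, lo, hi in zip(x0, lower, upper)]


# Hypothetical guess for one peak: amplitude, location, scale, skew.
x0 = [120.0, 9.4, 0.0, 0.0]
lower = [0.0, 8.9, 0.01, -5.0]
upper = [1000.0, 30.1, 2.0, 5.0]

bad = find_infeasible(x0, lower, upper)    # the scale guess sits below its lower bound
fixed = clip_to_bounds(x0, lower, upper)   # scale is moved up to 0.01
print(bad, fixed)
```

Running a check like this on the guesses before optimization shows which parameter triggers the error. Clipping is only one possible remedy; widening the bounds is another.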

In the meantime, an immediate suggestion I have is that you need to adjust the parameter approx_peak_width in the fit_peaks() call. That argument sets the peak resolution to preserve after baseline subtraction. The default is 5 timepoints, which may be too small for some of the peaks in your chromatogram.

While I'm away and thinking about it, you should try to either i) manually alter the parameter bounds (see param_bounds here) or ii) manually provide the retention times of major peaks (and overlapping peaks) that you want to deconvolve (see here).
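As a rough sketch of how these suggestions might be combined, the helper below forwards a wider `approx_peak_width` and an optional list of retention times to `fit_peaks()`. The helper name `refit_with_hints` is hypothetical, the keyword `known_peaks` is an assumption that should be checked against the hplc-py documentation, and the numeric values in the usage comments are placeholders rather than recommendations for this dataset.

```python
def refit_with_hints(chrom, approx_peak_width, known_peaks=None):
    """Call chrom.fit_peaks() with a custom baseline window and optional peak hints.

    `chrom` is expected to behave like hplc.quant.Chromatogram. The keyword
    `known_peaks` is assumed here, so verify it in the hplc-py docs.
    """
    kwargs = {"approx_peak_width": approx_peak_width}
    if known_peaks:
        # Retention times of the peaks (and overlapping peaks) to deconvolve.
        kwargs["known_peaks"] = list(known_peaks)
    return chrom.fit_peaks(**kwargs)


# Possible usage, assuming hplc-py is installed (values are placeholders):
#   import pandas as pd
#   from hplc.quant import Chromatogram
#   chrom = Chromatogram(pd.read_csv("gc_ms_tic.csv"))
#   peaks = refit_with_hints(chrom, approx_peak_width=1.0, known_peaks=[9.4, 12.1])
```

Custom parameter bounds would go through the `param_bounds` argument mentioned above. Its exact structure is not shown here because it is not described in this thread.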

Again, I think we can make this work, and I would love to get some more experience working with GC-MS data.

liaochen1988 commented 4 months ago

Thank you Dr. Chure for your quick response. I will try your suggestions and see if I can integrate the current version of your pipeline into my code. I look forward to hearing your updates.

gchure commented 3 months ago

Hi @liaochen1988, hope you are doing well. I wanted to let you know that I pushed a new version of hplc-py to PyPI (v0.2.7) that may help with this, specifically in bounding of parameters.

However, after looking at your chromatogram more closely, I think it's unfortunately too undersampled for useful peak fitting. Many of your peaks are composed of ≈ 10 measurements, which is insufficient for this method. If you are unable to increase the sampling frequency, hplc-py may not be a good fit for your question.
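One quick way to gauge sampling density is to count how many measurements make up a peak. The following is a minimal sketch, assuming a baseline-corrected signal and a known peak index. The function name, the 5% height cutoff, and the toy signal are illustrative assumptions, not part of hplc-py.

```python
def samples_in_peak(signal, peak_index, fraction=0.05):
    """Count contiguous samples around peak_index at or above fraction * peak height."""
    threshold = signal[peak_index] * fraction
    left = peak_index
    while left > 0 and signal[left - 1] >= threshold:
        left -= 1
    right = peak_index
    while right < len(signal) - 1 and signal[right + 1] >= threshold:
        right += 1
    return right - left + 1


# Toy baseline-corrected trace with a single peak (made-up numbers).
signal = [0, 1, 3, 8, 15, 20, 16, 9, 4, 1, 0]
apex = signal.index(max(signal))
n_points = samples_in_peak(signal, apex)
print(n_points)
```

Comparing this count across the major peaks of a chromatogram gives a rough sense of whether a higher acquisition rate is worth requesting.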

I'm going to close this issue for now, but please feel free to reopen if you need more help!

LiaoLabATDartmouth commented 3 months ago

Many thanks for working on my request. I will ask my collaborator whether it is possible to increase the sampling frequency.