cremerlab / hplc-py

A Python utility for the processing and quantification of chromatography data
https://cremerlab.github.io/hplc-py/
GNU General Public License v3.0

Evaluation of chromatogram with only two peaks with high intensity differences difficult to match #18

Open sebastian-hogeweg opened 1 month ago

sebastian-hogeweg commented 1 month ago

Applying the documented workflow gives unsatisfactory results for this data set (chromatogram1.csv). I already tried modifying the data (chromatogram1_mod.csv) so that all values are positive; however, the fitted peaks and the corresponding areas still look wrong to me. Changing individual parameters, such as the window used for the baseline correction, improved the result only slightly. It would be great to get some help choosing parameters that improve the fit.

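Example code:

```python
# Imports assumed from the hplc-py documentation
import matplotlib.pyplot as plt
from hplc.io import load_chromatogram
from hplc.quant import Chromatogram

# Load the modified data set and plot the raw chromatogram
chromatogram = load_chromatogram('chromatogram1_mod.csv', cols=['time', 'signal'])
chrom = Chromatogram(chromatogram)
chrom.show()
plt.savefig("chromatogram.svg", bbox_inches="tight", transparent=False)
plt.close()

# Apply the baseline correction with default settings and plot again
chrom = Chromatogram(chromatogram)
chrom.correct_baseline()
chrom.show()
plt.savefig("chromatogram_baseline_correction.svg", bbox_inches="tight", transparent=False)
plt.close()

# Fit the peaks on the already-corrected signal and plot the result
peaks = chrom.fit_peaks(correct_baseline=False, prominence=0.01)
chrom.show()
plt.savefig("chromatogram_peaks.svg", bbox_inches="tight", transparent=False)
plt.show()
```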

[Attached images: chromatogram, chromatogram_peaks, chromatogram_peaks_window100]

gchure commented 1 month ago

Hi @sebastian-hogeweg. Thanks for the issue. This is a known problem for chromatograms with very large-valued time dimensions (see #15). I think I know what the cause is, but it will take me some time to rework how the windowing and inference operate.

In the meantime, you can manually adjust the fitting parameter bounds (see `param_bounds` on `deconvolve_peaks`). I suspect broadening the location and amplitude bounds will help.

Additionally, you will need to adjust `approx_peak_width` in the call to `fit_peaks` for the background subtraction. The default value there is 2, whereas in your case it should be something more like 500, since your time dimension is large.
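
One way to pick a starting value for `approx_peak_width` is to measure the width of the dominant peak directly from the data, in the same units as the time axis. The following is only a rough sketch using NumPy: the helper `estimate_peak_width` and the synthetic data are hypothetical and not part of hplc-py, and the minimum-shift is a crude stand-in for a real baseline correction.

```python
import numpy as np


def estimate_peak_width(time, signal):
    """Return the full width at half maximum (in time units) of the tallest peak."""
    time = np.asarray(time, dtype=float)
    signal = np.asarray(signal, dtype=float)
    # Shift so the minimum sits at zero (crude stand-in for baseline removal).
    shifted = signal - signal.min()
    apex = int(np.argmax(shifted))
    half = shifted[apex] / 2.0
    # Walk outwards from the apex until the signal drops to half height or below.
    left = apex
    while left > 0 and shifted[left] > half:
        left -= 1
    right = apex
    while right < len(shifted) - 1 and shifted[right] > half:
        right += 1
    return time[right] - time[left]


# Hypothetical synthetic chromatogram: large-valued time axis, one tall and one small peak.
time = np.linspace(0, 20000, 4001)
sigma = 200.0
signal = (1000.0 * np.exp(-((time - 8000.0) ** 2) / (2 * sigma**2))
          + 10.0 * np.exp(-((time - 14000.0) ** 2) / (2 * sigma**2)))

width = estimate_peak_width(time, signal)
print(f"Estimated peak width: {width:.0f} time units")

# Assumption: hplc-py is installed and `chrom` is the Chromatogram from the snippet above.
# peaks = chrom.fit_peaks(approx_peak_width=width, prominence=0.01)
```

For real data, replace the synthetic arrays with the `time` and `signal` columns of the loaded chromatogram. The estimate is only a starting point and will likely need adjusting by eye.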