csho33 / bacteria-ID

Source code and demos for "Rapid identification of pathogenic bacteria using Raman spectroscopy and deep learning"
MIT License

How to get valid input from Raman Spectrum output? #4

Open gaospecial opened 3 months ago

gaospecial commented 3 months ago

Dear @csho33,

Thank you for providing such excellent learning materials. In your data, the input to the ResNet model is a 1000-dimensional vector. Could you explain how this input is obtained from the raw Raman data?

Raman measurements

We measured Raman spectra across monolayer regions of the dried samples (Fig. 1a) using the mapping mode of a Horiba LabRAM HR Evolution Raman microscope. 633 nm illumination at 13.17 mW was used with a 300 l/mm grating to generate spectra with 1.2 cm−1 dispersion to maximize signal strength while minimizing background signal from autofluorescence. Wavenumber calibration was performed using a silicon sample. The ×100 0.9 NA objective lens (Olympus MPLAN) generates a diffraction-limited spot size, 1 µm in diameter. A 45 × 45 discrete spot map is taken with 3 µm spacing between spots to avoid overlap between spectra.

The spectra are individually background corrected using a polynomial fit of order 5 using the subbackmod Matlab function available in the Biodata toolbox (see Supplementary Fig. 1 for examples of raw and corrected spectra). The majority of spectra are measured on true monolayers and arise from ~1 cell due to the diffraction-limited laser spot size, which is roughly the size of a bacteria cell. However, a small number of spectra may be taken over aggregates or multilayer regions. We exclude the spectra that are most likely to be non-monolayer measurements by ranking the spectra by signal intensity and discarding the 25 spectra with highest intensity, which includes all spectra with intensities greater than two standard deviations from the mean. We measured both monolayers and single cells, and found that monolayer measurements have SNRs of 2.5 ± 0.7, similar to single-cell measurements (2.4 ± 0.6), while allowing for the semi-automated generation of a large training dataset.

The spectral range between 381.98 and 1792.4 cm−1 was used, and spectra were individually normalized to run from a minimum intensity of 0 and maximum intensity of 1 within this spectral range. SNR values are calculated by dividing the total intensity range by the intensity range over a 20-pixel wide window in a region where there is no Raman signal.
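To check my understanding of the quoted steps, here is a minimal Python/NumPy sketch of them. It is not your code, and several parts are my assumptions: `modpoly_baseline` is an iterative modified-polyfit baseline swapped in for the Matlab `subbackmod` function (whose exact algorithm I have not checked), "signal intensity" in `drop_brightest` is interpreted as the summed intensity of each spectrum, and the `start` position of the signal-free SNR window is a hypothetical parameter. All function names are hypothetical, and the spectrum at the end is synthetic.

```python
import numpy as np


def modpoly_baseline(wn, y, order=5, n_iter=100):
    """Iterative modified-polyfit baseline; a stand-in, NOT the Biodata subbackmod code."""
    x = (wn - wn.mean()) / wn.std()  # rescale wavenumbers so the order-5 fit is well conditioned
    work = np.asarray(y, dtype=float).copy()
    for _ in range(max(n_iter, 1)):
        fit = np.polyval(np.polyfit(x, work, order), x)
        work = np.minimum(work, fit)  # clip peaks down to the fit so they stop lifting the baseline
    return fit


def drop_brightest(spectra, n_drop=25):
    """Remove the n_drop rows with the highest summed intensity (summing is an assumption)."""
    ranked = np.argsort(spectra.sum(axis=1))         # ascending total intensity
    keep = np.sort(ranked[: len(spectra) - n_drop])  # keep the original map order
    return spectra[keep]


def crop_and_normalize(wn, y, lo=381.98, hi=1792.4):
    """Keep lo <= wn <= hi, then min-max scale that segment to [0, 1]."""
    mask = (wn >= lo) & (wn <= hi)
    seg = y[mask]
    return wn[mask], (seg - seg.min()) / (seg.max() - seg.min())


def snr(y, start, width=20):
    """Total intensity range divided by the range in a signal-free window (start is hypothetical)."""
    return np.ptp(y) / np.ptp(y[start : start + width])


# Synthetic spectrum: quadratic background + two Gaussian peaks + noise (made-up data)
rng = np.random.default_rng(0)
wn = np.linspace(300, 1900, 1340)
raw = (1e-7 * (wn - 300) ** 2 + 0.2
       + np.exp(-(((wn - 1003) / 4) ** 2))
       + 0.6 * np.exp(-(((wn - 1450) / 6) ** 2))
       + 0.01 * rng.standard_normal(wn.size))
corrected = raw - modpoly_baseline(wn, raw)
wn_c, y_c = crop_and_normalize(wn, corrected)
```

In this sketch the cropping happens after background correction, so the polynomial baseline is fitted over the full measured range; I am not sure whether your pipeline fits the baseline before or after cropping.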

As described in the article, the preprocessing consists of 1) background correction (a 5th-order polynomial fit), 2) normalization, and 3) restricting each spectrum to the range between 381.98 and 1792.4 cm−1. The first two steps are clear, but how is the selected spectral range converted into a 1000-dimensional vector? Do you keep a subset of the measured wavenumbers, or do you interpolate onto a grid spanning the specified range?
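To make the question concrete, the interpolation option could look like the minimal sketch below. `resample_to_grid` and the choice of linear interpolation onto evenly spaced points are my assumptions, not code from this repository. The sketch also shows the arithmetic behind my question: sampling every 1.2 cm−1 between 381.98 and 1792.4 cm−1 would give about 1176 points rather than 1000, while 1000 evenly spaced points implies a spacing of roughly 1.41 cm−1. The quoted "1.2 cm−1 dispersion" may not equal the actual pixel spacing of the detector, which may also vary across the spectrum, so this is only a rough check.

```python
import numpy as np

LO, HI, N_POINTS = 381.98, 1792.4, 1000


def resample_to_grid(wn, y, lo=LO, hi=HI, n_points=N_POINTS):
    """Linearly interpolate one spectrum onto n_points evenly spaced wavenumbers in [lo, hi]."""
    order = np.argsort(wn)  # np.interp expects increasing x values
    grid = np.linspace(lo, hi, n_points)
    return grid, np.interp(grid, wn[order], y[order])


span = HI - LO  # about 1410.42 cm^-1
print(int(span // 1.2) + 1)             # 1176 points if sampled every 1.2 cm^-1
print(round(span / (N_POINTS - 1), 3))  # 1.412 cm^-1 spacing implied by 1000 points
```

Note that `np.interp` repeats the end values if the measured wavenumbers do not fully cover [lo, hi], and that the interpolated spectrum may not reach exactly 0 and 1, so it might need to be renormalized after resampling.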

I would greatly appreciate your reply. Thank you very much.


Chun-Hui