Regarding the issue of retaining coordinate values of data

barahona-research-group / RamanSPy

RamanSPy: An open-source Python package for integrative Raman spectroscopy data analysis

https://ramanspy.readthedocs.io

BSD 3-Clause "New" or "Revised" License


Regarding the issue of retaining coordinate values of data #8

Closed Zengcug closed 11 months ago

Zengcug commented 11 months ago

Hi! First of all, thank you for your contribution! RamanSPy is a wonderful package and easy to use even for a code rookie like me.

I have a question about the pre-processing stage: will the order of the data points be altered during this process? I'm currently using RamanSPy to handle my 2-D Raman data (mapping data), and I use the Ramancontainer to load the data from a CSV file (attached below). I hope to retain the coordinates so that I can apply Voigt fitting with lmfit after pre-processing and then import the data into QGIS to view the distribution of FWHM values.

test.csv

By the way, I've observed that some peaks appear to be eliminated during processing. For example, my original data set had 167 peaks, but after pre-processing the number of peaks decreased to 166. Is this expected, or could there be a mistake in my code? Here's the code I'm currently using:

    import ramanspy as rpy

    # Preprocessing steps are applied in the order listed.
    pipe = rpy.preprocessing.Pipeline([
        rpy.preprocessing.despike.WhitakerHayes(),    # remove cosmic spikes
        rpy.preprocessing.denoise.Gaussian(),         # smooth noise
        rpy.preprocessing.baseline.ASLS(),            # baseline correction
        rpy.preprocessing.normalise.MaxIntensity(),   # scale by maximum intensity
    ])

    # ramancontainer holds the data loaded from the CSV file
    processed_data = pipe.apply(ramancontainer)
    newaxis = processed_data.spectral_axis

Thank you again for your contribution!

dgeorgiev21 commented 11 months ago

Thank you for reaching out! We are glad to hear you find RamanSPy useful!

Data order should be maintained during preprocessing! Please let us know if it is not as this would mean there is a bug we need to fix.
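One way to check this yourself is to keep the coordinates in a separate array whose rows line up with the spectra, and confirm that a per-spectrum operation leaves the row order untouched. The sketch below is only illustrative: it uses plain NumPy, made-up data, and a hypothetical `normalise_max` function as a stand-in for a preprocessing step, not RamanSPy's own code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mapping data: 6 measurement points, each with an (x, y)
# coordinate and a 100-band spectrum. Row i of `spectra` belongs to coords[i].
coords = np.array([[x, y] for x in range(3) for y in range(2)], dtype=float)
spectra = rng.random((6, 100)) + 0.1

def normalise_max(spectra):
    """Stand-in for a per-spectrum preprocessing step (not RamanSPy's code)."""
    return spectra / spectra.max(axis=1, keepdims=True)

processed = normalise_max(spectra)

# The shape is unchanged and each processed row is a rescaled copy of the
# original row at the same index, so coords[i] still describes processed[i].
assert processed.shape == spectra.shape
for i in range(len(coords)):
    assert np.allclose(processed[i] * spectra[i].max(), spectra[i])
```

If the same kind of check holds for the real pipeline output, the coordinates can be re-attached by index before fitting.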

Regarding the number of peaks before and after preprocessing, this can indeed change as you apply different preprocessing methods, such as denoising (i.e. removing noise), or despiking (i.e. removing peaks corresponding to cosmic spikes).
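As a rough illustration of why the count can drop by one, the sketch below builds a synthetic spectrum with one genuine peak and one single-point spike, then applies a simple 3-point median filter. The filter and the `count_peaks` helper are assumptions made for this example only; they are not the WhitakerHayes algorithm or any RamanSPy function.

```python
import numpy as np

def count_peaks(y):
    """Count local maxima; a flat-topped peak is counted once."""
    rises = y[1:-1] > y[:-2]
    does_not_rise_after = y[1:-1] >= y[2:]
    return int(np.sum(rises & does_not_rise_after))

def median3(y):
    """3-point median filter; the two end points are left unchanged."""
    out = y.copy()
    windows = np.stack([y[:-2], y[1:-1], y[2:]])
    out[1:-1] = np.median(windows, axis=0)
    return out

x = np.arange(50)
signal = np.exp(-((x - 20.3) / 5.0) ** 2)  # one genuine, broad peak
signal[40] += 2.0                          # one single-point "cosmic spike"

print(count_peaks(signal))           # 2: the real peak and the spike
print(count_peaks(median3(signal)))  # 1: the spike has been removed
```

The broad peak survives the filter while the one-sample spike does not, so a peak count that falls from 167 to 166 is consistent with a single spike being removed.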

If you find RamanSPy useful, please consider starring the project on GitHub! Your support means a lot!

Zengcug commented 11 months ago

I reviewed core.py, and I'm confident that the data order will be maintained. By the way, could you please, if possible, clarify the meaning of X and Y in the context of the SpectralImage class (as quoted below)? In ramanspy.SpectralImage(spectral_data, spectral_axis), it is mentioned: "The [SpectralImage] class defines a 2D spectroscopic image. Dimensions must be in the order of (x, y, B)."

For instance, suppose my dataset comprises 40 x values on the horizontal axis and 30 y values on the vertical axis, so that the data contains 30 columns for y and 40 rows for x. Does that mean I need to load my spectral_data using data.reshape(30, 40, len(axis)), i.e. in (y, x, B) order, instead of data.reshape(40, 30, len(axis)), i.e. in (x, y, B) order?

I appreciate your prompt response!

dgeorgiev21 commented 11 months ago

Thank you for your question! :)

In the context of RamanSPy's SpectralImage and SpectralVolume classes, the notation x, y, z follows the axis order of common Python containers, such as numpy.ndarray, tensorflow.Tensor, torch.Tensor, etc.

In other words, x, y, ... correspond to rows, columns, ...
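A small NumPy sketch of what this means for reshaping a flat table of spectra into (x, y, B) order. The sizes, the `tag` helper and the two file orderings are hypothetical; which case applies depends on how the CSV rows are ordered.

```python
import numpy as np

n_x, n_y, n_bands = 40, 30, 4  # hypothetical sizes

def tag(x, y):
    """Fake spectrum that encodes its own (x, y) position, for checking."""
    return np.full(n_bands, 100 * x + y)

# Case A: the flat table lists spectra with x varying slowest (y fastest).
flat_a = np.array([tag(x, y) for x in range(n_x) for y in range(n_y)])
image_a = flat_a.reshape(n_x, n_y, n_bands)

# Case B: the flat table lists spectra with y varying slowest (x fastest).
# Reshape in the order the data was written, then swap the first two axes.
flat_b = np.array([tag(x, y) for y in range(n_y) for x in range(n_x)])
image_b = flat_b.reshape(n_y, n_x, n_bands).transpose(1, 0, 2)

# Both arrays are now indexed as image[x, y, band].
assert image_a.shape == image_b.shape == (n_x, n_y, n_bands)
assert image_a[7, 3, 0] == image_b[7, 3, 0] == 703
assert np.array_equal(image_a, image_b)
```

In both cases the first index of the resulting array is x (rows) and the second is y (columns), which is the order described above.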

Hope this helps! :)

However, I appreciate this may not be entirely clear from the documentation, so I will keep this issue open until this is resolved.

Thank you for your interest in RamanSPy!
