MaximilianRothstein closed 3 years ago
SDBS is not the only freely available online source of experimental IR data. A complementary database with already digitized data of this type is the NIST Chemistry WebBook, which for this compound offers a visual representation as .svg as well as JCAMP-DX data, both in the typical mid-infrared range (4000 to 670 cm-1) and in what is considered the THz range.
I speculate that (because of the relevance of cheminformatics to IR spectroscopy) there are multiple packages in R that can work with this format right away. If not, and if there is no need to process batches of files into «x-y format», Norbert Haider (University of Vienna) wrote IRView, a lightweight program (Windows executable) to inspect JCAMP-DX files and translate them into a tab-separated format; it extends functionality already present in his jdxview. (It is equally possible to run this gem on Linux, compartmentalized under Wine.)
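If neither an R package nor IRView is at hand, the translation into a tab-separated «x-y format» can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a full JCAMP-DX reader: it only handles the uncompressed `##XYDATA=(X++(Y..Y))` layout (one x-value per line followed by several y-values), it relies on the `##FIRSTX`, `##LASTX` and `##NPOINTS` records being present, and it ignores the compressed encodings (SQZ/DIF/DUP) the standard also allows. The function names and the tiny sample record are made up for illustration.

```python
import re

# Matches plain or signed decimal numbers, also when a minus sign abuts the previous value.
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")

def parse_jcamp_xy(text):
    """Return a list of (x, y) pairs from an uncompressed (X++(Y..Y)) block."""
    header, rows, in_data = {}, [], False
    for raw in text.splitlines():
        line = raw.split("$$")[0].strip()  # drop JCAMP comments
        if not line:
            continue
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            label = label.strip().upper()
            header[label] = value.strip()
            in_data = label == "XYDATA"
        elif in_data:
            rows.append([float(tok) for tok in NUMBER.findall(line)])

    xfactor = float(header.get("XFACTOR", 1))
    yfactor = float(header.get("YFACTOR", 1))
    npoints = int(float(header["NPOINTS"]))
    # Spacing between neighbouring points, derived from the header records.
    deltax = (float(header["LASTX"]) - float(header["FIRSTX"])) / (npoints - 1)

    points = []
    for row in rows:
        x0 = row[0] * xfactor  # first number on a line is the x of its first y
        for j, y in enumerate(row[1:]):
            points.append((x0 + j * deltax, y * yfactor))
    return points

def to_tsv(points):
    """Format the points as tab-separated text, one x-y pair per line."""
    return "\n".join(f"{x:g}\t{y:g}" for x, y in points)

# Invented toy record, only to exercise the parser.
SAMPLE = """##TITLE=demo
##JCAMP-DX=4.24
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##XFACTOR=1.0
##YFACTOR=0.01
##FIRSTX=4000
##LASTX=3996
##NPOINTS=5
##XYDATA=(X++(Y..Y))
4000 95 94 93
3997 92 91
##END=
"""

if __name__ == "__main__":
    print(to_tsv(parse_jcamp_xy(SAMPLE)))
```

For a real file, checking that the number of parsed points equals `##NPOINTS` is a cheap way to notice a compressed or otherwise unsupported layout.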
Thank you for the links. I am aware that the NIST Chemistry WebBook has some plots already digitized, but not for all substances; for example, the WebBook has no IR spectrum for isobutyl acetate.
Compared to a spectral database with access to the recorded data, the visual representation as a .gif by SDBS lacks resolution. SDBS was probably not intended to be both public and to offer more than a brief preview (which already is a lot), in order to limit bandwidth; this could equally be the reason why the limit is set to 50 sets per day.
I may have a look at the first IR spectrum later to check whether splitting the .gif into two separate projects and then using the box method yields a better result than the default mask.
Independent of this, if you are affiliated with a university, it would be worth checking whether it has access to spectral databases. They could be:
in print (e.g., Sadtler, Aldrich, Hummel), so a high-resolution scan may offer better «raw data» for the digitizer. Complementary: the compilations by Pretsch idealize bands as ranges, then indicate specific, experimentally determined positions.
they could be electronic and free of charge from the chemical suppliers (e.g., Aldrich), or part of the spectrometer's software. Regarding the former: when searching by name / structure / CAS RN, you may need to look through multiple entries per CAS RN to identify /the/ product number that has Aldrich's recorded IR / Raman / NMR in the lower section «Documents».
Example: isobutyl acetate with product number W217506 at https://www.sigmaaldrich.com/catalog/product/aldrich/w217506?lang=en&region=US yields the link https://www.sigmaaldrich.com/spectra/rair/RAIR000962.PDF
they could be in a database requiring a subscription. Ask your librarian / spectroscopist / analytical chemist about, e.g., SpecInfo, or about databases organized as pointers to the data, like Know-it-all (BioRad), SpringerMaterials, Reaxys (Elsevier), CAS.
Given the constraints, the result of a quick digitization does not look too bad (to me). Maybe decrease the step width further to 1 px along abscissa and ordinate, maybe smooth the curve with a Savitzky-Golay (SG) filter, etc.:
The archive below documents my approach.
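As an illustration of the smoothing idea (not the procedure used in the archive), a 5-point Savitzky-Golay filter can be written without any library. This is a minimal sketch that assumes equally spaced samples along the abscissa and uses the classic quadratic/cubic weights (-3, 12, 17, 12, -3)/35; the function name `savgol5` is made up here, and the first and last two points are simply left untouched.

```python
# Weights of the 5-point quadratic/cubic Savitzky-Golay smoother (divide by 35).
SG5_WEIGHTS = (-3, 12, 17, 12, -3)
SG5_NORM = 35

def savgol5(y):
    """Smooth equally spaced samples; the two points at each end are kept as-is."""
    y = list(y)
    out = y[:]
    for i in range(2, len(y) - 2):
        window = y[i - 2:i + 3]
        out[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window)) / SG5_NORM
    return out

if __name__ == "__main__":
    # Invented transmittance values with one noisy spike, for demonstration only.
    trace = [0.90, 0.91, 0.89, 0.97, 0.88, 0.87, 0.86]
    print(savgol5(trace))
```

A useful property for band positions: a locally quadratic curve passes through this filter unchanged, so smooth band shapes are not flattened the way a plain moving average would flatten them. For wider windows or higher polynomial orders, `scipy.signal.savgol_filter` would be the more convenient choice if SciPy is available.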
Thank you for the suggestions and the approach of splitting the data.
I was able to correct the x-axis in my R plot by first removing the x-axis and then replacing it with defined ticks. WebPlotDigitizer confirmed my calculations, which I had first done by hand, and the plot now has a correct x-axis.
Hello, I am using WebPlotDigitizer to digitize an infrared (IR) spectroscopy plot.
The wavenumbers are plotted on the x-axis. At first the axis appears to be linear (i.e. the step from 4000 cm-1 to 3000 cm-1 covers the same distance as the step from 2000 cm-1 to 1000 cm-1, which would make WebPlotDigitizer work). However, by tradition, this axis is non-linear, which leads to imprecise and wrong extractions. The y-axis is not affected.
In the screenshots above, the x-values near the calibration points (4000 and 500) are precise.
However, if a point in the middle of the axis is selected, the x-value determined by WebPlotDigitizer is dramatically incorrect (e.g. it should be 2000, but WebPlotDigitizer reports 2600).
This mismatch negatively influences the dataset.
I am plotting the results with R, and maybe I can look into setting the axis more precisely in R, i.e. shifting the axis.
However, a possible fix in WebPlotDigitizer itself would be to let the user add more calibration points.
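To illustrate what additional calibration points would buy, here is a minimal Python sketch (the same arithmetic carries over to R). It assumes that the axis is piecewise linear with a change of scale at 2000 cm-1, which is a common layout for printed IR spectra and is consistent with the 2000 versus 2600 mismatch described above, but it is not confirmed for this particular image. The pixel positions are hypothetical and the function names are made up.

```python
from bisect import bisect_right

# (pixel position, wavenumber in cm-1) of three axis ticks.
# Hypothetical values: the scale change at 2000 cm-1 is an assumption.
CALIBRATION = [(100.0, 4000.0), (420.0, 2000.0), (900.0, 500.0)]

def two_point(px, cal=CALIBRATION):
    """Linear calibration using only the outermost ticks (4000 and 500)."""
    (p0, v0), (p1, v1) = cal[0], cal[-1]
    return v0 + (v1 - v0) * (px - p0) / (p1 - p0)

def piecewise(px, cal=CALIBRATION):
    """Linear interpolation between each pair of neighbouring ticks."""
    positions = [p for p, _ in cal]
    if not positions[0] <= px <= positions[-1]:
        raise ValueError("pixel position outside the calibrated range")
    i = min(bisect_right(positions, px) - 1, len(cal) - 2)
    (p0, v0), (p1, v1) = cal[i], cal[i + 1]
    return v0 + (v1 - v0) * (px - p0) / (p1 - p0)

def recalibrate(x_linear, cal=CALIBRATION):
    """Correct a value that was exported with the two-point calibration."""
    (p0, v0), (p1, v1) = cal[0], cal[-1]
    px = p0 + (x_linear - v0) * (p1 - p0) / (v1 - v0)  # undo the linear map
    return piecewise(px, cal)

if __name__ == "__main__":
    print(two_point(420.0))     # reading with two calibration points only
    print(piecewise(420.0))     # reading with the extra tick at 2000 cm-1
    print(recalibrate(2600.0))  # fixing an already exported value
```

With these assumed tick positions the two-point calibration reads 2600 where the tick says 2000, and `recalibrate` maps an exported 2600 back to 2000, so a dataset that was already digitized can be corrected afterwards without digitizing the image again.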