automeris-io / WebPlotDigitizer

Computer vision assisted tool to extract numerical data from plot images.
https://automeris.io
GNU Affero General Public License v3.0
2.63k stars 363 forks

Bug: Non-Linear Axis in Spectroscopy Plot #246

Closed MaximilianRothstein closed 3 years ago

MaximilianRothstein commented 3 years ago

Hello, I am using WebPlotDigitizer to digitize an infrared (IR) spectroscopy plot.

The wavenumbers are plotted on the x-axis. At first glance the axis appears to be linear (i.e. the step from 4000 cm-1 to 3000 cm-1 spans the same distance as the step from 2000 cm-1 to 1000 cm-1), which is what WebPlotDigitizer needs in order to work. However, by tradition in IR spectroscopy, this axis is non-linear, which leads to imprecise and wrong extractions. The y-axis is not affected.

Screenshot 1 at 4000 Screenshot 2 at 500

In the screenshots above, the x-values close to the calibration points (4000 and 500) are accurate.

However, if a point is selected in the middle of the axis, the x-value determined by WebPlotDigitizer is dramatically incorrect (e.g. it should be 2000, but WebPlotDigitizer reports 2600).

Screenshot 3 at 2000, showing 2600

This mismatch negatively influences the dataset.

I am plotting the results with R, so I may look into correcting the axis there, e.g. by shifting it.

However, a possible fix in WebPlotDigitizer itself would be to let the user add more than the current number of calibration points along an axis.
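To illustrate the idea of additional calibration points, here is a minimal Python sketch of a piecewise-linear calibration. It is not WebPlotDigitizer code or its API; the function name `make_piecewise_calibration` and the pixel positions are hypothetical, chosen so that the two-point reading reproduces the 2600 reported above.

```python
from bisect import bisect_right


def make_piecewise_calibration(points):
    """Return a function mapping a pixel x-coordinate to an axis value.

    points: (pixel, value) pairs, at least two, with distinct pixel values.
    """
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two calibration points")
    pixels = [p for p, _ in pts]
    values = [v for _, v in pts]

    def to_value(px):
        # Pick the segment that brackets px; clamp to the outer segments
        # so points beyond the calibrated range are extrapolated linearly.
        i = bisect_right(pixels, px) - 1
        i = max(0, min(i, len(pixels) - 2))
        p0, p1 = pixels[i], pixels[i + 1]
        v0, v1 = values[i], values[i + 1]
        return v0 + (px - p0) * (v1 - v0) / (p1 - p0)

    return to_value


# Hypothetical pixel positions: 4000 at px 100, 2000 at px 500, 500 at px 1100.
two_point = make_piecewise_calibration([(100, 4000), (1100, 500)])
three_point = make_piecewise_calibration([(100, 4000), (500, 2000), (1100, 500)])

print(two_point(500))    # end points only: the mid-axis reading is off
print(three_point(500))  # with the extra calibration point at 2000
```

With only two calibration points, every position in between is interpolated on a single straight line; each extra point confines the error to the segment between its neighbours.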

nbehrnd commented 3 years ago

SDBS is not the sole source of experimental IR data freely available on the net. A complementary database with data of this type already digitized is the NIST Chemistry WebBook, which for this compound offers a visual representation as .svg as well as JCAMP-DX data in the typical mid-infrared (4000 -- 670 wavenumbers) and in what is considered the THz range.

I speculate (because of the relevance of cheminformatics to IR spectroscopy) that there are multiple modules in R to work with this format right away. If not, and if there is no need to process batches of files into «x-y format», Norbert Haider (University of Vienna) wrote a lightweight IRView program (Windows executable) to inspect JCAMP-DX files and translate them into a tab-separated format; it extends functionality already present in his jdxview. (Compartmentalized, it is equally possible to run this gem on Linux with the Wine libraries.)

jko.zip

MaximilianRothstein commented 3 years ago

Thank you for the links. I am aware that the NIST Chemistry WebBook has some plots already digitized. However, this is not the case for all substances, e.g. the WebBook has no IR spectrum available for isobutyl acetate.

nbehrnd commented 3 years ago

Compared to a spectral database with access to the recorded data, the visual representation as a .gif by SDBS lacks optical resolution. To limit bandwidth, SDBS probably was not intended to be both public and to offer more than a brief preview (which already is a lot). This could equally be the reason why the limit is set to 50 sets per day.

I may have a look at the first IR spectrum later to check whether splitting the .gif into two separate projects and subsequently using the box method yields a better result than the default mask.


Independently of this, if you are affiliated with a university, it would be worth checking whether it has access to spectral databases. They could be:

nbehrnd commented 3 years ago

Given the constraints, the result of a quick digitization does not look too bad (to me). Maybe decrease the step width further to 1 px along abscissa and ordinate, maybe smooth the appearance with a Savitzky-Golay (SG) filter, etc.:

concatenate

The archive below documents my approach.

SDBS_532.zip
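The SG smoothing suggested above could look roughly like the following Python sketch. It is not taken from the archive; it uses the standard 5-point quadratic Savitzky-Golay weights (-3, 12, 17, 12, -3)/35, the function name `savgol_5pt` is hypothetical, and the sample values are made up.

```python
def savgol_5pt(y):
    """Smooth y with a 5-point quadratic Savitzky-Golay filter.

    The two points at each end are returned unchanged.
    """
    coeffs = (-3, 12, 17, 12, -3)  # standard 5-point quadratic weights
    norm = 35
    out = list(y)
    for i in range(2, len(y) - 2):
        window = y[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(coeffs, window)) / norm
    return out


# Made-up transmittance values with one noisy spike at index 3.
noisy = [90.0, 88.0, 85.0, 95.0, 78.0, 74.0, 70.0]
print(savgol_5pt(noisy))
```

Because the filter fits a local quadratic, it damps isolated digitization noise while distorting band shapes less than a plain moving average of the same width would.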

MaximilianRothstein commented 3 years ago

Thank you for the suggestions and the approach of splitting the data.

I was able to correct the x-axis in my R plot by first removing the x-axis and then replacing it with defined ticks. WebPlotDigitizer confirmed my calculations, which I first did by hand, and now the plot has a correct x-axis.
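The tick calculation described here can be sketched as follows. This is Python for illustration only, not the R code actually used. It assumes the axis is piecewise linear with a single break at 2000, and it takes its anchors from the one mismatch reported in this thread (true 2000 read as 2600, end points read correctly); the function name `true_to_raw` is hypothetical.

```python
def true_to_raw(value, anchors):
    """Map a true wavenumber to the raw (linearly calibrated) x-value.

    anchors: (true, raw) pairs; interpolation is linear between neighbours.
    """
    pts = sorted(anchors)
    for (t0, r0), (t1, r1) in zip(pts, pts[1:]):
        if t0 <= value <= t1:
            return r0 + (value - t0) * (r1 - r0) / (t1 - t0)
    raise ValueError("value outside the anchored range")


# Assumed anchors: end points read correctly, true 2000 was read as 2600.
anchors = [(500, 500), (2000, 2600), (4000, 4000)]
ticks = [4000, 3000, 2000, 1500, 1000, 500]
positions = [true_to_raw(t, anchors) for t in ticks]

# Each pair gives the label to print and the raw x-position to print it at.
print(list(zip(ticks, positions)))
```

The data stay in the raw coordinates exported by WebPlotDigitizer; only the tick labels are placed at the positions that correspond to the true wavenumbers.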