automeris-io / WebPlotDigitizer

Computer vision assisted tool to extract numerical data from plot images.
https://automeris.io
GNU Affero General Public License v3.0

Feature request: add option to extract data using a constant Y step instead of a constant X step #232

Closed p-costa closed 3 years ago

p-costa commented 3 years ago

This may be quite useful for improving the data extraction from plots that show strong variations in X but not in Y, especially when interpolation is also used (correct?). A hacky way of doing it would be to rotate the figure 90 degrees and swap the axes, but this seems like a straightforward addition to the tool.

Thanks @ankitrohatgi for this awesome tool, by the way!

nbehrnd commented 3 years ago

Your suggestion does not read like a simple change of increments Δx and Δy

increment

which may be set independently of each other (compare with the data in test_set.zip from the default plot). Thus I suggest you add an illustration of a case where your experience using the digitizer could be improved. If not as a .png easily embedded into a comment, you may equally share the file in question within a .zip archive.

p-costa commented 3 years ago

Thanks for your response. When one needs to use interpolation and smoothing on certain curves where the variations in X are very large, using a constant step in Y can be quite advantageous. I stumbled across an example in my work recently, but for now I attach the default preview from WebPlotDigitizer, rotated 90 degrees.

PS: In this particular example, other extractions will work quite well, but I am specifically talking about cases where smoothing is desired.

Screenshot from 2020-10-08 16-09-36

edit: to be clear, that's the result of me trying to capture the blue line using the X step w/ interpolation algorithm

nbehrnd commented 3 years ago

If the aim is / was to trace the blue curve, then the image displays only one point I would consider a match with this reference. The undulation of the blue curve also poses the problem that many points along the abscissa correspond to more than one point on the ordinate. I speculate it would be nice if the digitizer could recognize points of inflection and increase the number of points in these regions to model the blue curve. The latter could be resolved by running the digitizer sequentially on segments of the same curve (as mentioned e.g., here and here).

p-costa commented 3 years ago

So if I understand correctly it is a problem of having more than one possible solution for each value of X, and not how large the variation of it is. For reference, below is a simpler example of a plot where I also got much better results (extracting the data from the solid line) after rotating the figure 90 degrees, perhaps for the same reason. I think that for cases like this, having the choice of a Y step w/ interpolation would be nice.

Screenshot from 2020-10-08 19-49-07

nbehrnd commented 3 years ago

Indeed, I find that the digitizer works best if the relationship between points on abscissa and ordinate is strictly bijective. If this is not the case, the safest way I identified was to perform the extraction in sections, which is eased because one project may accommodate multiple datasets at once. And the digitizer does not mind if these datasets are all about «the blue curve», or (blue curve, orange one, etc.); it is up to the user to define start, end, increment and name of the individual trace.

As an example, I rotated the digitizer's default x-y plot by 90 degrees. Using the default Δx and Δy increment of 10 and normal pen width, I followed the blue line in 12 sections, percolating down the funnel. Prior to exporting the project I took screen photos of the individual sections, stitched them together into the following animated .gif to illustrate my processing:

all_together

Frame No 13, shown longer than the other twelve, displays all sections simultaneously. The project's .tar, the exported .csv and screen photos are shared (test_case.zip). Note that I took the liberty of having the abscissa run from reference point X1 on the left bottom corner (plus 2) to X2 on the right bottom corner (minus 2), while both references for Y are on the right vertical at 0 and 6.

Thus, if you have diagrams like the one in your comment from Oct 8, 7:55 PM UTC, I would seek to establish / restore the bijective relationship between abscissa and ordinate in the first place. The freely available tool ImageMagick, for example, allows you to turn the image quickly with an instruction on the CLI in the pattern of

convert input.png -rotate 90 output.png

(If you prefer a GUI, XnConvert or IrfanView may offer this as well.) The image can then be submitted to the digitizer, and the intermediate numeric results are then subject to a coordinate transformation. Thankfully, the transformation matrix is simple because both source and target coordinate systems are 2D and Cartesian (example). This standard method, sometimes with the increments adjusted for the curve to trace (they may differ from dataset to dataset within one project), gave me extractions which were good enough, so I can't share experience with the interpolation of the points.
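
For illustration only, the coordinate transformation mentioned above might be sketched in Python as a plain 2D rotation about the origin. The function name `rotate_points` and the sample points are hypothetical, and the sign of the angle needed in practice depends on the direction in which the image was turned and on how the axes were calibrated, so treat this as a sketch rather than a recipe.

```python
import math


def rotate_points(points, degrees):
    """Rotate (x, y) pairs counterclockwise about the origin by `degrees`.

    Uses the standard 2D rotation matrix
        [cos t, -sin t]
        [sin t,  cos t]
    in a right-handed Cartesian frame.
    """
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Hypothetical points read from a plot that was turned by a quarter turn.
digitized = [(1.0, 0.0), (2.0, 0.5), (3.0, 2.0)]

# Undo the quarter turn; flip the sign if the result looks mirrored.
restored = rotate_points(digitized, -90)
```

If the axes are calibrated on the rotated image with the original axis labels, the rotation reduces to swapping the two exported columns.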

p-costa commented 3 years ago

Thanks for the detailed answer and suggestions. I played around and extracting the data box-by-box using the x-step with interpolation method seems to capture the data well as long as the section crosses lines of constant x only once. Finding the optimal rotation seems also like a nice suggestion, thanks for that.
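
As a rough sketch of that condition, an already ordered list of points along a curve could be cut into sections in which x only ever increases or only ever decreases, so that each section crosses a line of constant x at most once. The helper `split_single_valued` is hypothetical and is not part of WebPlotDigitizer; it only illustrates where the box boundaries would fall.

```python
def split_single_valued(points):
    """Split an ordered polyline into runs in which x is strictly monotonic.

    Consecutive sections share their turning point. Purely vertical steps
    (no change in x) are dropped and start a new section.
    """
    if not points:
        return []
    sections = [[points[0]]]
    direction = 0  # +1 while x increases, -1 while it decreases, 0 if unknown
    for prev, cur in zip(points, points[1:]):
        dx = cur[0] - prev[0]
        step = (dx > 0) - (dx < 0)
        if step == 0:
            # Vertical step: begin a fresh section at the current point.
            sections.append([cur])
            direction = 0
            continue
        if direction != 0 and step != direction:
            # x reverses: begin a new section at the turning point.
            sections.append([prev])
        direction = step
        sections[-1].append(cur)
    return sections


# Hypothetical zig-zag curve, similar in spirit to the rotated default plot.
curve = [(0, 0), (1, 1), (2, 2), (1, 3), (0, 4), (1, 5)]
sections = split_single_valued(curve)
```

Each resulting section could then be traced as its own dataset within one project.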

I think that in the most common use cases one would just need to rotate by 90 degrees, so having the option for sweeping in Y instead of X would relieve that small effort.

ankitrohatgi commented 3 years ago

Hi - you can always pick "x" axis to be the one that is along the vertical direction and "y" as horizontal and use the interpolation without having to rotate the image etc. The X and Y axes don't need to be aligned to X and Y in pixels.
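
To illustrate the idea of treating the vertical direction as the independent variable, here is a small Python sketch that resamples already extracted points at a constant Y step using linear interpolation. It is not the tool's own interpolation algorithm; the function name `resample_constant_y` and the sample data are made up for the example, and it assumes y strictly increases along the point list.

```python
def resample_constant_y(points, y_step):
    """Linearly interpolate x at evenly spaced y values.

    `points` is a list of (x, y) pairs with strictly increasing y.
    Returns (x, y) pairs starting at the first y and advancing by `y_step`.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    ys = [p[1] for p in points]
    if any(b <= a for a, b in zip(ys, ys[1:])):
        raise ValueError("y must be strictly increasing")

    # Number of whole steps that fit in the y range (small tolerance
    # guards against floating-point round-off at the upper end).
    n = int((ys[-1] - ys[0]) / y_step + 1e-9)

    out = []
    i = 0  # index of the segment currently bracketing y
    for k in range(n + 1):
        y = ys[0] + k * y_step
        while i < len(points) - 2 and points[i + 1][1] < y:
            i += 1
        (x0, y0), (x1, y1) = points[i], points[i + 1]
        t = (y - y0) / (y1 - y0)
        out.append((x0 + t * (x1 - x0), y))
    return out


# Hypothetical curve with a large swing in x over a small range of y.
samples = resample_constant_y([(0.0, 0.0), (10.0, 1.0), (0.0, 2.0)], 0.5)
```

Because y plays the role of the independent variable here, a curve that is multi-valued in x but single-valued in y is sampled without ambiguity.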

ankitrohatgi commented 3 years ago

Just tried on one of the above images:

image

vertical_plot.zip

p-costa commented 3 years ago

This actually does exactly what I was asking. Thank you! Closing.