automeris-io / WebPlotDigitizer

Computer vision assisted tool to extract numerical data from plot images.
https://automeris.io
GNU Affero General Public License v3.0
2.63k stars 363 forks

Support digitization of pie charts #7

Open vinodkhare opened 12 years ago

vinodkhare commented 12 years ago

Automatically extract percentages from a pie chart.

ankitrohatgi commented 12 years ago

Yes, pie charts would be nice. They are somewhat trickier than other plots, though, since the slices make little sense without the corresponding labels. For other plots, the "labels" are simply some other number (e.g., the x-axis value in 2D plots).

I can think of three ways of doing this: 1) Just pick off the slices and label them 1, 2, 3 etc. and let the user decide what is what. 2) Ask the user for the labels. This can get pretty tedious if you have a large chart. 3) Do some text extraction, but it's difficult to make a reliable one. However, this could be useful in many other places as well.
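A minimal sketch of what option 1 could look like, assuming the user has clicked the pie center and one point on each slice boundary. This is not WebPlotDigitizer code; the function name `slice_percentages` and its inputs are hypothetical. Each slice is labeled 1, 2, 3, ... in order of increasing angle, and its share is its angular sweep divided by 360 degrees.

```python
import math


def slice_percentages(center, boundary_points):
    """Return {label: percent} for slices delimited by boundary points.

    center          -- (x, y) of the pie center (hypothetical user click)
    boundary_points -- one (x, y) point on each slice boundary
    """
    if len(boundary_points) < 2:
        raise ValueError("need at least two slice boundaries")

    cx, cy = center
    # Angle of every boundary around the center, normalized to [0, 360).
    angles = sorted(
        math.degrees(math.atan2(y - cy, x - cx)) % 360.0
        for x, y in boundary_points
    )

    percentages = {}
    for i, start in enumerate(angles):
        # The last slice wraps around to the first boundary.
        end = angles[(i + 1) % len(angles)]
        sweep = (end - start) % 360.0
        percentages[i + 1] = 100.0 * sweep / 360.0
    return percentages


if __name__ == "__main__":
    # Boundaries at 0, 90 and 180 degrees give slices of 25 %, 25 % and 50 %.
    print(slice_percentages((0, 0), [(1, 0), (0, 1), (-1, 0)]))
```

In image coordinates the y-axis points down, so increasing angle runs clockwise on screen; the percentages are unaffected, only the order in which the numeric labels are assigned.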

The point is that for simple pie charts, it's quite easy to read off the values anyways. And for large complicated ones, I haven't figured out a way to automate the process. Got any suggestions?

vinodkhare commented 12 years ago

I made this feature request because I came across a pie chart that had the labels but not the values. Hence the need for digitization.

I'd prefer option 1 to begin with. Maybe later we could give the user an option to edit the labels.

aizenman commented 10 years ago

I think option #1 would work for the vast majority of users. It's pretty easy to label the slices manually in Excel, since they come out in strict order.

ehtec commented 2 years ago

@ankitrohatgi

Hey guys, I have created something that might be of interest for you...

https://git.ehtec.co/research/pie-chart-ocr

This Python library can extract data from pie charts. It does not calculate percentages from sector angles, as I see some other (inaccurate) tools doing. I don't see the need for that, as every decent pie chart has the percentage numbers next to the sectors. I rarely come across one that doesn't.
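As a rough illustration of reading printed percentages rather than measuring angles, here is a minimal sketch that pulls percentage values out of text strings that some OCR step has already produced. It is not the pie-chart-ocr API; the function name `parse_percentages` and the input format (a list of recognized text lines) are assumptions made for this example.

```python
import re

# A number with an optional decimal part (dot or comma), followed by "%".
PERCENT_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*%")


def parse_percentages(ocr_lines):
    """Return all percentage values found in the given text lines, in order."""
    values = []
    for line in ocr_lines:
        for match in PERCENT_RE.finditer(line):
            # Accept a decimal comma as well as a decimal point.
            values.append(float(match.group(1).replace(",", ".")))
    return values


if __name__ == "__main__":
    lines = ["Apples 42.5%", "Pears 30 %", "Other 27,5%", "Fruit sales"]
    values = parse_percentages(lines)
    print(values)
    # A sum far from 100 hints at a misread or missing label.
    print(abs(sum(values) - 100.0) < 1.0)
```

Checking that the extracted values sum to roughly 100 is a cheap way to flag charts where a label was misread or missed.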

The chart is supplied as an image, from which text is extracted via OCR and MSER. Already supported is:

What is planned to be added in the near future:

This project is MIT licensed, so feel free to make use of it. I am very busy at the moment, so if anybody is interested in this feature and wants to help with its development, please let me know. The hardest work is already done.

My email address is elias.hohl@ehtec.co.