automeris-io / WebPlotDigitizer

Computer vision assisted tool to extract numerical data from plot images.
https://automeris.io
GNU Affero General Public License v3.0

Dataset name missing from Y columns in CSV header #324

Open Entropy512 opened 2 months ago

Entropy512 commented 2 months ago

Currently, CSVs are exported with a header layout that would make sense in a format that supports merged cells, but CSV does not support them. The dataset name should instead be included for every column, not just the X columns. The current format breaks the ability to load a dataset into a MultiIndex dataframe with Python Pandas' read_csv() function, for example.

Currently, the CSV header looks like this:

Red,,Green,,Blue,
X,Y,X,Y,X,Y

To make the headers easier to parse, it should look like this:

Red,Red,Green,Green,Blue,Blue
X,Y,X,Y,X,Y

The latter format loads easily into a Pandas MultiIndex dataframe, since every column then carries both its dataset name and its axis label.
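As a possible workaround until the export changes, the header can be patched after the fact. This is only a minimal sketch: it assumes that a blank cell in the first header row always belongs to the dataset named to its left, and `fill_dataset_names` is a hypothetical helper, not part of WebPlotDigitizer. The data row is made up for illustration.

```python
import csv
import io


def fill_dataset_names(csv_text: str) -> str:
    """Repeat each dataset name across the blank header cells that follow it."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text

    filled = []
    last_name = ""
    for cell in rows[0]:
        # Assumption: an empty cell belongs to the dataset named to its left.
        if cell.strip():
            last_name = cell
        filled.append(last_name)
    rows[0] = filled

    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Header in the current export format, plus one illustrative data row.
current = "Red,,Green,,Blue,\nX,Y,X,Y,X,Y\n1,2,3,4,5,6\n"
fixed = fill_dataset_names(current)
print(fixed)
```

The patched text can then be passed to `pandas.read_csv(io.StringIO(fixed), header=[0, 1])`, which should give two-level column labels such as `("Red", "Y")`.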