MartinThoma / hwrt

A toolset for handwriting recognition
MIT License

Simplify data storage format of raw data #12

Open MartinThoma opened 10 years ago

MartinThoma commented 10 years ago

Currently, the data storage format for raw data is a dictionary that wraps a list of dictionaries:

handwriting_datasets = [{'handwriting': handwriting,
                         'id': raw_data['id'],
                         'formula_id': formula['id'],
                         'formula_in_latex':
                            formula['formula_in_latex'],
                         'is_in_testset':
                            raw_data['is_in_testset']}, ...]
raw_data = {'handwriting_datasets': handwriting_datasets,
            'formula_id2latex': formula_id2latex}

However, the handwriting object already contains the id, the formula_id, and the formula_in_latex. The is_in_testset flag could easily be added to it as well. The raw data would then simply be a list of objects instead of a list of dictionaries with mixed value types.
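A minimal sketch of what this simplification could look like. The `Handwriting` class here is a hypothetical stand-in for the real handwriting object (its attribute names are assumptions, not the actual hwrt API), and `flatten` is an illustrative helper, not an existing function:

```python
from dataclasses import dataclass


@dataclass
class Handwriting:
    """Hypothetical stand-in for the handwriting object (not hwrt's class)."""
    raw_data_id: int
    formula_id: int
    formula_in_latex: str
    is_in_testset: bool = False  # the attribute this issue proposes to add


def flatten(raw_data):
    """Turn the current dict-based format into a plain list of objects."""
    recordings = []
    for entry in raw_data['handwriting_datasets']:
        handwriting = entry['handwriting']
        # Move the only field the object does not carry yet onto the object.
        handwriting.is_in_testset = bool(entry['is_in_testset'])
        recordings.append(handwriting)
    return recordings


# Example data in the current format (values are made up for illustration).
hw = Handwriting(raw_data_id=1, formula_id=31, formula_in_latex='A')
old_format = {
    'handwriting_datasets': [{'handwriting': hw,
                              'id': 1,
                              'formula_id': 31,
                              'formula_in_latex': 'A',
                              'is_in_testset': 1}],
    'formula_id2latex': {31: 'A'},
}

recordings = flatten(old_format)

# The lookup table can be rebuilt from the objects when it is needed.
formula_id2latex = {r.formula_id: r.formula_in_latex for r in recordings}
```

Under these assumptions, the separate formula_id2latex mapping would not have to be stored, since it can be derived from the list itself.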

I would have to adjust: