Open MartinThoma opened 10 years ago
Currently, the data storage format for raw data is a dictionary:
handwriting_datasets = [{'handwriting': handwriting,
                         'id': raw_data['id'],
                         'formula_id': formula['id'],
                         'formula_in_latex': formula['formula_in_latex'],
                         'is_in_testset': raw_data['is_in_testset']},
                        ...]
raw_data = {'handwriting_datasets': handwriting_datasets,
            'formula_id2latex': formula_id2latex}
But the handwriting object already contains the id, the formula_id, and the formula_in_latex. The is_in_testset flag could easily be added to it. Then the storage would simply be a list of objects instead of a list of dictionaries with mixed value types.
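A minimal sketch of what that list of objects could look like. The `Handwriting` class here is a hypothetical stand-in, not the project's real class; only `raw_data_id` is named in this issue, and the other attribute names are assumptions that mirror the dictionary keys above.

```python
from dataclasses import dataclass


@dataclass
class Handwriting:
    """Hypothetical stand-in for the project's handwriting object."""
    raw_data_id: int
    formula_id: int
    formula_in_latex: str
    is_in_testset: bool = False  # the attribute this issue proposes to add


# The storage format becomes a plain list of objects (sample values are made up).
handwriting_datasets = [
    Handwriting(raw_data_id=1, formula_id=31, formula_in_latex="A"),
    Handwriting(raw_data_id=2, formula_id=31, formula_in_latex="A",
                is_in_testset=True),
]

# Every field is reachable directly on the object, without a dict lookup.
test_ids = [hw.raw_data_id for hw in handwriting_datasets if hw.is_in_testset]
```

With this layout the wrapper dictionary carries no information that the object itself does not already hold.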
I would have to adjust:
raw_dataset['handwriting'].raw_data_id -> raw_dataset.raw_data_id
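Already stored data would need a one-time conversion as well. The following is only a sketch under assumptions: `flatten` is a hypothetical helper, and `SimpleNamespace` stands in for the real handwriting object, which is assumed to accept a new attribute.

```python
from types import SimpleNamespace


def flatten(old_datasets):
    """Turn old-style dict entries into a plain list of handwriting objects."""
    flattened = []
    for entry in old_datasets:
        handwriting = entry['handwriting']
        # Copy over the one field the object does not carry yet.
        handwriting.is_in_testset = entry['is_in_testset']
        flattened.append(handwriting)
    return flattened


# SimpleNamespace is a placeholder for the real object (assumption);
# the sample values are made up.
old = [{'handwriting': SimpleNamespace(raw_data_id=7, formula_id=3,
                                       formula_in_latex='\\alpha'),
        'id': 7,
        'formula_id': 3,
        'formula_in_latex': '\\alpha',
        'is_in_testset': True}]
new = flatten(old)

# Old access path and new access path give the same value.
same_id = old[0]['handwriting'].raw_data_id == new[0].raw_data_id
```

Note that this mutates the existing objects in place rather than copying them, which keeps the sketch short but may not suit every caller.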