denshoproject / ddr-cmdln

Command-line tools for automating the Densho Digital Repository's various processes.

Explore using Marshmallow for objects #122

Open gjost opened 5 years ago

gjost commented 5 years ago

The DDR has its own code for loading (deserializing) and dumping (serializing) objects from/to JSON and CSV, but it's scattered through DDR.models and DDR.module. Another problem is that object content fields are added to the same namespace as object methods, rather than living in a separate .content or .source attribute.
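
To make the namespace problem concrete, here is a minimal sketch of what keeping fields in a `.content` attribute could look like. The `Entity` class, its methods, and the sample identifier are hypothetical stand-ins, not the actual DDR.models code.

```python
import json


class Entity:
    """Hypothetical stand-in for a DDR object; not the real DDR.models class."""

    def __init__(self, identifier, content=None):
        self.identifier = identifier
        # All content fields live in one dict, so a field that happens to be
        # named "dump" or "identifier" cannot shadow a method or attribute.
        self.content = dict(content or {})

    def dump(self):
        """Serialize only the content fields to a JSON string."""
        return json.dumps(self.content)

    @classmethod
    def load(cls, identifier, text):
        """Build an object from a JSON string of content fields."""
        return cls(identifier, json.loads(text))


# A field called "dump" is harmless because it is stored in .content.
entity = Entity("ddr-test-123", {"title": "Example", "dump": "a field, not a method"})
restored = Entity.load(entity.identifier, entity.dump())
```

With this layout the serializer only ever has to look at one dict, which is also roughly the shape a schema library would expect to be handed.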

Marshmallow (https://marshmallow.readthedocs.io/) is "an ORM/ODM/framework-agnostic library for converting complex datatypes, such as objects, to and from native Python datatypes".

gjost commented 5 years ago

Note: Marshmallow has no functions for CSV.

gjost commented 5 years ago

Found another interesting lib: SQLAthanor (https://sqlathanor.readthedocs.io/en/latest/). It extends SQLAlchemy’s ORM to support serialization/de-serialization for JSON, CSV, YAML, dict. We don't need SQL (might be useful though) but could it be used just for serialization/de-serialization?

The docs look good and the "SQLAthanor vs Alternatives" section (https://sqlathanor.readthedocs.io/en/latest/#sqlathanor-vs-alternatives) compares SQLAthanor with Marshmallow, Colander, and Pandas.

gjost commented 5 years ago

Hacked up some proof-of-concept scripts for Marshmallow and SQLAthanor.

Marshmallow looks easier and is less verbose, but I didn't see an obvious way to use our DDR.converters functions, and it doesn't dump to CSV. TODO: research CSV load/dump.
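
For the CSV TODO, one possible direction is to let a schema library handle dict conversion and do the CSV step with the standard library. The sketch below uses only Python's `csv` module. The `FIELDS` list, the field names, and the lambda converters are assumptions that stand in for DDR.converters functions, whose real names and signatures aren't shown here.

```python
import csv
import io

# Hypothetical field definitions: (name, to_text, from_text).
# The callables are placeholders for DDR.converters functions.
FIELDS = [
    ("id", str, str),
    ("title", str, str),
    ("topics", lambda v: "; ".join(v), lambda s: s.split("; ") if s else []),
]


def dump_csv(rows):
    """Write a list of dicts to CSV text, with columns in FIELDS order."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([name for name, _, _ in FIELDS])
    for row in rows:
        writer.writerow([to_text(row[name]) for name, to_text, _ in FIELDS])
    return buf.getvalue()


def load_csv(text):
    """Read CSV text back into a list of dicts, applying the converters."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: from_text(raw[name]) for name, _, from_text in FIELDS}
        for raw in reader
    ]


rows = [{"id": "ddr-test-1", "title": "Example", "topics": ["a", "b"]}]
text = dump_csv(rows)
```

Because the column order comes from the `FIELDS` list rather than from dict iteration, the header row stays stable regardless of how the row dicts were built.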

SQLAthanor is usable without a database. It's much more verbose, but it does CSV and it's very easy to plug in the DDR.converters functions. However, it does NOT preserve field order! Fields in CSV and JSON dumps are in random or possibly alphabetical order, which is totally useless for our purposes. TODO would order be preserved in Python 3.7? TODO research adding load/dump OrderedDict
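
On the Python 3.7 question: dict insertion order is a language guarantee from 3.7 onward, and the standard `json` module writes keys in that order unless told to sort them. The sketch below only demonstrates the stdlib behavior with made-up field names. Whether SQLAthanor's own dump functions keep that order is a separate question that this does not answer.

```python
import json
from collections import OrderedDict

# Fields in the order we want them written (hypothetical field names).
record = {"id": "ddr-test-1", "title": "Example", "creators": [], "alt": "x"}

# json.dumps writes keys in insertion order unless sort_keys=True is passed.
text = json.dumps(record)

# json.loads builds a plain dict in document order; on Python 3.7+ the
# language guarantees that the dict keeps that order.
loaded = json.loads(text)

# On older interpreters, or to make the intent explicit, ask for OrderedDict.
loaded_ordered = json.loads(text, object_pairs_hook=OrderedDict)

# Sorting is opt-in and produces alphabetical key order.
sorted_text = json.dumps(record, sort_keys=True)
```

If a library's dumps come out alphabetical, a `sort_keys`-style option or an intermediate unordered container somewhere in its pipeline would be one thing to look for.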