MI-DPLA / combine

Combine /kämˌbīn/ - Metadata Aggregator Platform
MIT License

exporting Indexed Fields as Tabular Data #180

Closed ghukill closed 6 years ago

ghukill commented 6 years ago

In addition to exporting Records as XML (#179), it would be helpful to allow exporting the indexed fields as well. Because these fields are already indexed in Elasticsearch (ES), they are effectively tabular data.

For now, the focus is on a single Job, but this could be extended to the same scopes as Record XML exports.

es2csv is a nice option and fits the bill. However, it requires a Python 2.7 environment, which would likely have to be added to the build unless es2csv can be shoehorned into Python 3.x.

And that may be possible: there is a fork, with a PR against the original repo, that should work with Python 3.5: https://github.com/WSULib/es2csv

With that installed, exporting an ES index looks like this:

es2csv -q '*' -i 'j17' -D 'record' -o 'j17_py3.csv'
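
As a rough sketch of how this command could be assembled per Job from Python, the helper below builds the argument list using only the flags that appear in this issue (`-q`, `-i`, `-D`, `-o`, and `-k`). The function name `build_es2csv_command` is hypothetical, and the `j<job id>` index naming is an assumption inferred from the `j17` example; it only constructs the command and does not run it.

```python
import shlex


def build_es2csv_command(job_id, output_path, kibana_nested=False):
    """Build an es2csv argument list for one Job (hypothetical helper)."""
    # Assumption: a Job's ES index is named "j<job id>", as in the 'j17' example.
    index_name = f"j{job_id}"
    cmd = [
        "es2csv",
        "-q", "*",           # query string: match everything
        "-i", index_name,    # index to export
        "-D", "record",      # document type
        "-o", output_path,   # output CSV file
    ]
    if kibana_nested:
        # The -k (Kibana style) flag mentioned in this issue.
        cmd.append("-k")
    return cmd


cmd = build_es2csv_command(17, "j17_py3.csv")
# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list could be handed to `subprocess.run(cmd, check=True)` in an environment where es2csv is installed.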

When a field is multivalued, a numeric suffix is appended to the column name for each value:

foo.0
foo.1
foo.2
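
To illustrate the naming convention only, here is a minimal sketch of how a document with a multivalued field maps to numbered columns. This is not es2csv's actual code; `flatten_record` and the sample document are hypothetical.

```python
def flatten_record(doc):
    """Expand list-valued fields into numbered columns (illustrative only)."""
    row = {}
    for field, value in doc.items():
        if isinstance(value, list):
            # One column per value: foo.0, foo.1, foo.2, ...
            for i, item in enumerate(value):
                row[f"{field}.{i}"] = item
        else:
            # Single-valued fields keep their name unchanged.
            row[field] = value
    return row


# Hypothetical indexed document with one multivalued field.
doc = {"foo": ["a", "b", "c"], "bar": "x"}
row = flatten_record(doc)
print(sorted(row))
```

Each value gets its own column, which is why this layout is convenient for tools that work column by column.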

This makes for handy import into OpenRefine. However, it looks as though the -k (Kibana style) flag also supports writing multivalued fields as a single comma-delimited value (these options could be bubbled up to the GUI):

es2csv -q '*' -i 'j17' -D 'record' -o 'j17_py3_kibana.csv' -k

and results in values like:

mods_subject_genre --> "Poetry,Juvenile poetry"
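
For comparison, a minimal sketch of the comma-delimited form, assuming multivalued fields are simply joined with commas before being written as CSV. Again, this is an illustration and not es2csv's implementation; `join_multivalued` is a hypothetical name.

```python
import csv
import io


def join_multivalued(doc, delimiter=","):
    """Collapse list-valued fields into one delimited string (illustrative only)."""
    row = {}
    for field, value in doc.items():
        if isinstance(value, list):
            row[field] = delimiter.join(str(v) for v in value)
        else:
            row[field] = value
    return row


doc = {"mods_subject_genre": ["Poetry", "Juvenile poetry"]}
row = join_multivalued(doc)

# Write to an in-memory buffer; the csv module quotes values containing commas.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```

One trade-off of this form is that a value which itself contains a comma can no longer be told apart from a delimiter, whereas the numbered-column form keeps each value separate.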

ghukill commented 6 years ago

Finis!