Extend CSVDataFile to support hash index on Core file

csbrown commented 8 months ago

Description

It would be useful to be able to generalize iterators over the type of record. AFAIK, there's nothing particularly special about the Core Record data-format-wise, so it should be possible to write iterator methods that apply equally to Core and Extension files.
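
To illustrate the point, here is a minimal sketch of an iterator that doesn't care which kind of file it is reading. It assumes, hypothetically, that rows are plain dicts and that the only relevant Core/Extension difference is which column holds the linking key: `id` for Core rows and `coreid` for Extension rows, following the Darwin Core Archive convention. `Row`, `key_of` and `iter_keyed` are illustrative names, not python-dwca-reader API.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical row representation: column name -> value.
Row = Dict[str, str]


def key_of(row: Row, is_core: bool) -> str:
    """Return the identifier that links rows across files.

    Core rows carry their own `id`; extension rows point back to the core
    through `coreid`. This is the only place the file kind matters.
    """
    return row["id"] if is_core else row["coreid"]


def iter_keyed(rows: Iterable[Row], is_core: bool) -> Iterator[Tuple[str, Row]]:
    """Yield (key, row) pairs the same way for core and extension files."""
    for row in rows:
        yield key_of(row, is_core), row


core = [{"id": "1", "name": "Puma concolor"}, {"id": "2", "name": "Lynx lynx"}]
ext = [{"coreid": "1", "vernacular": "cougar"}, {"coreid": "1", "vernacular": "puma"}]

print([k for k, _ in iter_keyed(core, is_core=True)])   # ['1', '2']
print([k for k, _ in iter_keyed(ext, is_core=False)])   # ['1', '1']
```

Everything downstream of `iter_keyed` only sees (key, row) pairs, so it can be written once for both file kinds.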

My particular use case is that I'm working on an iterator for JOINed files, and it's awkward to deal with the Core Record as a special entity, since nothing about the JOIN process requires knowledge of whether a file is Core or Extension.
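
As a rough sketch of why the JOIN doesn't need to know which side is the Core: if every file can produce the same kind of hash index (key -> list of row positions), an inner hash join only needs the key column name for each side. Rows are again assumed to be plain dicts, and `build_index` and `join` are hypothetical helpers, not library functions.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

Row = Dict[str, str]


def build_index(rows: Sequence[Row], key_column: str) -> Dict[str, List[int]]:
    """Map each key value to the positions of the rows that carry it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[key_column]].append(position)
    return dict(index)


def join(
    left: Sequence[Row], left_key: str,
    right: Sequence[Row], right_key: str,
) -> Iterator[Tuple[Row, Row]]:
    """Inner hash join: index the right side, then probe it with each left row.

    Neither argument is assumed to be the core file; only the key columns differ.
    """
    right_index = build_index(right, right_key)
    for row in left:
        for position in right_index.get(row[left_key], []):
            yield row, right[position]


core = [{"id": "1", "name": "Puma concolor"}, {"id": "2", "name": "Lynx lynx"}]
vernacular = [{"coreid": "1", "vernacular": "cougar"}, {"coreid": "1", "vernacular": "puma"}]

for c, v in join(core, "id", vernacular, "coreid"):
    print(c["name"], "->", v["vernacular"])
# Puma concolor -> cougar
# Puma concolor -> puma
```

The same `join` works with the sides swapped, or with two Extension files joined on `coreid`, which is exactly the symmetry that makes special-casing the Core Record feel awkward.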

Deliverables:

1) Extend the CSVDataFile class to allow a hash index on the Core Record.

This seems like it's maybe a 3-line change in _build_coreid_index: just an if/then over what kind of row it's inspecting. Some type hints may also need editing.

2) Update the test_coreid_index test to also build an index on the Core Record.
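
A minimal sketch of the kind of if/then described in deliverable 1, using a hypothetical stand-in class (`DataFileSketch`, with an `is_core` flag and dict rows) rather than the real `CSVDataFile`, whose internals aren't reproduced here. The only branch is which column supplies the key; the index structure is identical for both file kinds.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


class DataFileSketch:
    """Hypothetical stand-in for CSVDataFile; rows are dicts, not parsed CSV."""

    def __init__(self, rows: Sequence[Row], is_core: bool) -> None:
        self.rows = rows
        self.is_core = is_core  # assumption: the real class would consult its file descriptor

    def _build_coreid_index(self) -> Dict[str, List[int]]:
        """Map each core ID to the positions of the rows that reference it.

        For an extension file the key is the `coreid` column; for the core file
        it is the row's own `id`, so each list holds a single position
        (assuming core IDs are unique).
        """
        key_column = "id" if self.is_core else "coreid"  # the if/then over row kind
        index: Dict[str, List[int]] = {}
        for position, row in enumerate(self.rows):
            index.setdefault(row[key_column], []).append(position)
        return index


# Roughly what an extended test_coreid_index could check, for both file kinds.
core = DataFileSketch([{"id": "1"}, {"id": "2"}], is_core=True)
ext = DataFileSketch([{"coreid": "1"}, {"coreid": "2"}, {"coreid": "1"}], is_core=False)

print(core._build_coreid_index())  # {'1': [0], '2': [1]}
print(ext._build_coreid_index())   # {'1': [0, 2], '2': [1]}
```

If the public accessor for the index currently refuses Core files, it would presumably need the same relaxation; that is an unverified guess about the existing code, not something checked here.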

csbrown commented 8 months ago

Advice on unknown unknowns w.r.t. difficulties in implementing this appreciated.

niconoe commented 7 months ago

Thanks a lot, @csbrown. The corerow / extensionrow inconsistencies have bothered me for a long time (I wasn't very experienced with Python when I wrote this code), so it's great to see someone tackling a first step to improve it!

csbrown commented 7 months ago

"I wasn't very experienced..."

You've done a great job. I appreciate your hard work. :)

BelgianBiodiversityPlatform / python-dwca-reader

Extend CSVDataFile to support hash index on Core file #100
