BelgianBiodiversityPlatform / python-dwca-reader

🐍 A Python package to read Darwin Core Archive (DwC-A) files.
BSD 3-Clause "New" or "Revised" License

Extend CSVDataFile to support hash index on Core file #100

Closed: csbrown closed this issue 7 months ago

csbrown commented 8 months ago

Description

It would be useful to make iterators generic over the type of record. AFAIK, nothing about the Core file's data format is special compared with an Extension file, so it should be possible to write iterator methods that work on either Core or Extension files.

My particular use case is that I'm working on an iterator for JOINed files, and it's awkward to deal with the Core Record as a special entity, since nothing about the JOIN process requires knowledge of whether a file is Core or Extension.
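To illustrate the point, here is a minimal sketch of a hash join that has no notion of Core vs. Extension: the caller only names the key column on each side. It does not use python-dwca-reader; `read_rows` and `join_rows` are hypothetical names, and the data are assumed to be tab-delimited text with a header row, with `id` / `coreid` as illustrative key columns.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, str]


def read_rows(text: str, delimiter: str = "\t") -> List[Row]:
    """Hypothetical helper: parse delimited text (header line first) into dict rows.

    The same parser serves Core and Extension data, since both are plain
    delimited text files.
    """
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def join_rows(
    left: Iterable[Row], left_key: str, right: Iterable[Row], right_key: str
) -> Iterator[Tuple[Row, List[Row]]]:
    """Hypothetical hash join: yield each left row with its matching right rows.

    Nothing here knows which side is Core and which is Extension.
    """
    # Build the hash index on the right-hand side: key value -> matching rows.
    index: Dict[str, List[Row]] = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    # Probe the index once per left row; unmatched rows get an empty list.
    for row in left:
        yield row, index.get(row[left_key], [])


if __name__ == "__main__":
    core = read_rows("id\tscientificName\n1\tPuma concolor\n2\tLynx lynx\n")
    ext = read_rows("coreid\tvernacularName\n1\tCougar\n1\tMountain lion\n")
    for core_row, ext_rows in join_rows(core, "id", ext, "coreid"):
        print(core_row["scientificName"], [r["vernacularName"] for r in ext_rows])
    # prints "Puma concolor ['Cougar', 'Mountain lion']" then "Lynx lynx []"
```

Because the roles are just parameters, the same function joins an Extension onto the Core or the other way round.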

Deliverables:

1) Extend the CSVDataFile class to allow a hash index on the Core file.
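A rough sketch of what such an index could look like, assuming the key column is a constructor parameter so a Core file (keyed on its id column) and an Extension file (keyed on its coreid column) go through identical code. `IndexedDataFile`, `key_column` and `get_rows_by_key` are hypothetical names for illustration only, not python-dwca-reader's actual CSVDataFile API; the lazy build is a design choice of this sketch.

```python
import csv
import io
from typing import Dict, List, Optional


class IndexedDataFile:
    """Hypothetical CSVDataFile-like class (NOT the library's real API).

    The hash index maps each value of `key_column` to the positions of the
    rows holding that value.
    """

    def __init__(self, text: str, key_column: str, delimiter: str = "\t") -> None:
        self.rows: List[Dict[str, str]] = list(
            csv.DictReader(io.StringIO(text), delimiter=delimiter)
        )
        self.key_column = key_column
        self._index: Optional[Dict[str, List[int]]] = None  # built on first lookup

    @property
    def index(self) -> Dict[str, List[int]]:
        # Build lazily so files that are never queried pay no indexing cost.
        if self._index is None:
            self._index = {}
            for position, row in enumerate(self.rows):
                self._index.setdefault(row[self.key_column], []).append(position)
        return self._index

    def get_rows_by_key(self, key: str) -> List[Dict[str, str]]:
        # Unknown keys return an empty list rather than raising.
        return [self.rows[i] for i in self.index.get(key, [])]


if __name__ == "__main__":
    core = IndexedDataFile("id\tscientificName\n1\tPuma concolor\n", key_column="id")
    ext = IndexedDataFile("coreid\tvernacularName\n1\tCougar\n", key_column="coreid")
    print(core.get_rows_by_key("1"), ext.get_rows_by_key("1"))
```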

csbrown commented 8 months ago

Any advice on unknown unknowns regarding the difficulty of implementing this would be appreciated.

niconoe commented 7 months ago

Thanks a lot, @csbrown. The corerow / extensionrow inconsistencies have bothered me for a long time (I wasn't very experienced with Python when I wrote this code), so it's great to see someone tackling a first step to improve it!

csbrown commented 7 months ago

"I wasn't very experienced..."

You've done a great job. I appreciate your hard work. :)