Open nigelcharman opened 5 months ago
We're now using enumerate(dwca), so we're in no rush to have this corrected. I'll leave the issue open, though, in case other people come across it.
Note to self: it only happens with the combination of chunksize (and probably also the iterator parameter) and the DwCA using default values (because pd_read returns a TextFileReader rather than a regular data frame).
After careful inspection, I can't see any sane way to deal with this specific combination (pd_read returning TextFileReader objects because of its parameters and the DwC-A using default values).
I therefore decided to document the incompatibility and add a human-readable exception for that situation. This is also tested.
Would it be worth adding a note to https://python-dwca-reader.readthedocs.io/en/latest/pandas_tutorial.html too? It was this documentation that led me to believe that this combination might be possible.
We've been using python-dwca-reader with no problems loading about 13k occurrences. We now need to scale it up to load about 3.25m occurrences. Changing the code from:
to:
causes the error:
Looking at gbif-alert, I see that you're using enumerate(dwca) rather than reading it in chunks, so I'll give that a try.
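
For anyone who still wants bounded memory use while iterating row by row, here is a minimal sketch of batching any iterable into fixed-size lists. It uses only the standard library and is not part of python-dwca-reader. The helper name in_batches, the archive path, and save_batch in the commented usage are hypothetical. The usage assumes the reader object can be iterated over directly, as the enumerate(dwca) approach suggests.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items taken from `rows`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        # islice consumes up to `size` items from the shared iterator.
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical usage (archive path and save_batch are placeholders):
#
# from dwca.read import DwCAReader
#
# with DwCAReader("occurrences.zip") as dwca:
#     for batch in in_batches(dwca, 10_000):
#         save_batch(batch)
```

Only one batch is held in memory at a time, which is the main benefit chunked reading would have given, without going through pd_read.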