Using GBIF Downloads, it has been noticed that looping over the archive is incredibly slow when there's a large verbatim.txt data file in addition to the main file. This remains true even if we truncate the main occurrence.txt file to 10 records or so.
The reason is easy to identify: there's a design problem in CoreRow's constructor: an _EmbeddedCSV instance is created for each CoreRow. Creating an _EmbeddedCSV is pretty expensive (mainly because of the _line_offsets attribute), so it should only be done once per archive.
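To make the idea concrete, here is a minimal sketch of the per-archive approach. Only the names CoreRow, _EmbeddedCSV and _line_offsets come from this report; the Archive class, the constructor signatures and the _index_lines helper are hypothetical simplifications, not the library's actual API. The point is simply that the expensive line-offset indexing happens once in the archive and the resulting object is shared by every row:

```python
class _EmbeddedCSV:
    """Wraps an embedded data file such as verbatim.txt (simplified)."""

    def __init__(self, path):
        self.path = path
        # Building _line_offsets scans the whole file; this is the
        # expensive step that should happen only once per archive.
        self._line_offsets = self._index_lines()

    def _index_lines(self):
        offsets, pos = [], 0
        with open(self.path, "rb") as f:
            for line in f:
                offsets.append(pos)
                pos += len(line)
        return offsets


class CoreRow:
    # Before the fix, each CoreRow built its own _EmbeddedCSV,
    # re-indexing the extension file for every row of occurrence.txt.
    # Here the archive passes in the already-indexed instance instead.
    def __init__(self, raw_line, extension_csv):
        self.raw_line = raw_line
        self.extension_csv = extension_csv  # shared, indexed once


class Archive:
    """Hypothetical stand-in for the archive object."""

    def __init__(self, core_path, extension_path):
        self.core_path = core_path
        # Created once per archive, not once per CoreRow.
        self._extension_csv = _EmbeddedCSV(extension_path)

    def __iter__(self):
        with open(self.core_path) as core:
            for line in core:
                yield CoreRow(line, self._extension_csv)
```

With this structure, truncating occurrence.txt to 10 records gives a loop that does 10 cheap CoreRow constructions plus one indexing pass over verbatim.txt, instead of 10 full indexing passes.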