FObersteiner opened 2 years ago
@FObersteiner, I agree that we should look at this. Do you have publicly downloadable large example files that we could use in unit/integration testing?
@agstephens yep, I was about to create some public sample data from our ozone instruments anyway ;-) You can find them here: https://git.scc.kit.edu/FObersteiner/pyFairoproc/-/tree/master/samples.
The one that's problematic in this context (slow data reading in nappy) is the cl_photometer file (~86k lines of data, just one variable).
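For unit/integration tests that shouldn't depend on an external download, a data block of similar size could also be generated locally. A minimal sketch, assuming only the data section matters (no NASA Ames header is written); the helper names and the value range are made up for illustration:

```python
import random


def make_data_lines(n_lines, seed=0):
    """Return n_lines of "X V" records, like the data section of an FFI 1001 file.

    Hypothetical test helper: one independent variable (X, increasing) and
    one dependent variable (V, random values in a made-up range).
    No NASA Ames header is produced.
    """
    rng = random.Random(seed)  # seeded, so the same lines come out every run
    lines = []
    for i in range(n_lines):
        x = float(i)                 # e.g. seconds since start
        v = rng.uniform(20.0, 80.0)  # made-up value range
        lines.append(f"{x:.1f} {v:.3f}")
    return lines


def write_data_section(path, n_lines, seed=0):
    """Write the generated records to `path`, one record per line."""
    with open(path, "w", encoding="ascii") as fh:
        fh.write("\n".join(make_data_lines(n_lines, seed)) + "\n")


if __name__ == "__main__":
    lines = make_data_lines(86_600)  # roughly the size of the cl_photometer sample
    print(len(lines), lines[0])
```

Seeding the generator keeps test inputs reproducible across runs.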
Description
Loading data from small files completes in a reasonable amount of time. With many lines of data (10k+), however, reading becomes a bottleneck.
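As a rough baseline for comparison, the data section of an FFI 1001 file can be parsed in a single pass by reading the remaining text at once and splitting on whitespace; that would also cope with records wrapping onto several lines, if a file uses them. This is a hypothetical sketch, not nappy's implementation: the function name is made up, it assumes the stream is already positioned after the header, and it ignores scale factors and missing values.

```python
import io


def read_ffi1001_data_bulk(stream, n_vars):
    """Parse an FFI 1001 data section in one pass (hypothetical baseline).

    Assumes `stream` is positioned just after the header and that each record
    is one X value followed by `n_vars` values, separated by any whitespace.
    Scale factors (VSCAL) and missing values (VMISS) are not handled.
    """
    # Newlines and spaces are treated alike, so wrapped records need no special case.
    tokens = stream.read().split()
    width = n_vars + 1  # X plus one value per variable
    if len(tokens) % width:
        raise ValueError(f"{len(tokens)} values is not a multiple of {width}")
    values = [float(t) for t in tokens]              # single conversion pass
    x = values[0::width]                             # independent variable
    v = [values[i::width] for i in range(1, width)]  # one list per variable
    return x, v


if __name__ == "__main__":
    sample = io.StringIO("0 1.5\n1 2.5\n2 3.5\n")
    x, (v1,) = read_ffi1001_data_bulk(sample, n_vars=1)
    print(x, v1)  # [0.0, 1.0, 2.0] [1.5, 2.5, 3.5]
```

If a baseline like this reads the 86k-line sample in a small fraction of the time, the overhead is more likely in the per-line reading machinery than in the number conversion itself.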
What I Did
Read 4.3k lines of data, FFI 1001:
Read 86.6k lines of data, FFI 1001:
That's nearly a minute per file! If I wanted to load many such files, I'd have to drink a lot of coffee in the meantime ☕👾
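To tell whether read time grows linearly with file size (about 20× longer for 86.6k lines than for 4.3k) or faster, one could time both files and estimate the scaling exponent from the two measurements. A minimal sketch; the helper names are hypothetical, and the nappy call in the comment assumes the usual openNAFile/readData usage:

```python
import math
import time


def best_time(func, *args, repeat=3):
    """Fastest wall-clock duration (seconds) over `repeat` calls of func(*args)."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)  # the minimum is least affected by background noise


def scaling_exponent(n_small, t_small, n_large, t_large):
    """Estimate k in t ~ n**k from two (size, time) measurements.

    k close to 1 means linear scaling, k close to 2 means quadratic.
    """
    return math.log(t_large / t_small) / math.log(n_large / n_small)


# Hypothetical usage with nappy (paths are placeholders):
#   import nappy
#   read = lambda path: nappy.openNAFile(path).readData()
#   t_small = best_time(read, "small_file.na")   # ~4.3k data lines
#   t_large = best_time(read, "large_file.na")   # ~86.6k data lines
#   print(scaling_exponent(4_300, t_small, 86_600, t_large))
```

An exponent clearly above 1 would suggest that something in the reading loop does more work per line as the file grows.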
Tracing the execution of the call to readData, I find readItemsFromUnknownLines (ok)
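To see which functions below readData account for most of the time, the call can be run under Python's built-in profiler. A minimal sketch using cProfile/pstats from the standard library; the helper name is hypothetical and the nappy usage in the comment is an assumption:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15):
    """Run func(*args) under cProfile; return (result, text report).

    The report is sorted by cumulative time, so helpers called beneath the
    profiled function (e.g. readItemsFromUnknownLines under readData)
    show the total time spent inside them, including their own callees.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)  # keep only the `top` rows
    return result, buf.getvalue()


# Hypothetical usage with nappy (path is a placeholder):
#   import nappy
#   f = nappy.openNAFile("large_file.na")
#   _, report = profile_call(f.readData)
#   print(report)
```

From the shell, python -m cProfile -s cumtime script.py gives a similar report without changing any code.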