Set ODS sheet size as function of number of rows in it

qwhelan commented 1 year ago

dali currently outputs a large fraction of empty rows, which rp2 then processes individually and potentially outputs a log message per empty line. Assuming n transactions per asset and k assets, the current dali output is k * n + 40 rows per asset/sheet and a total of k^2 * n + 40k rows per document. Thus, only 1/k rows are populated and rp2 runtime explodes while doing useless work.

The above actually is the best case scenario, as it is even worse when the distribution of transactions per asset is unevenly distributed.

I have included a test case but it is a bit of a no-op as I did not want to touch the golden datasets and the existing code passes without this change. Please let me know if you have any suggestions on the preferred testing approach in this case.

In order to make mypy happy, I also updated the ezodf stubs.

qwhelan commented 1 year ago

@eprbell I implemented your suggested changes and CI is passing.

I'll probably take a stab at the rp2 version of this change but have some unsubmitted dali changes I'd like to focus on first.

qwhelan commented 1 year ago

@eprbell Updated the files you flagged but kept src/stubs/ezodf/document.pyi's header the same as git is treating it as a rename rather than a new file. Let me know if that one needs fixing.

eprbell / dali-rp2

Set ODS sheet size as function of number of rows in it #147