eprbell / dali-rp2

DaLI (Data Loader Interface) is a data loader and input generator for RP2 (https://pypi.org/project/rp2), the privacy-focused, free, open-source cryptocurrency tax calculator: DaLI removes the need to manually prepare RP2 input files. Just like RP2, DaLI is also free, open-source and it prioritizes user privacy.
https://pypi.org/project/dali-rp2/
Apache License 2.0
63 stars 42 forks source link

Set ODS sheet size as function of number of rows in it #147

Closed qwhelan closed 1 year ago

qwhelan commented 1 year ago

dali currently outputs a large fraction of empty rows, which rp2 then processes individually and potentially outputs a log message per empty line. Assuming n transactions per asset and k assets, the current dali output is k * n + 40 rows per asset/sheet and a total of k^2 * n + 40k rows per document. Thus, only 1/k rows are populated and rp2 runtime explodes while doing useless work.

The above actually is the best case scenario, as it is even worse when the distribution of transactions per asset is unevenly distributed.

I have included a test case but it is a bit of a no-op as I did not want to touch the golden datasets and the existing code passes without this change. Please let me know if you have any suggestions on the preferred testing approach in this case.

In order to make mypy happy, I also updated the ezodf stubs.

qwhelan commented 1 year ago

@eprbell I implemented your suggested changes and CI is passing.

I'll probably take a stab at the rp2 version of this change but have some unsubmitted dali changes I'd like to focus on first.

qwhelan commented 1 year ago

@eprbell Updated the files you flagged but kept src/stubs/ezodf/document.pyi's header the same as git is treating it as a rename rather than a new file. Let me know if that one needs fixing.