eprbell / rp2

Privacy-focused, free, open-source cryptocurrency tax calculator for multiple countries: it handles multiple coins/exchanges and computes long/short-term capital gains, cost bases, in/out lot relationships/fractioning, and account balances. It supports FIFO, LIFO, HIFO and it outputs in form 8949 format. It has a programmable plugin architecture
https://pypi.org/project/rp2/
Apache License 2.0
256 stars 42 forks source link

Incorrect lot exhaustion in HIFO #73

Closed debben closed 1 year ago

debben commented 1 year ago

Hello. I think I may have found an issue in the hifo plugin where lower cost-basis lots seem to be selected when there exist higher, partially depleted lots.

Testing Methodology

I've been spending the last week or so getting familiar with rp2+dali-rp2. I've been working to take the transaction history csv report from Coinbase as an input to RP2. My input data consists mostly of buys and some swaps on coinbase for the tax year 2021. For 2021 I also have the PDF copy of my Gain-Loss report from Coinbase which defaults to HIFO. My goal was to get to a state where the output of RP2 matched this PDF. I'm not necessarily taking the Gain-Loss report from Coinbase as perfect, but just as a baseline output of what HIFO should look like from a non-RP2 HIFO implementation.

When I compare the PDF to the report against the RP2 report, the proceeds, cost basis, and lots selected for each taxable event match up exactly for the first few months of transactions, but then diverge. It looks as though there is a lot from earlier in the year which is 80% consumed by the first few months of transactions and stops being consumed in favor of higher cost-basis transactions acquired later. The remainder isn't picked up even when that remainder is the highest-cost lot available for an event.

I think the issue has to do with the method _seek_first_non_exhausted_acquired_lot and how this traverses backwards to find an unsued lot. The only validation seems to be that the lot has value, but not necessarily that the lot is the one aquired for the highest cost. I've made a branch here with a few line change I made to get the RP2 report to match my Coinbase Gain-Loss PDF https://github.com/eprbell/rp2/compare/main...debben:rp2:hifo-lot-fix

If this makes sense I can submit as a PR. I wanted to first open an issue though to get confirmation that what I'm seeing is an issue and not just me using RP2 incorrectly.

eprbell commented 1 year ago

Thanks for digging deep into the code, analyzing it and proposing fixes!

I haven't looked carefully at your code yet, but from a first glance, you may be on to something: I will be studying your diff in detail in the next few days and follow up here.

Here are a few initial observations. I ran a simple experiment (which hints at the fact that you may have found a real bug in HIFO): I applied your patch to my local tree and ran pytest

With your patch, the following tests fail (the others succeed): tests/test_large_input.py::TestLargeInput::test_large_input_rp2_full_report tests/test_large_input.py::TestLargeInput::test_large_input_tax_report_us tests/test_ods_output_diff.py::TestODSOutputDiff::test_test_data2_rp2_full_report tests/test_ods_output_diff.py::TestODSOutputDiff::test_test_data2_tax_report_us

These tests simply run RP2 on a given test input file with a given accounting method and then compare the output against a "golden" version of the output (the accounting methods are iterated so that all files are tested with all accounting methods).

Note that:

Back when I was working on accounting methods, I verified them by picking a few meaningful test input files, computing results manually using the given accounting method, and diffing manual results against generated results (I couldn't do this for all input files and all methods because it's too much manual computation). For HIFO I used the test_hifo* files (which in fact work also with your patch), but I didn't compute test_data2 manually with HIFO (I did with LIFO and FIFO).

So seeing that with your patch the test_data2 test failed with HIFO raised a red flag. From an initial quick look, I think there may be a real bug in HIFO: in sheet B4 Tax, line 37 the original code seems to be picking the acquired lot priced at $12000 instead of the one priced at $13000 (your patch picks the $13000 one).

Anyway this is just some initial thinking: I will do a full manual computation on test_data2 with HIFO and I'll be able to tell exactly if and where there is a bug.

In any case, thanks again for pointing out a potential problem and working on a potential solution. Regardless of the results this is indeed pointing to something worth exploring more in the HIFO logic.

eprbell commented 1 year ago

I verified it's indeed a HIFO bug, as you suggested: as mentioned above the test_data2 (HIFO variant) generates different results with your patch vs without it and I manually verified that the result generated using your patch is correct. The problem is indeed visible in sheet B4 Tax, line 37: the original code seems to be picking the acquired lot priced at $12000 instead of the one priced at $13000, even though the $13000 one is not exhausted.

Can you open a PR? We'll do a code review and then merge it.

Again, thanks for reporting a bug and proposing a fix!