beancount / beangulp

Importers framework for Beancount
GNU General Public License v2.0
59 stars 23 forks source link

Deduplicator doesn't support entries with price and inferred amount #98

Open henrikssn opened 2 years ago

henrikssn commented 2 years ago

I have this in my ledger:

2016-09-25 * "Hooli Vest Event"
  Assets:US:Schwab:HOOL               1 HOOL {786.9 USD}
  Income:Salary:Hooli:GSU     CHF @ 0.968892 USD
  Assets:US:Hooli:Unvested:C123456  -1 HOOL.UNVEST
  Expenses:Hooli:Vested              1 HOOL.UNVEST

Exception:

  Traceback (most recent call last):
    File "/home/erik/.local/lib/python3.7/site-packages/beangulp/__init__.py", line 86, in _extract
      entries = extract.extract_from_file(importer, filename, existing_entries)
    File "/home/erik/.local/lib/python3.7/site-packages/beangulp/extract.py", line 44, in extract_from_file
      entries = importer.deduplicate(entries, existing_entries)
    File "/home/erik/.local/lib/python3.7/site-packages/beangulp/importer.py", line 167, in deduplicate
      return extract.mark_duplicate_entries(entries, existing, window, self.cmp)
    File "/home/erik/.local/lib/python3.7/site-packages/beangulp/extract.py", line 125, in mark_duplicate_entries
      if compare(entry, target):
    File "/home/erik/.local/lib/python3.7/site-packages/beangulp/importer.py", line 145, in cmp
      return compare(a, b)
    File "/home/erik/.local/lib/python3.7/site-packages/beangulp/similar.py", line 101, in __call__
      amounts1 = self.cache[id(entry1)] = amounts_map(entry1)
    File "/home/erik/.local/lib/python3.7/site-packages/beangulp/similar.py", line 151, in amounts_map
      amounts[key] += posting.units.number
  TypeError: unsupported operand type(s) for +=: 'decimal.Decimal' and 'type'
dnicolodi commented 2 years ago

Interesting, I didn't know that this is valid Beancount syntax. To clarify, the ledger snippet above is emitted by your importer or is it in the existing ledger?

henrikssn commented 2 years ago

The snippet above is in my existing ledger (both v2 and v3 accepts it), and it is actually quite tricky to fill in the blank as there is a currency conversion involved.

dnicolodi commented 2 years ago

Then I don't understand why the deduplication code sees the MISSING value: the booking code should take care of filling in the missing value. A quick test indeed confirms that using the snippet above as existing ledger in beangulp extract downloads/ -e existing.beans works just fine.

Do you invoke the deduplication code in a custom way that uses parse_file() instead than load_file() to parse the existing ledger? From the traceback you reporter, it would not seem so. I'm confused. Can you post a more exact reproducer?

blais commented 2 years ago

What's the content of your accounts' inventories just before the transactions? (use bean-doctor context)

henrikssn commented 2 years ago

Here is a minimal ledger which reproduces the issue:

2016-09-25 open Assets:US:Schwab:HOOL HOOL
2016-09-25 open Income:Salary:Hooli:GSU CHF
2016-09-25 open Assets:US:Hooli:Unvested:C123456 HOOL.UNVEST
2016-09-25 open Expenses:Hooli:Vested HOOL.UNVEST

2016-09-25 * "Hooli Grant C123456"
  Income:US:Hooli:Awards                -100 HOOL.UNVEST
  Assets:US:Hooli:Unvested:C123456      100 HOOL.UNVEST

reproduce_issue98.py

import dateutil.parser

from beancount.core.amount import Amount
from beancount.core.data import EMPTY_SET, new_metadata, Posting, Transaction
from beancount.core.number import Decimal, MISSING
from beancount.core.position import Cost

import beangulp
from beangulp import mimetypes
from beangulp.cache import cache
from beangulp.testing import main

class Importer(beangulp.Importer):
  """Reproduce beancount/beangulp/issues/98."""

  def __init__(self):
    pass

  def identify(self, filepath):
    if not "reproduce_issue98.csv" in filepath.lower():
        return False
    return True

  def filename(self, filepath):
    return 'reproduce_issue98.csv'

  def account(self, filepath):
    return None

  def date(self):
    return dateutil.parser.parse("2016-09-25").date()

  def extract(self, filepath, existing):
    date = dateutil.parser.parse("2016-09-25").date()
    return [
      Transaction(
      meta=new_metadata(filepath, 0),
      date=date,
      flag="*",
      payee="",
      narration=f"Hooli Vest Event",
      tags=EMPTY_SET,
      links=EMPTY_SET,
      postings=[
        Posting("Assets:US:Hooli:GOOG", Amount(Decimal("1"), "GOOG"), Cost(Decimal("786.9"), "USD", date, None), None, None, None),
        Posting("Expenses:Hooli:Vested", Amount(Decimal("1"), "GOOG.UNVEST"), None, None, None, None),
        Posting("Assets:US:Hooli:Unvested:C123456", -Amount(Decimal("1"), "GOOG.UNVEST"), None, None, None, None),
        Posting(f"Income:Salary:TY2016:Google:GSU", Amount(MISSING, "CHF"), None, Amount(Decimal("0.968892"), "USD"), None, None),
        ])
    ]

if __name__ == '__main__':
    importer = Importer()
    main(importer)

Commands to reproduce:

touch testdata/reproduce_issue98.csv
python3 importers/reproduce_issue98.py extract -e test.beancount testdata

bean-doctor output:

$ bean-doctor context test.beancount 6
** Transaction Id --------------------------------

Hash:38ae45045f6a38c872e50ee91eb452e7
Location: /config/beancount/test.beancount:6

** Balances before transaction --------------------------------

  Income:US:Hooli:Awards                                                             

  Assets:US:Hooli:Unvested:C123456                                                   

** Unbooked Transaction --------------------------------

2016-09-25 * "Hooli Grant C123456"
  Income:US:Hooli:Awards            -100 HOOL.UNVEST
  Assets:US:Hooli:Unvested:C123456   100 HOOL.UNVEST

** Transaction --------------------------------

2016-09-25 * "Hooli Grant C123456"
  Income:US:Hooli:Awards            -100 HOOL.UNVEST
  Assets:US:Hooli:Unvested:C123456   100 HOOL.UNVEST

** Residual and Tolerances --------------------------------

** Balances after transaction --------------------------------

* Income:US:Hooli:Awards                                             -100 HOOL.UNVEST

* Assets:US:Hooli:Unvested:C123456                                    100 HOOL.UNVEST
henrikssn commented 2 years ago

Interesting, I didn't know that this is valid Beancount syntax. To clarify, the ledger snippet above is emitted by your importer or is it in the existing ledger?

To clarify, the snippet did exist in both the ledger and the import output, but it seems to be the latter which causes the issue (see above).

dnicolodi commented 2 years ago

Thank you for the reproducer. The problematic transaction containing the MISSING amount is indeed the one produced by the importer. Now things make sense. I'll thing about what's the best strategy to handle those in the deduplication code.