bradyt / cone

A ledger.
https://cone.tangential.info

Second step of parser is slow #63

Status: open. Opened by bradyt 4 years ago.

bradyt commented 4 years ago

The parser currently works in two steps. First, PetitParser splits the file into blocks and records each block's line numbers. Second, RegExp matching is applied to each block to extract transactions, directives and comments. Adding timestamps and running the parser on a larger ledger file shows that the second step takes quite a long time.
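To make the two-step design concrete, here is a minimal sketch in Python rather than Dart. It assumes that blocks are runs of non-blank lines. Step 1 stands in for the PetitParser grammar. The regular expressions and the names `Block`, `split_into_blocks`, `classify` and `parse` are hypothetical, not cone's actual code. Like the timestamps mentioned above, it times each step separately.

```python
import re
import time
from dataclasses import dataclass


@dataclass
class Block:
    start_line: int   # 1-based line number of the block's first line
    lines: list[str]  # the raw lines belonging to this block


def split_into_blocks(text: str) -> list[Block]:
    """Step 1 (hypothetical stand-in for the PetitParser step): group runs of
    non-blank lines into blocks, recording where each block starts."""
    blocks: list[Block] = []
    current: list[str] = []
    start = 0
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            if not current:
                start = number
            current.append(line)
        elif current:
            blocks.append(Block(start, current))
            current = []
    if current:
        blocks.append(Block(start, current))
    return blocks


# Step 2: hypothetical patterns, not the regular expressions cone actually uses.
TRANSACTION = re.compile(r"^\d{4}[-/]\d{2}[-/]\d{2}\s")
COMMENT = re.compile(r"^[;#%|*]")
DIRECTIVE = re.compile(r"^(account|commodity|include|payee|alias)\b")


def classify(block: Block) -> tuple[str, int]:
    """Label a block by matching its first line against each pattern in turn."""
    head = block.lines[0]
    if TRANSACTION.match(head):
        kind = "transaction"
    elif COMMENT.match(head):
        kind = "comment"
    elif DIRECTIVE.match(head):
        kind = "directive"
    else:
        kind = "unknown"
    return kind, block.start_line


def parse(text: str) -> list[tuple[str, int]]:
    """Run both steps, printing how long each one took."""
    t0 = time.perf_counter()
    blocks = split_into_blocks(text)
    t1 = time.perf_counter()
    items = [classify(block) for block in blocks]
    t2 = time.perf_counter()
    print(f"step 1: {t1 - t0:.6f}s  step 2: {t2 - t1:.6f}s")
    return items


sample = """; opening comment

account Assets:Checking

2020-01-01 Opening balance
    Assets:Checking  100 USD
    Equity:Opening
"""

print(parse(sample))  # [('comment', 1), ('directive', 3), ('transaction', 5)]
```

Timing each step independently is one way to confirm where the time goes before deciding whether a second PetitParser grammar would beat the regular expressions.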

There are two reasons we might hold off on fixing this for now.

  1. Parsing in two steps gives us some flexibility in how we parse transactions and directives. We might want to experiment with supporting different ledger formats. (I wonder whether using PetitParser for the second step would speed things up. We could find out by experimenting, with more print statements and some prototype parsing.)

  2. I have found PetitParser a little tricky to get right, so we might consider the first step of the parser, in its current state, somewhat stable. Splitting into blocks is a simple idea. Stable behavior in how line numbers are derived could make it easier to start work on the delete/edit feature requested in https://github.com/bradyt/cone/issues/25.

The slowness for large files falls under the general issue raised at https://github.com/bradyt/cone/issues/22, which I filed mainly as a reminder to watch for such problems as they arise.

Version: commit with hash d91cf74

bradyt commented 4 years ago

A draft that moves the slow parser step to an isolate, so that the app doesn't hang, is at https://github.com/bradyt/cone/commit/609db2a755e182c3850d3cf9e445d5178ec6954c. I still need to fix it so that it passes the CI checks (linting, etc.).
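The idea behind the isolate is to move the slow step off the main thread so the caller stays free. In Python, a rough analogue is `concurrent.futures.ProcessPoolExecutor`, whose worker processes, like Dart isolates, do not share memory with the caller. This is only an analogy, not the code in the linked commit. `count_transactions` is a hypothetical stand-in for the slow second step, and `parse_in_background` is a hypothetical helper.

```python
import re
from concurrent.futures import Executor, Future, ProcessPoolExecutor

# Hypothetical transaction-header pattern: a date at the start of a line.
TRANSACTION_HEADER = re.compile(r"^\d{4}[-/]\d{2}[-/]\d{2}\s")


def count_transactions(text: str) -> int:
    """Hypothetical stand-in for the slow second parser step."""
    return sum(1 for line in text.splitlines() if TRANSACTION_HEADER.match(line))


def parse_in_background(executor: Executor, text: str) -> "Future[int]":
    """Submit the slow step to a worker and return a Future immediately.

    With a ProcessPoolExecutor the work runs in a separate process with its
    own memory, loosely like a Dart isolate, so the caller is not blocked.
    """
    return executor.submit(count_transactions, text)


if __name__ == "__main__":
    ledger = "2020-01-01 Coffee\n    Expenses:Food  3 USD\n    Assets:Cash\n\n" * 1000
    with ProcessPoolExecutor(max_workers=1) as executor:
        future = parse_in_background(executor, ledger)
        # A UI would keep handling events here; this demo simply waits.
        print(future.result())
```

Because the worker shares no memory with the caller, Python pickles the ledger text to send it across and pickles the result to send it back. For very large files, that copying is itself a cost worth measuring.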