chipsenkbeil closed this 4 years ago
Still need to maintain at least an offset, which would involve a custom implementation of traits (ugh). The main reason not to use `nom_locate` is the upfront cost when slicing. If we calculate the line & column for an offset at the end, we can just run through the entire string one last time.
The challenge is knowing when we are at the beginning of a line, which is a parser I have and use for block elements.
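A minimal sketch of the "compute it at the end" idea, assuming the parser only carries a byte offset into the original input. The names `line_col` and `is_line_start` are hypothetical and not taken from the codebase; columns here are counted in bytes, not characters.

```rust
/// Hypothetical helper: 1-based (line, column) for a byte offset,
/// found by scanning the input once. The column is a byte column.
fn line_col(input: &str, offset: usize) -> (usize, usize) {
    // Clamp so an offset at (or past) the end of input does not panic.
    let offset = offset.min(input.len());
    let before = &input.as_bytes()[..offset];

    // Every newline before the offset starts another line.
    let line = before.iter().filter(|&&b| b == b'\n').count() + 1;

    // The current line starts right after the last newline (or at 0).
    let line_start = before
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |i| i + 1);

    (line, offset - line_start + 1)
}

/// Hypothetical helper: true when `offset` sits at the start of a line,
/// i.e. at the start of input or directly after a newline.
fn is_line_start(input: &str, offset: usize) -> bool {
    offset == 0 || input.as_bytes().get(offset - 1) == Some(&b'\n')
}
```

With an input of `"ab\ncd"`, offset 3 points at `c`, so `line_col` gives line 2, column 1 and `is_line_start` is true. The cost is a full scan per lookup, which is fine if it happens once per element at the very end.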
Nearly done. Span is implemented. At first performance tanked, but it turns out this was in large part due to calls to find the current line and column, which are expensive and whose results may even be thrown away.
While I'm not sure how common the throwaway is, I do know that it is expensive. With #57 (AST), we have control over an intermediate type that maintains the offset (similar to `LocatedElement`), and when transforming the AST into the actual elements, we calculate the line and column at that point.
The advantage here is that we could provide some sort of state-based converter that walks through the input, keeping track of line locations relative to their offsets.
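One way such a state-based converter could look, as a sketch only. `LineColTracker` is a hypothetical name, and the assumption is that elements are converted roughly in source order, so offsets mostly increase and each byte of the input is scanned about once overall.

```rust
/// Hypothetical cursor that walks forward through the input and remembers
/// where it stopped, so later lookups do not rescan from the start.
struct LineColTracker<'a> {
    bytes: &'a [u8],
    pos: usize,        // byte offset scanned so far
    line: usize,       // 1-based line number at `pos`
    line_start: usize, // byte offset where the line containing `pos` begins
}

impl<'a> LineColTracker<'a> {
    fn new(input: &'a str) -> Self {
        Self { bytes: input.as_bytes(), pos: 0, line: 1, line_start: 0 }
    }

    /// Returns the 1-based (line, byte column) for `offset`.
    /// Cheap when offsets are non-decreasing; an earlier offset
    /// resets the scan to the beginning of the input.
    fn locate(&mut self, offset: usize) -> (usize, usize) {
        let offset = offset.min(self.bytes.len());
        if offset < self.pos {
            // Moving backwards: start over rather than track history.
            self.pos = 0;
            self.line = 1;
            self.line_start = 0;
        }
        // Only scan the bytes between the previous position and the new one.
        for i in self.pos..offset {
            if self.bytes[i] == b'\n' {
                self.line += 1;
                self.line_start = i + 1;
            }
        }
        self.pos = offset;
        (self.line, offset - self.line_start + 1)
    }
}
```

The tracker would be created once per document and fed each element's offset during the AST-to-element conversion.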
Sped things up. Main strength was control over not calling line/column calculations. Since I know vim supports retrieving a byte position, I think the easiest thing to do is support a byte offset for lookup instead of line/column position. Much, MUCH cheaper.
There are a couple of main reasons for a custom input:
As it turns out, other real languages like Rust treat non-doc comments like some form of whitespace. I hadn't even considered identifying comment regions and replacing them with whitespace.
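A rough sketch of the replace-with-whitespace idea, assuming the comment regions have already been identified as byte ranges (how they are found is out of scope here). `blank_regions` is a hypothetical name. Newlines are kept and every other byte becomes a space, so the output has the same length and line structure as the input and offsets into it still line up with the original.

```rust
use std::ops::Range;

/// Hypothetical helper: overwrite the given byte ranges with spaces while
/// keeping newlines, so byte offsets and line numbers stay unchanged.
/// Ranges are assumed to start and end on UTF-8 character boundaries.
fn blank_regions(input: &str, regions: &[Range<usize>]) -> String {
    let mut bytes = input.as_bytes().to_vec();
    for region in regions {
        // Clamp the range so out-of-bounds regions do not panic.
        let end = region.end.min(bytes.len());
        let start = region.start.min(end);
        for b in &mut bytes[start..end] {
            if *b != b'\n' {
                *b = b' ';
            }
        }
    }
    // Whole characters replaced by ASCII spaces remain valid UTF-8.
    String::from_utf8(bytes).expect("regions must fall on char boundaries")
}
```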
Additionally, as long as we have the original input, we can go back and compute line and column information based on an offset. As seen with `nom_locate`, you need a little unsafe code to go back to the beginning of a fragment using its offset. There are some optimizations we could do, such as pre-computing the newline positions, but we'd need to pass in some extra information.
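The newline pre-computation could look something like the sketch below. It assumes the index is built once from the original input and then passed along as the "extra information"; `LineIndex` is a hypothetical name. Lookups become a binary search and no unsafe code is needed, since everything works from plain offsets.

```rust
/// Hypothetical index of the byte offsets at which each line starts.
struct LineIndex {
    line_starts: Vec<usize>,
}

impl LineIndex {
    /// One pass over the input, recording the offset after every newline.
    fn new(input: &str) -> Self {
        let mut line_starts = vec![0];
        line_starts.extend(
            input
                .bytes()
                .enumerate()
                .filter(|&(_, b)| b == b'\n')
                .map(|(i, _)| i + 1),
        );
        Self { line_starts }
    }

    /// 1-based (line, byte column) for `offset`, found by binary search.
    fn line_col(&self, offset: usize) -> (usize, usize) {
        // Number of line starts at or before `offset`; always >= 1
        // because the first line starts at offset 0.
        let line = self.line_starts.partition_point(|&start| start <= offset);
        let start = self.line_starts[line - 1];
        (line, offset - start + 1)
    }
}
```

Building the index costs one scan of the input; each lookup afterwards is O(log n) in the number of lines, regardless of the order in which offsets are queried.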