chrisosaurus / dodo

scriptable in-place file editor
MIT License

consider regex matching #7

Open chrisosaurus opened 9 years ago

chrisosaurus commented 9 years ago

Dodo borrowed (read: stole) the e// notation from the ed school of syntax. Most other tools that use this syntax support regex matching, so this was always a potential feature for dodo.

PCRE is probably overkill, but POSIX regex should be sufficient.

Regex search is probably too expensive, as in the failure case it would have to scan the whole file, but an anchored regex match (match from the current position or die, similar to `expect`) could be quite useful.

A potential issue here, of course, is that any multi-character match such as m/a.*z/ could be dangerous, especially if it is greedy by default (and POSIX regex, which picks the leftmost-longest match, effectively is).

We could make .* NOT match newline characters, but this only helps when files are newline-delimited, and I think we need to consider operating on large files that lack newlines (although we could just push this issue onto the end user).

phillid commented 9 years ago

If we use

m/Jack and.*Jill/
w/bar/

on the text "Jack and his close childhood friend, Jill", what is the expected output?

Since this is an in-place editor, I'd personally expect "bark and his close childhood friend, Jill" -- is this thinking correct?

Apart from that, matches like m/J... and J... had [0-9] pails/ would be easy/safe enough to implement.

chrisosaurus commented 9 years ago

@phillid I think your interpretation is correct regarding "bark and his close childhood friend, Jill", thanks for the great concrete examples.

For me the important part is to never have a regex search (as I don't want to be performing a text search across a 14G file), only ever a 'match' which is anchored to the current location.


To spitball / scope creep a little:

There is another interesting case: if we have the sentence "Jack and his close childhood friend, Jill" and want to replace Jill ONLY when it is preceded by text matching a pattern, then

m/Jack and.*Jill/

is insufficient, as it doesn't move the cursor, and there is currently no way to know the byte offset from the cursor to the start of Jill (we lack a search, on purpose).

It might be worth later considering a syntax for specifying where within the regex the cursor should be placed after the match, but this would require careful thought about the notation used (so we could strip it out before regex matching).


But that is for later; for now the focus should be on getting a concrete implementation of the basic m/pattern/ system.

For now it should be sufficient to mock up a wrapper around the POSIX functions regcomp and regexec; later on we could consider migrating to re2 (https://github.com/google/re2/).

Thanks @phillid for the great work. I should have some time this weekend to start on this, but you are also welcome to dive in first.

chrisosaurus commented 9 years ago

'This weekend', he said 26 days ago. Sorry, I have been delayed by other things and won't be able to get around to this for a little while yet.