adrian-thurston / colm

The Colm Programming Language
MIT License

[colm] regex subexpression capture #43

Open adrian-thurston opened 5 years ago

adrian-thurston commented 5 years ago

This may be finished. Need to investigate.

adrian-thurston commented 5 years ago

I experimented with a solution once, but it is certainly not complete in the latest version.

adrian-thurston commented 4 years ago

It would be great to have this when parsing Go. There we use a regex to decide when to insert semicolons. It looks like:

    token insert_semi /
        ( 
            ( id - 'if' - 'then' - 'else' - 'end' )
        )
        [ \t]*
        ( line_comment | '\n' )
    /
    {
        parse BA: break_apart[$match_text]

        Prefix: str = input->pull( BA.pre_semi.data.length )
        input->push( ";" )
        input->push( Prefix )
    }

Note that we need to parse the match text a second time just to learn where to insert the semicolon. If we had subexpression capture, we could use the length of the captured submatch to decide how much to pull off before pushing the semicolon.
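
To make the idea concrete outside Colm, here is a minimal Python sketch of using a capture's length to find the insertion point. It is only an illustration under assumptions: the pattern `INSERT_SEMI`, the `KEYWORDS` set, and the function `insert_semi` are hypothetical stand-ins, the identifier and `//` comment shapes are guesses at what `id` and `line_comment` match, and the keyword check emulates the `id - 'if' - ...` subtraction. It does not show any existing or planned Colm syntax.

```python
import re

# Hypothetical stand-in for the Colm token: a captured identifier, optional
# blanks, then a line comment or a newline. Group 1 plays the role of the
# subexpression whose length we want.
KEYWORDS = {"if", "then", "else", "end"}
INSERT_SEMI = re.compile(r"\b([A-Za-z_][A-Za-z_0-9]*)[ \t]*(//[^\n]*\n|\n)")


def insert_semi(text: str) -> str:
    """Insert ';' right after the captured identifier of each match."""
    out = []
    pos = 0
    for m in INSERT_SEMI.finditer(text):
        if m.group(1) in KEYWORDS:
            continue  # emulates subtracting the keywords from id
        # Length of the captured prefix: how much to "pull" before the ';'.
        pre_len = m.end(1) - m.start()
        cut = m.start() + pre_len
        out.append(text[pos:cut])  # everything up to the end of the capture
        out.append(";")            # the inserted semicolon
        pos = cut                  # the rest of the match is rescanned as-is
    out.append(text[pos:])
    return "".join(out)
```

The point of the sketch is that `pre_len` comes straight from the capture, so no second parse of the matched text is needed to locate the split.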