djr7C4 / cl-lex

Common Lisp macros for generating lexical analyzers automatically
GNU General Public License v3.0

way to add user variables within lexer action blocks #2

Closed erjoalgo closed 8 years ago

erjoalgo commented 8 years ago

define-string-lexer doesn't seem to provide a way to add symbols that can be used within the action blocks. If I wanted to, e.g., keep track of a "comment-level" variable or "inside-string" indicators that are specific to the closure, is there a way to do this?

djr7C4 commented 8 years ago

Action blocks can freely refer to symbols in the current package. Thus, you can accomplish this by referring to global variables or wrapping things in a let form.

erjoalgo commented 8 years ago

I can wrap things in a let form, but I'm trying to keep track of state across action blocks (e.g. comment-start, open-string). If I refer to a global variable, it would be shared by all lexer closures (e.g. another lexer would see the previous lexer's state). Is there a cleaner way to do this? You use with-gensyms; perhaps you could provide a keyword argument that accepts a list of symbols, which is appended to the (with-gensyms (scanner string match-start match-end register-starts register-ends) symbol list?
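To make the distinction above concrete, here is a minimal sketch in plain Common Lisp. It does not use cl-lex, and every name in it (`*comment-level*`, `make-global-counter`, `make-local-counter`) is hypothetical. It only shows how a global variable is shared by every closure, while a `let` evaluated on each call gives each closure its own state.

```lisp
;; Hypothetical names throughout; this does not use cl-lex.
(defvar *comment-level* 0)   ; one global binding, shared by every closure

(defun make-global-counter ()
  "Return a closure that increments the shared global variable."
  (lambda () (incf *comment-level*)))

(defun make-local-counter ()
  "Return a closure with its own private COMMENT-LEVEL."
  (let ((comment-level 0))   ; fresh binding on every call
    (lambda () (incf comment-level))))

(defparameter *g1* (make-global-counter))
(defparameter *g2* (make-global-counter))
(defparameter *l1* (make-local-counter))
(defparameter *l2* (make-local-counter))
```

Calling `*g1*` and then `*g2*` yields 1 and then 2, because the second closure sees the first one's state. Calling `*l1*` and then `*l2*` yields 1 both times.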

erjoalgo commented 8 years ago

The issue is presented in more detail in this 2006 thread: http://compgroups.net/comp.lang.lisp/state-variables-in-cl-lexer/702118

Without state variables I see only two solutions: first one, to declare a word-by-word recognition of the comment and process it on a higher level, and second, to try to dispatch the same source between several lexers.

There is almost the same situation with the strings, where the recognition of an escaped terminator (like "\"" or "''") is necessary.

djr7C4 commented 8 years ago

It is true there is currently no easy way to create variables that are specific to each instance of the closure instantiated by each define-string-lexer function. However, this isn't something I'm going to add at this time. If you want to implement it and send me a pull request, I'll consider merging it. The way I would do this would be to add an extra keyword argument to the defun for each state variable.

A define-string-lexer definition would then look like `(define-string-lexer (&key ) )`
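
The keyword-argument idea described above can be sketched by hand, without the macro. This is only an illustration of the shape such a function might take, not cl-lex's actual expansion: `make-int-lexer` and `ints-seen` are hypothetical names, and the digit matching is done with standard sequence functions rather than cl-ppcre.

```lisp
;; Hand-written stand-in for what the macro might expand to.
;; MAKE-INT-LEXER and INTS-SEEN are hypothetical; no cl-lex code is used.
(defun make-int-lexer (string &key (start 0) (end (length string)) (ints-seen 0))
  "Return a closure that yields (:INT . n) for each run of digits in STRING."
  (lambda ()
    (let ((match-start (position-if #'digit-char-p string
                                    :start start :end end)))
      (when match-start
        (let ((match-end (or (position-if-not #'digit-char-p string
                                              :start match-start :end end)
                             end)))
          (setf start match-end)             ; advance past this token
          (cons :int (incf ints-seen)))))))  ; state private to this closure
```

Because `ints-seen` is a parameter of the outer function, each call creates a fresh binding that the returned closure captures. Two lexers built with `(make-int-lexer "1 22 333")` and `(make-int-lexer "7" :ints-seen 10)` therefore count independently, and the caller can seed the initial state.
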
erjoalgo commented 8 years ago

I tried manually adding to the defun a state variable called "ints-seen":

(defun ,name (,string &key (start 0) (end (length ,string)) (ints-seen 0))
  (declare (ignorable start end ints-seen))
  ...

Then in an action block:

("int" (return (cons :INT (setq ints-seen (+ 1 ints-seen)))))

But I still get:

The variable my-user-package::INTS-SEEN is unbound.

Any idea of why it is looking for the variable in the user package?

erjoalgo commented 8 years ago

Actually, if I refer to start, which is defined in the exact same way and is supposed to be available in the progns, I get the same error:

The variable my-package::START is unbound.

If I try to pass in a :start kw arg,

(my-lexer source-code :start 1)

I get:

unknown &KEY argument: :START

What am I doing so terribly wrong?

erjoalgo commented 8 years ago

After looking at the expanded macro, I notice that with-gensyms has not replaced the symbols inside the implicit progns with the gensyms:

MACRO: (LET* ((*ALLOW-NAMED-REGISTERS* T)
                           (G858
                            (CREATE-SCANNER
                             ({)|(})|([(])|([)])|(;)|(=)|([+]=)|(-=)|([*]=)|(/=)|(%=)|([+])|(-)|([*])|(/)|(%)|(--)|(main)|(return)|(int)|(struct)|(typedef)|(if)|(else)|(while)|(for)|(continue)|(break)|(assert)|(true)|(false)|(NULL)|(alloc)|(alloc_array)|(bool)|(void)|(char)|(string)|(0[xX][0-9a-fA-F]+)|(0|[1-9][0-9]*)|([A-Za-z_][A-Za-z0-9_]*))))
                      (DEFUN MY-LEXER
                             (G859 &KEY (G860 0) (G861 (LENGTH G859)))
                        (DECLARE (IGNORABLE G860))
                        (IF (NULL G861)
                            (SETF G861 (LENGTH G859)))
                        (LAMBDA ()
                          (LOOP
                           (MULTIPLE-VALUE-BIND (G862 G863 G864 G865)
                               (SCAN G858 G859 START G860 END G861)
                             (DECLARE (IGNORABLE G865))
                             (IF G862
                                 (PROGN
                                  (IF (EQL G862 G863)
                                      (ERROR
                                       matched the empty string at position ~d, this will cause an infinite loop
                                       G862))
                                  (SETF G860 G863)
                                  (ECASE (POSITION-IF #'IDENTITY G864)
                                    (0
                                     (LET (($@ (SUBSEQ G859 G862 G863)))
                                       (DECLARE (IGNORABLE $@))
                                       (PROGN (RETURN LBRACE))))
                                    (1
                                     (LET (($@ (SUBSEQ G859 G862 G863)))
                                       (DECLARE (IGNORABLE $@))
                                       (PROGN (RETURN RBRACE))))
                                    (2
                                     (LET (($@ (SUBSEQ G859 G862 G863)))
                                       (DECLARE (IGNORABLE $@))
                                       (PROGN (RETURN LPAREN))))
                       (...)
                       (19
                                     (LET (($@ (SUBSEQ G859 G862 G863)))
                                       (DECLARE (IGNORABLE $@))
                                       (PROGN (RETURN (CONS 'START START)))));;start should be a gensym
                                    ))))
djr7C4 commented 8 years ago

No, start is supposed to be captured, so it should not be a gensym. The issues you describe are because your user package is not set up properly. I'm closing this now as it has gotten way off-topic.
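
As a general illustration of why deliberate capture depends on package setup (the thread does not confirm that this is the exact cause of the errors above), here is a small sketch with hypothetical packages `toy-lex`, `toy-user-good`, and `toy-user-bad`. A symbol written in a macro body belongs to the package the macro was read in, so user code only refers to the same variable if that symbol is accessible in the user's package.

```lisp
;; Hypothetical packages and macro, for illustration only; not cl-lex code.
(defpackage :toy-lex
  (:use :cl)
  (:export #:with-position #:start))   ; START is exported so users can capture it

(in-package :toy-lex)

(defmacro with-position ((value) &body body)
  "Bind the symbol TOY-LEX:START (deliberately not a gensym) around BODY."
  `(let ((start ,value))
     (declare (ignorable start))
     ,@body))

(defpackage :toy-user-good
  (:use :cl :toy-lex))                       ; inherits TOY-LEX:START

(defpackage :toy-user-bad
  (:use :cl)
  (:import-from :toy-lex #:with-position))   ; START is not imported

(in-package :toy-user-good)
(defun good ()
  (with-position (5) start))                 ; START here is TOY-LEX:START

(in-package :toy-user-bad)
(defun same-symbol-p ()
  (eq 'start 'toy-lex:start))                ; START here is TOY-USER-BAD::START

(in-package :cl-user)
```

`(toy-user-good::good)` returns 5, because the body's `start` is the same symbol the macro binds. In `toy-user-bad` the two symbols differ, so a body that read `start` as a variable there would refer to `TOY-USER-BAD::START`, which the macro never binds.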

If you or anyone else wants to send a pull request, I will be happy to review it.