Open gminorcoles opened 1 year ago
Hey, sorry, I'm chronically bad at GitHub notifications.
I think that's a really cool idea. My knowledge of the theoretical CS side (parsing theory etc.) is fuzzy at best, so I can't think through what an implementation would look like for PEG as opposed to EBNF/CFG, but I would be thrilled to see that work.
The actual EBNF parsing code here was ported directly from the equivalent feature in llama.cpp with essentially no changes. I think it would be really excellent if we could support a richer format, or a handful of them (this dialect of EBNF is quite rudimentary). I don't currently have any plans to work on this myself.
This is cool, thanks for starting down this path with PyTorch. I came here from the HN discussion. I have been focusing on Llama 2, since it seems to me to be a bit of a tipping point, and it's time to dig into local LLMs.
So let's say I want to limit the output of Llama to valid Python code. Python's grammar no longer seems to be specified in EBNF but rather as a PEG. I was thinking that for my purposes I would actually be very happy if the LLM output were constrained to a valid Python AST. Each of these options requires a different parser, and thus separate code to handle each case.
Have you thought about how to handle this potential proliferation of parser formats? I am probably going to try to copy your approach and extend it to handle Python ASTs.
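To make the "valid Python AST" target a bit more concrete, here is a minimal sketch of what the acceptance check could look like using the standard-library `ast` module. Note that this is only post-hoc validation of finished text, not token-level constrained sampling as in this repo; the helper names `is_valid_python` and `first_valid` are hypothetical and exist only for illustration.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as a Python module (hypothetical helper)."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # SyntaxError covers malformed code; ValueError covers inputs such as
        # strings containing null bytes on some Python versions.
        return False
    return True


def first_valid(candidates):
    """Return the first candidate string that parses, or None if none do."""
    for text in candidates:
        if is_valid_python(text):
            return text
    return None
```

Something like `first_valid(samples)` could be used to filter several sampled completions, but a real grammar-constrained approach would need to reject invalid tokens during decoding rather than after the fact.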