Shopify / torch-grammar


different parsers #2

Open gminorcoles opened 1 year ago

gminorcoles commented 1 year ago

This is cool, thanks for starting down this path with PyTorch. I came here from the HN discussion. I have been focusing on Llama 2, since for me it seems to be a bit of a tipping point, and it's time to dig into local LLMs.

So let's say I want to limit Llama's output to valid Python code. The Python grammar no longer seems to be EBNF but rather PEG. For my purposes, I would actually be very happy if the LLM output were constrained to a valid Python AST. Either approach requires a different parser, and thus separate code to handle each case.
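A rough way to approximate a "valid Python so far" check without writing a PEG parser is to reuse the standard library's `codeop.compile_command`, which the `code` module's interactive console uses to decide whether input is complete, still incomplete, or already a syntax error. This is a heuristic sketch, not token-level grammar constraint: the function name `python_prefix_status` is hypothetical, it checks one top-level statement at a time (like REPL input), and prefixes that end mid-token (half a keyword or string literal) may be misclassified.

```python
import codeop


def python_prefix_status(source: str) -> str:
    """Classify generated text as 'complete', 'incomplete', or 'invalid'.

    Hypothetical helper; heuristic only. It relies on codeop's REPL-style
    check, which returns None when more input could still make the
    statement valid.
    """
    try:
        code = codeop.compile_command(source, "<llm>", "single")
    except (SyntaxError, ValueError, OverflowError):
        # No continuation can turn this text into valid Python.
        return "invalid"
    if code is None:
        # Valid so far, but more input is needed (e.g. "if x:").
        return "incomplete"
    return "complete"
```

For example, `"x = 1"` is classified as complete, `"if x:"` as incomplete, and `"x = )"` as invalid. A decoder could reject any candidate token whose addition makes the status `"invalid"`.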

Have you thought about how to handle this potential proliferation of parser formats? I am probably going to try copying your approach and extending it to handle Python ASTs.
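One way to contain a proliferation of parser formats is to make the decoding loop depend only on a small interface, and let each grammar format (EBNF, PEG, an AST-based check) supply its own implementation. The sketch below assumes such a design; `PrefixAcceptor`, `BalancedParens`, `allowed_token_mask`, and `mask_logits` are hypothetical names, not torch-grammar's API, and it uses plain lists instead of a torch tensor to stay self-contained.

```python
from typing import List, Protocol, Sequence


class PrefixAcceptor(Protocol):
    """Hypothetical interface: a parser backend only has to say whether
    the text can still be extended into a valid sentence of its language."""

    def accepts_prefix(self, text: str) -> bool: ...


class BalancedParens:
    """Toy backend for the grammar S -> '(' S ')' S | empty."""

    def accepts_prefix(self, text: str) -> bool:
        depth = 0
        for ch in text:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    return False  # closed more than opened: unrecoverable
            else:
                return False  # character outside the grammar's alphabet
        return True  # any non-negative depth can still be closed


def allowed_token_mask(
    acceptor: PrefixAcceptor, generated: str, vocab: Sequence[str]
) -> List[bool]:
    """One flag per vocabulary entry: True if appending that token keeps
    the output a viable prefix, whatever backend implements the check."""
    return [acceptor.accepts_prefix(generated + tok) for tok in vocab]


def mask_logits(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Set disallowed logits to -inf so softmax gives them zero probability."""
    return [x if ok else float("-inf") for x, ok in zip(logits, mask)]
```

With `vocab = ["(", ")", "()", "x"]` and nothing generated yet, only `"("` and `"()"` are allowed. Re-checking every vocabulary entry against the whole string costs O(vocabulary × length) per step, so a practical backend would keep incremental parser state instead; the point here is only that adding a new format means adding a new `accepts_prefix` implementation, not a new decoding loop.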

burke commented 1 year ago

Hey, sorry, I'm chronically bad at GitHub notifications.

I think that's a really cool idea. My grasp of the theoretical CS side of parsing is fuzzy at best, so I can't think through what an implementation for PEG (as opposed to EBNF/CFG) would look like here, but I would be thrilled to see that work.

The actual EBNF parsing code here was ported directly from the equivalent feature in llama.cpp, with essentially no changes. It would be excellent if we could support a richer format, or a handful of them (this dialect of EBNF is quite rudimentary). I don't currently have any plans to work on this myself.