Closed kortschak closed 3 years ago
Hmmm, I wouldn't recommend adding recursion inside of a scanner. It's okay to jump to a recursive thing, then re-enter the scanner from the start, but returning back to a scanner is trouble. Maybe you want a scanning pass that sends tokens to a parser (ragel or otherwise)?
Yes, that makes sense. The model I was thinking of using is essentially a stack of scanner parsers, but it gets complicated fairly quickly and you need to break the ragel abstraction in ways that aren't intended to be done.
Would it make sense to put the simple, line-based parts of the grammar into the initial scanner (all the parts that aren't triples) and tokenise the triples, handing the tokens to the second pass? Given that triples are terminated by '.' and this appears nowhere else except within other tokens (and so is distinguishable), a complete triples statement set can be handed to the second pass. Or will that result in an ambiguity by matching '.' anywhere?
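To make the '.' question concrete, here is a minimal hand-written Go sketch (not ragel output) of finding statement terminators while skipping a '.' that sits inside another token. The names `splitStatements` and `isSpace` are hypothetical, and the sketch makes simplifying assumptions that the full Turtle grammar does not: only double-quoted short strings are handled, and a terminating '.' is assumed to be followed by whitespace or end of input.

```go
package main

import (
	"fmt"
	"strings"
)

// isSpace reports whether c is one of the ASCII whitespace bytes.
func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\r'
}

// splitStatements cuts src after each '.' that terminates a statement.
// A '.' inside <...>, inside "...", inside a # comment, or directly
// followed by a non-space byte (as in 1.5 or ex:a.b) is not a terminator.
// Assumption: this is a simplification, not the full Turtle grammar.
// Any text after the last terminator is dropped.
func splitStatements(src string) []string {
	var out []string
	start := 0
	inIRI, inString, inComment := false, false, false
	for i := 0; i < len(src); i++ {
		c := src[i]
		switch {
		case inComment:
			if c == '\n' {
				inComment = false
			}
		case inString:
			if c == '\\' {
				i++ // skip the escaped byte
			} else if c == '"' {
				inString = false
			}
		case inIRI:
			if c == '>' {
				inIRI = false
			}
		case c == '#':
			inComment = true
		case c == '"':
			inString = true
		case c == '<':
			inIRI = true
		case c == '.':
			if i+1 == len(src) || isSpace(src[i+1]) {
				out = append(out, strings.TrimSpace(src[start:i+1]))
				start = i + 1
			}
		}
	}
	return out
}

func main() {
	src := "<http://a.org/s> <p> \"v. 1\" .\n<s> <p> 1.5 .\n"
	for _, s := range splitStatements(src) {
		fmt.Println(s)
	}
}
```

The point of the sketch is that the terminator is only unambiguous once the scanner already knows which token it is in, so the split has to happen at the token level rather than by searching the raw bytes for '.'.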
@kortschak You could do that. It's up to you, but personally I would use a single paradigm across the entire grammar. Are you going for speed?
Thanks. Yes, I'm going for speed; the input streams tend to be long in the domain that I'm targeting. When you say a single paradigm do you mean handling all of the scanner's parts the same way?
Yeah, I would go for a uniform approach to scanning and a separate uniform approach to parsing. Only if it's absolutely necessary would I deviate from that approach. But that's just me! More than one way to solve problems in this space, which is part of why it's fun!
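A minimal Go sketch of that split, with a flat scanning pass feeding a separate recursive parser, might look like the following. Everything here is hypothetical and hand-written for illustration: the `Token`, `Node` and `parser` types and the `scan` and `parseStatement` functions are not ragel output, and `strings.Fields` stands in for a generated scanner under the assumption that tokens are whitespace-separated.

```go
package main

import (
	"fmt"
	"strings"
)

// Kind is a hypothetical token kind; a real Turtle scanner needs many more.
type Kind int

const (
	Term   Kind = iota // IRI, literal or prefixed name, left unparsed here
	LParen             // "(" opens a collection
	RParen             // ")" closes a collection
	Dot                // "." terminates a statement
)

type Token struct {
	Kind Kind
	Text string
}

// scan is the flat, non-recursive pass. It assumes whitespace-separated
// tokens so that strings.Fields can stand in for a generated scanner.
func scan(src string) []Token {
	var toks []Token
	for _, f := range strings.Fields(src) {
		switch f {
		case "(":
			toks = append(toks, Token{LParen, f})
		case ")":
			toks = append(toks, Token{RParen, f})
		case ".":
			toks = append(toks, Token{Dot, f})
		default:
			toks = append(toks, Token{Term, f})
		}
	}
	return toks
}

// Node is either a single term or a collection of nodes.
type Node struct {
	Term         string
	IsCollection bool
	Items        []Node
}

// parser holds all of the recursion; the scanner never re-enters itself.
type parser struct {
	toks []Token
	pos  int
}

// object parses one term or one (possibly nested) collection.
func (p *parser) object() (Node, error) {
	if p.pos >= len(p.toks) {
		return Node{}, fmt.Errorf("unexpected end of input")
	}
	t := p.toks[p.pos]
	p.pos++
	switch t.Kind {
	case Term:
		return Node{Term: t.Text}, nil
	case LParen:
		n := Node{IsCollection: true}
		for p.pos < len(p.toks) && p.toks[p.pos].Kind != RParen {
			item, err := p.object()
			if err != nil {
				return Node{}, err
			}
			n.Items = append(n.Items, item)
		}
		if p.pos >= len(p.toks) {
			return Node{}, fmt.Errorf("unterminated collection")
		}
		p.pos++ // consume ")"
		return n, nil
	default:
		return Node{}, fmt.Errorf("unexpected token %q", t.Text)
	}
}

// parseStatement parses nodes up to the terminating ".".
func parseStatement(src string) ([]Node, error) {
	p := &parser{toks: scan(src)}
	var nodes []Node
	for p.pos < len(p.toks) && p.toks[p.pos].Kind != Dot {
		n, err := p.object()
		if err != nil {
			return nil, err
		}
		nodes = append(nodes, n)
	}
	if p.pos >= len(p.toks) {
		return nil, fmt.Errorf("missing terminating '.'")
	}
	return nodes, nil
}

func main() {
	nodes, err := parseStatement(`<s> <p> ( <a> ( <b> <c> ) ) .`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(nodes), len(nodes[2].Items))
}
```

With this shape the scanner only ever moves forward through the input, and nesting depth lives entirely on the parser's call stack.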
Thanks for your help. I plan to attack this more fully in the coming weeks. I may ask more.
No problem, will close this issue. Feel free to open another. I have my reputation with google restored, but haven't re-enabled the mailing list yet. Need to configure the list to reject the messages that were getting bounced and then reported as spam.
I'm working on an implementation of an RDF Turtle parser using ragel targeting Go.
The RDF/Turtle grammar is recursive with subjects and objects being defined in terms of collections of objects. The approach that I have taken is to define this section of the grammar like so in ragel (complete grammar below in details):
Here the intention is to have `collection` and `blankNodePropertyList` call host language functions or make `fcall`s that use ragel machines parsing `objects` and `blankNodeProperties`.

Originally I was using a standard parser, but found that the code generation did not complete, being killed by the OOM killer. Changing over to a scanner model fixed that (and simplified the host support code). However, this precludes `fcall` as far as I can see; the scanner is defined as:

My question is whether it is safe/correct to pass `p` etc. to the `EnterCollection`- and `EnterBlankNodePropertyList`-called host language functions to allow them to parse the `objects` and `blankNodeProperties`, and then return the ragel state vars for the main machine to update with? If so, which state vars are appropriate to change and what should be done with `cs` and `act`?

(In the longer term, the intention is to have all this wrapped up into a struct to construct a streaming RDF/Turtle parser where a call to an `Unmarshal` method returns a single RDF statement.)