Closed mewmew closed 7 years ago
A clean definition could look something along the lines of:
DILocation
: "!DILocation" "(" OptLine "," OptColumn "," Scope "," OptInlinedAt ")"
;
OptLine
: empty
| "line:" int_lit
;
OptColumn
: empty
| "column:" int_lit
;
Scope
: "scope:" MetadataID
;
OptInlinedAt
: empty
| "inlinedAt" MetadataID
;
However, I can't figure out how to handle the separating commas. What if two optional fields in a row are not present?
Actually, thinking more about this specific example it seems to work out. Not pretty, but works. The reasoning is to let optional fields specify their separator in the direction of their closest required field.
DILocation
: "!DILocation" "(" OptLine OptColumn Scope OptInlinedAt ")"
;
OptLine
: empty
| "line:" int_lit ","
;
OptColumn
: empty
| "column:" int_lit ","
;
Scope
: "scope:" MetadataID
;
OptInlinedAt
: empty
| "," "inlinedAt" MetadataID
;
Now for the more tricky question, how do you deal with debug classes that contain only optional fields? For a specific example, consider DISubprogram
:
// optional: scope
// optional: name
// optional: linkageName
// optional: file
// optional: line
// optional: type
// optional: isLocal
// optional: isDefinition
// optional: scopeLine
// optional: containingType
// optional: virtuality
// optional: virtualIndex
// optional: thisAdjustment
// optional: flags
// optional: isOptimized
// optional: unit
// optional: templateParams
// optional: declaration
// optional: variables
// optional: thrownTypes
The only solution I can think of requires n+1 production rules. One rule per optional key, and one for empty
when none of the optional keys are set. The rationale is roughly to regard a single optional key as a required field, and let all the other optional fields include the comma separation (as done in the example above). The first two rules would thus look at follows, and there would exist 19 more such rules, one where name
was a fake required field, one for linkageName
, and so on. Obviously this is not the prettiest of solutions, and I'd be glad to hear if there is some clean solution I have not yet considered. Any ideas?
DISubprogram
: "!DISubprogram" "(" ")"
| "!DISubprogram" "(" Scope OptName OptLinkageName OptFile OptLine OptType OptIsLocal OptIsDefinition OptScopeLine OptContainingType OptVirtuality OptVirtualIndex OptThisAdjustment OptFlags OptIsOptimized OptUnit OptTemplateParams OptDeclaration OptVariables OptThrownTypes ")"
| "!DISubprogram" "(" OptScope Name OptLinkageName OptFile OptLine OptType OptIsLocal OptIsDefinition OptScopeLine OptContainingType OptVirtuality OptVirtualIndex OptThisAdjustment OptFlags OptIsOptimized OptUnit OptTemplateParams OptDeclaration OptVariables OptThrownTypes ")"
| ...
| "!DISubprogram" "(" OptScope OptName OptLinkageName OptFile OptLine OptType OptIsLocal OptIsDefinition OptScopeLine OptContainingType OptVirtuality OptVirtualIndex OptThisAdjustment OptFlags OptIsOptimized OptUnit OptTemplateParams OptDeclaration OptVariables ThrownTypes ")"
;
Scope
: "scope:" Value
;
OptName
: empty
| "," "name:" Value
;
OptLinkageName
: empty
| "," "linkageName:" Value
;
OptFile
: empty
| "," "file:" Value
;
OptLine
: empty
| "," "line:" Value
;
OptType
: empty
| "," "type:" Value
;
OptIsLocal
: empty
| "," "isLocal:" Value
;
OptIsDefinition
: empty
| "," "isDefinition:" Value
;
OptScopeLine
: empty
| "," "scopeLine:" Value
;
OptContainingType
: empty
| "," "containingType:" Value
;
OptVirtuality
: empty
| "," "virtuality:" Value
;
OptVirtualIndex
: empty
| "," "virtualIndex:" Value
;
OptThisAdjustment
: empty
| "," "thisAdjustment:" Value
;
OptFlags
: empty
| "," "flags:" Value
;
OptIsOptimized
: empty
| "," "isOptimized:" Value
;
OptUnit
: empty
| "," "unit:" Value
;
OptTemplateParams
: empty
| "," "templateParams:" Value
;
OptDeclaration
: empty
| "," "declaration:" Value
;
OptVariables
: empty
| "," "variables:" Value
;
OptThrownTypes
: empty
| "," "thrownTypes:" Value
;
Now, for the problem I've yet to find a good, and general solution to. Any help would be much appreciated. How do you handle cases when a debug class contains two or more required fields with optional fields in between? As an example, consider:
// optional: a
// optional: b
// REQUIRED: c
// optional: d
// optional: e
// REQUIRED: f
// optional: g
// optional: h
Specifically, how do you handle the separating commas in between c
and f
?
Oh, no. Even before running this I can already envision the LR(1) collisions for any debug classes where a required field is followed by two or more optional fields; because then the look-ahead will be ',' for both of the production rules, and reducing to empty
or shifting to ,
will produce a conflict between the two optional production rules.
Oh, why :(
Not that it is a pretty solution, but would it be difficult to specify a look-ahead of 2 for Gocc? I can imagine that the tables would be tremendously huge, thus making it unfeasible.
Hmm, it is a bit too tempting to allow syntactically invalid input to get a clean grammar, e.g. to allow trailing commas in between the last field and the left parenthesis.
( foo , bar , baz , )
Co-incidentally, this is how to Go grammar is defined https://golang.org/ref/spec#Parameters
Parameters = "(" [ ParameterList [ "," ] ] ")" .
Oh, how I which the LLVM IR grammar also allowed this pragmatic syntax.
Ok, it is obviously getting late. Allowing trailing commas would not solve the issue of parsing, only make it easier to write programs. Hmm, I need some rest. Sorry for the noise, and please let me know if anyone of you have some ideas on these problems :)
Good night.
Alas, 16 LR(1) conflicts for the following grammar:
// ref: ParseDIGlobalVariable
//
// REQ: name string (cannot be empty)
// opt: scope MDField
// opt: linkageName string
// opt: file MDField
// opt: line uint32
// opt: type MDField
// opt: isLocal bool
// opt: isDefinition bool (default true)
// opt: declaration MDField
// opt: align uint32
DIGlobalVariable
: "!DIGlobalVariable" "(" Name OptCommaScope OptCommaLinkageName OptCommaFile OptCommaLine OptCommaType OptCommaIsLocal OptCommaIsDefinition OptCommaDeclaration OptCommaAlign ")"
;
Name
: "name:" string_lit
;
OptCommaScope
: empty
| "," Scope
;
Scope
: "scope:" MetadataID
;
OptCommaLinkageName
: empty
| "," LinkageName
;
LinkageName
: "linkageName:" string_lit
;
OptCommaFile
: empty
| "," File
;
File
: "file:" MetadataID
;
OptCommaLine
: empty
| "," Line
;
Line
: "line:" int_lit
;
OptCommaType
: empty
| "," Type
;
Type
: "type:" MetadataID
;
OptCommaIsLocal
: empty
| "," IsLocal
;
IsLocal
: "isLocal:" BoolLit
;
OptCommaIsDefinition
: empty
| "," IsDefinition
;
IsDefinition
: "isDefinition:" BoolLit
;
OptCommaDeclaration
: empty
| "," Declaration
;
Declaration
: "declaration:" MetadataID
;
OptCommaAlign
: empty
| "," Align
;
Align
: "align:" int_lit
;
What may be a better approach?
I don't know if this is useful, but I made a space preserving parser with optional parenthesis and ran into some of these problems. I ended up using a "Continue" pattern, trademark pending ;)
https://github.com/katydid/katydid/blob/master/relapse/bnf/all.bnf#L276-L297
StartOr
: OpenParen Pattern Pipe ContinueOr CloseParen <<
&Pattern{Or: &Or{
OpenParen: $0.(*Keyword),
LeftPattern: $1.(*Pattern),
Pipe: $2.(*Keyword),
RightPattern: $3.(*Pattern),
CloseParen: $4.(*Keyword),
}}, nil
>>
;
ContinueOr
: Pattern
| ContinueOr Pipe Pattern <<
&Pattern{Or: &Or{
LeftPattern: $0.(*Pattern),
Pipe: $1.(*Keyword),
RightPattern: $2.(*Pattern),
}}, nil
>>
;
I think you can use the same idea combined with your idea.
// optional: a
// optional: b
// REQUIRED: c
// optional: d
List: "A" "," ContinueB
| ContinueB
;
ContinueB: "B" "," ContinueC
| ContinueC
;
ContinueC: "C" "," "D"
| "C"
;
I haven't checked for LR(1) collisions, but I was able to use the "Continue" Pattern in relapse without using -a
(maximal much).
Thanks @awalterschulze! I think the Continue™️ pattern of yours would be sufficient for solving this problem. A minor update made it possible to parse the end as well,
// optional: a
// optional: b
// REQUIRED: c
// optional: d
// optional: e
// REQUIRED: f
// optional: g
// optional: h
List
: "A" "," ContinueB
| ContinueB
;
ContinueB
: "B" "," ContinueC
| ContinueC
;
ContinueC
: "C" "," ContinueD
;
ContinueD
: "D" "," ContinueE
| ContinueE
;
ContinueE
: "E" "," ContinueF
| ContinueF
;
ContinueF
: "F" "," ContinueG
| "F"
;
ContinueG
: "G" "," "H"
| "G"
| "H"
;
It was an interesting problem to think about. Thanks for your input too :) In the end, for my specific use case, it turned out to be easier to solve a generalized version of the problem, rather than including dedicated grammar for each DWARF debug class, add a grammar capable of capturing their common features. As a side-effect this approach makes the parser more extendible to future additions/alterations of DWARF metadata classes.
The generalized version looks roughly as follows. Optional fields would then be checked as a production action.
SpecializedMDNode
: MetadataName "(" MDFields ")"
;
MDFields
: empty
| MDFieldList
| MDValueList
;
MDFieldList
: MDField
| MDFieldList "," MDField
;
MDField
: LabelIdent MDValue
;
MDValueList
: MDValue
| MDValueList "," MDValue
;
MDValue
: int_lit
| MetadataID
| string_lit
| BoolLit
| MDFlagList
| NullConst
| Type Value
;
MDFlagList
: MDFlag
| MDFlagList "|" MDFlag
;
MDFlag
: ident
;
Closing this issue. Really glad there are rather clean solutions to what seemed like a tricky problem.
Ah yes I much prefer your generalized solution.
My pleasure.
On Fri, 20 Oct 2017, 23:03 Robin Eklind, notifications@github.com wrote:
Thanks @awalterschulze https://github.com/awalterschulze! I think the Continue™️ pattern of yours would be sufficient for solving this problem. A minor update made it possible to parse parse the end as well,
// optional: a// optional: b// REQUIRED: c// optional: d// optional: e// REQUIRED: f// optional: g// optional: h
List : "A" "," ContinueB | ContinueB ;
ContinueB : "B" "," ContinueC | ContinueC ;
ContinueC : "C" "," ContinueD ;
ContinueD : "D" "," ContinueE | ContinueE ;
ContinueE : "E" "," ContinueF | ContinueF ;
ContinueF : "F" "," ContinueG | "F" ;
ContinueG : "G" "," "H" | "G" | "H" ;
It was an interesting problem to think about. Thanks for your input too :) In the end, for my specific use case, it turned out to be easier to solve a generalized version of the problem, rather than including dedicated grammar for each DWARF debug class, add a grammar capable of capturing their common features. As a side-effect this approach makes the parser more extendible to future additions/alterations of DWARF metadata classes.
The generalized version looks roughly as follows. Optional fields would then be checked as a production action.
SpecializedMDNode : MetadataName "(" MDFields ")" ;
MDFields : empty | MDFieldList | MDValueList ;
MDFieldList : MDField | MDFieldList "," MDField ;
MDField : LabelIdent MDValue ;
MDValueList : MDValue | MDValueList "," MDValue ;
MDValue : int_lit | MetadataID | string_lit | BoolLit | MDFlagList | NullConst | Type Value ;
MDFlagList : MDFlag | MDFlagList "|" MDFlag ;
MDFlag : ident ;
Closing this issue. Really glad there are rather clean solutions to what seemed like a tricky problem.
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/goccmack/gocc/issues/58#issuecomment-338321722, or mute the thread https://github.com/notifications/unsubscribe-auth/ABvsLekg9opPjJdhXceZXdGBHOxuoC2aks5suQqcgaJpZM4P_9lb .
I've recently been playing with writing a grammar for the metadata representation of the DWARF debug information of LLVM IR.
The debug data is specified as an ordered list of key-value pairs, where most keys are optional and some are required. Take
DILocation
for instance, which specifies the debug data of a source code location (i.e. line, column).As you can see, the above works but since we have 3 optional fields, we need 2^3=8 production rules for
DILocation
. This obviously doesn't scale for other debug classes, some of which contain 9 or more optional fields, thus >= 512 production rules.Any thoughts or ideas on how one may solve this in a cleaner fashion?
Constraints: ordered list, no duplicates, and required fields must be present.
Cheers /u