Open fuhrmanator opened 8 years ago
As for Python solutions using PEG, the documentation for https://github.com/erikrose/parsimonious looks good.
More banging on PEG.js reveals a pretty powerful parser for these cases:
/* test for CAP DSL */
/* currently parsing works, but no results generated */
/* Javascript goes inside the {} */
Expression
= Spacing Rule+ EndOfFile
Rule
/* define specific rules for the activity types */
= head:ExamActivity _ tail:Timing Spacing {
return head + ':' + tail;
}
/ activity:MoodleQuizActivity _ opens:MoodleQuizOpenTime _ closes:MoodleQuizCloseTime Spacing {
return activity + ':\n opens:' + opens + ',\n closes: ' + closes;
}
/ activity:MoodleHomeworkActivity _ opens:MoodleHomeworkAllowSubmissionsTime _ due:MoodleHomeworkDueTime _ cutoff:MoodleHomeworkCutoffTime Spacing {
return activity + ':\n allow submissions after:' + opens + ',\n due date: ' + due + ',\n cutoff date: ' + cutoff;
}
ExamActivity
= head:EXAM_ACTIVITY_CODE tail:Integer {
return head + tail;
}
MoodleQuizActivity
= head:MOODLE_QUIZ_ACTIVITY_CODE tail:Integer {
return head + tail;
}
MoodleQuizOpenTime "Moodle Quiz Open Time" = Timing
MoodleQuizCloseTime "Moodle Quiz Close Time" = Timing
MoodleHomeworkActivity
= head:MOODLE_HOMEWORK_ACTIVITY_CODE tail:Integer {
return head + tail;
}
MoodleHomeworkAllowSubmissionsTime "Moodle Homework Allow Submissions Time" = Timing
MoodleHomeworkDueTime "Moodle Homework Due Time" = Timing
MoodleHomeworkCutoffTime "Moodle Homework Cutoff Time" = Timing
Activity "Activity Number (e.g., E1 for Exam 1)"
= head:ActivityCode tail:Integer {
return head + tail;
}
Timing
/* Case for Session, Labs, Practica */
= head:MeetingSequence tail:TimeModifier? {
var result = head, i;
if (tail !== null) {
for (i=0; i<tail.length; i++) {
result += tail[i];
}
}
return result;
}
ActivityCode
= code:(EXAM_ACTIVITY_CODE / MOODLE_QUIZ_ACTIVITY_CODE / MOODLE_HOMEWORK_ACTIVITY_CODE) { return code; }
MeetingSequence "Meeting Number (e.g., S2 for Seminar 2)"
= meeting:(SEMINAR_MEETING / LABORATORY_MEETING / PRACTICUM_MEETING) number:Integer { return meeting + ' ' + number}
TimeModifier
= time:(MEETING_START / MEETING_END) adjust:((('-'/'+') DeltaTime) ('@' HHMM)?)?
DeltaTime
= Integer ('m' / 'h' / 'd' / 'w')
/* http://stackoverflow.com/a/20123018/1168342 -- except the order is different in PEG (first match) and there's a bug with the ? between [0-1]?[0-9] */
HHMM
= ([2][0-3] / [0-1]?[0-9] / [0-9]) ':' [0-5][0-9] { return text() }
Integer "integer"
= [0-9]+ { return parseInt(text(), 10); }
_ "whitespace"
= [ \t]*
EXAM_ACTIVITY_CODE
= 'E' { return "Exam "; }
MOODLE_QUIZ_ACTIVITY_CODE
= 'Q' { return "Moodle Quiz "}
MOODLE_HOMEWORK_ACTIVITY_CODE
= 'H' { return "Moodle Homework "}
SEMINAR_MEETING
= 'S' {return 'Seminar'; }
LABORATORY_MEETING
= 'L' {return 'Laboratory'; }
PRACTICUM_MEETING
= 'P' {return 'Practicum'; }
MEETING_START
= 'S' {return '(start)'; }
MEETING_END
= 'F' {return '(end)'; }
Spacing
= (Space / Comment)*
Comment
= '#' (!EndOfLine .)* EndOfLine { return 'comment';}
Space
= ' ' / '\t' / EndOfLine
EndOfLine
= '\r\n' / '\n' / '\r'
EndOfFile
= !. { return "EOF"; }
Here's the sample it parses:
# here's a comment
E1 S2
Q1 S1F S2S-30m
H1 L2F L3S-1d@23:55 L3S-1d@23:55
and the resulting output:
[
[
"comment"
],
[
"Exam 1:Seminar 2",
"Moodle Quiz 1:
opens:Seminar 1(end)null,
closes: Seminar 2(start)-,30,m,",
"Moodle Homework 1:
allow submissions after:Laboratory 2(end)null,
due date: Laboratory 3(start)-,1,d,@,23:55,
cutoff date: Laboratory 3(start)-,1,d,@,23:55"
],
"EOF"
]
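The stray commas and the literal "null" in that output appear to come from the Timing action: TimeModifier has no action of its own, so it hands back nested arrays (or null for the optional part), and `result += tail[i]` stringifies them. Here is a minimal sketch of a helper the Timing action could call instead. The name `formatTiming` and the assumed shape of `tail` are illustrative and not part of the grammar above.

```javascript
// Hypothetical helper (not in the grammar above): builds a readable string
// from the values the Timing action receives.
// Assumed shape of `tail`: null, or [time, adjust], where adjust is null or
// [[sign, [amount, unit]], atPart] and atPart is null or ['@', 'HH:MM'].
function formatTiming(head, tail) {
  if (tail === null) return head;
  const [time, adjust] = tail;
  let result = head + time;
  if (adjust !== null) {
    const [[sign, [amount, unit]], atPart] = adjust;
    result += sign + amount + unit;
    if (atPart !== null) result += atPart.join('');
  }
  return result;
}

// What the current action does: += stringifies nested arrays and null.
function naiveTiming(head, tail) {
  let result = head;
  if (tail !== null) {
    for (let i = 0; i < tail.length; i++) result += tail[i];
  }
  return result;
}

const quizClose = ['(start)', [['-', [30, 'm']], null]];
console.log(naiveTiming('Seminar 2', quizClose));  // Seminar 2(start)-,30,m,
console.log(formatTiming('Seminar 2', quizClose)); // Seminar 2(start)-30m
```

With something like this, the S1F case would print "Seminar 1(end)" rather than "Seminar 1(end)null". Giving TimeModifier and DeltaTime their own actions would be another way to get the same effect.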
Error messages also come largely for free. For example, if I leave off the third date on the H1 line (it requires three dates), I get the following error message:
Line 4, column 21: Expected Moodle Homework Cutoff Time but end of input found.
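The readable name in that message comes from the quoted display name on the rule (here "Moodle Homework Cutoff Time"). As a rough sketch of how calling code might surface such an error, assuming the generated parser throws an error carrying `message` and `location.start.line` / `location.start.column` (other PEG.js versions may expose the position differently), something like the following could work. `describeParseError`, `tryParse`, and the stub parser are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: formats a thrown parse error in the style shown above.
// Assumes the error carries `message` and `location.start.{line,column}`.
function describeParseError(err) {
  const start = err && err.location && err.location.start;
  if (start) {
    return 'Line ' + start.line + ', column ' + start.column + ': ' + err.message;
  }
  return String((err && err.message) || err);
}

// `parser` is any object with a parse(input) method that throws on failure,
// for example a parser generated from the grammar above (assumption).
function tryParse(parser, input) {
  try {
    return { ok: true, value: parser.parse(input) };
  } catch (err) {
    return { ok: false, error: describeParseError(err) };
  }
}

// Stand-in parser so the sketch runs without PEG.js installed.
const stubParser = {
  parse(input) {
    const err = new Error('Expected Moodle Homework Cutoff Time but end of input found.');
    err.location = { start: { line: 4, column: 21 } };
    throw err;
  }
};

console.log(tryParse(stubParser, 'H1 L2F L3S-1d@23:55').error);
```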
A good reference for PEG is the research paper: http://www.brynosaurus.com/pub/lang/peg.pdf
It would be useful to define a simple grammar for the plan language. That would make the language easier to document and changes easier to code.
Have a look at the PEG.js online tester for an example of a simple grammar parser. There is something for Python, but I'm not sure how easily it can be used. Perhaps start with a PEG.js version and convert? Either way, I think this kind of parsing is more robust and easier to modify.
I hacked up the following on http://pegjs.org/online:
which parses these examples: