Open erikrose opened 10 years ago
Yep, I want to have Ford's, or at least a superset of it.
:+1:
Is there a workaround for parsing newlines that is better than just escaping the newline character?
There might be some escaping dance you can do to get it into a Literal, or you can do what I do in grammar.py and stick it in a regex:
comment = ~r"#[^\r\n]*"
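As a rough illustration of the regex workaround, here is a minimal sketch using Python's standard `re` module on its own rather than a parsimonious grammar. The `NEWLINE` pattern and the sample text are assumptions made up for this example; only the comment pattern comes from the line above.

```python
import re

# Same pattern as the grammar rule above: "#" up to, but not including, the line ending.
COMMENT = re.compile(r"#[^\r\n]*")

# Hypothetical newline pattern: CRLF first, then a bare CR or LF.
NEWLINE = re.compile(r"\r\n|\r|\n")

text = "x = 1  # set x\r\ny = 2\n"

comment_text = COMMENT.search(text).group()
newlines = NEWLINE.findall(text)
```

Because the patterns are raw strings, the `\r` and `\n` escapes reach the regex engine untouched and are interpreted there, which is presumably why wrapping line endings in a regex sidesteps the escaping problem.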
What is the current recommended way to match \n?
After much fooling around, I was able to get C-style multiline comments working with the following:
comment = ws* ~r"/\*.*?\*/"s ws*
ws = ~r"\s*"i
Is there an easier way?
That looks correct and concise. You could probably make it faster by using inverted character classes. In general, non-greedy quantifiers like *? are slow because they create a lot of backtracking. Instead you could try something like this (which matches double-quoted strings with backslash escapes) for speed:
~"u?r?\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\""is
Sorry about all the backslashes. Anyway, notice how I scan quickly ahead for anything that couldn't possibly be an ending quote or a backslash, using [^\"\\\\]*, then go looking for actual special things with the (?:\\\\.[^\"\\\\]*)*. Of course, it's not nearly as readable as your spelling.
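To show the same scan-ahead idea applied to the C-style comments discussed above, here is a minimal sketch using Python's standard `re` module instead of a parsimonious grammar. The unrolled comment pattern is one common spelling and is an assumption of this example, not something taken from the thread. The variable names and sample text are made up too.

```python
import re

# Non-greedy spelling from the thread; re.DOTALL plays the role of the "s" flag.
LAZY_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

# Unrolled spelling (assumed): skip non-stars, then a run of stars,
# repeating while the run of stars is not followed by "/".
UNROLLED_COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")

# The double-quoted string pattern from the thread, written as a raw string
# so each backslash only has to be escaped once (for the regex engine).
STRING = re.compile(r'u?r?"[^"\\]*(?:\\.[^"\\]*)*"', re.IGNORECASE | re.DOTALL)

sample = 'a /* one\n * two **/ b /* three */ "say \\"hi\\"" c'

lazy_matches = LAZY_COMMENT.findall(sample)
unrolled_matches = UNROLLED_COMMENT.findall(sample)
string_matches = STRING.findall(sample)
```

Both comment patterns find the same two comments in this sample. Whether the unrolled form is measurably faster depends on the input, so it is worth timing on real files before giving up the more readable spelling.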
Thanks, that's definitely worth knowing. I did some benchmarking to see how much comments are costing in processing time.
I started with an 85-measure bass part I'd recently transcribed that had multiple comments amounting to 38% of the total characters in the file. I made it into two larger benchmark files, one with and one without comments, by replicating the original 20 times. So that's 1700 measures of music, more or less equivalent to a score in all parts for a small orchestral movement.
$ wc benchmark.tbn nocommentbenchmark.tbn
1342 13132 49229 benchmark.tbn
880 8760 30400 nocommentbenchmark.tbn
The processing time, including MIDI file creation, on my 2012 Mac Mini was ~6.5 seconds in either case. That's about 4 ms per measure. The processing overhead for the comments was just over 2%. I think I can live with that :-)
$ time tbon -q nocommentbenchmark.tbn
Processing nocommentbenchmark.tbn
Created nocommentbenchmark.mid
real 0m6.572s
user 0m6.405s
sys 0m0.163s
$ time tbon -q benchmark.tbn
Processing benchmark.tbn
Created benchmark.mid
real 0m6.717s
user 0m6.547s
sys 0m0.166s
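For anyone who wants to check the arithmetic, the figures quoted above can be reproduced from the `wc` and `time` output. This is only a sketch; the variable names are made up for illustration.

```python
# Numbers copied from the wc and time output above.
measures = 85 * 20                 # original part replicated 20 times
chars_with_comments = 49229
chars_without_comments = 30400
real_without = 6.572               # seconds, nocommentbenchmark.tbn
real_with = 6.717                  # seconds, benchmark.tbn

# Share of the file taken up by comments (about 38%).
comment_share_pct = (chars_with_comments - chars_without_comments) / chars_with_comments * 100

# Time per measure without comments (a little under 4 ms).
ms_per_measure = real_without / measures * 1000

# Extra wall-clock time attributable to comments (just over 2%).
overhead_pct = (real_with - real_without) / real_without * 100
```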
Great! Benchmarking is always the best answer. :-)
It's awkward to express LFs, CRs, etc. in grammars, because Python tends to replace them with actual newlines, which are no-ops. It works in the grammar DSL's grammar because they're wrapped in regexes, but that shouldn't be required. Ford's original PEG grammar supports \n\r\t\'\"{}\ and some numerics. We should probably go that way.
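The underlying Python behavior can be seen without parsimonious at all. The sketch below uses only the standard `re` module, and the variable names are made up for illustration. It shows that a non-raw string already contains a real line feed by the time anything else sees it, while a raw string keeps the two-character escape, which only a consumer that understands escapes (such as the regex engine) will expand.

```python
import re

cooked = "\n"    # Python has already turned this into a single LF character
raw = r"\n"      # two characters: a backslash followed by "n"

cooked_len = len(cooked)
raw_len = len(raw)

# The regex engine expands the two-character escape itself,
# so both spellings match a line feed when used as patterns.
raw_matches_lf = re.fullmatch(raw, "\n") is not None
cooked_matches_lf = re.fullmatch(cooked, "\n") is not None

# A plain string comparison expands nothing, so the raw form is not a line feed.
raw_equals_lf = (raw == "\n")
```

Supporting Ford-style escapes in the grammar DSL itself would presumably mean doing that expansion for literals, so the regex wrapper is no longer needed.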