djspiewak / parseback

A Scala implementation of parsing with derivatives
http://parseback.io
Apache License 2.0
197 stars 22 forks source link

lexing before parsing #42

Open osiire opened 5 years ago

osiire commented 5 years ago

I'm using parseback in my project, but parsing is very slow due to many whitespaces. In practice, there is no DSL does not have whitespace. I have introduced a lexing phase before the PWD, the parsing speed became very fast. Here are 3 patterns of benchmark results in my box.

no optimize

+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,173775.851188,2312.746723,"ops/s",2
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,18241.987633,984.304636,"ops/s",4
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,5678.154039,71.475780,"ops/s",8
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,2642.946283,89.420557,"ops/s",16
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,1169.474335,37.347710,"ops/s",32
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,371.850145,6.763177,"ops/s",64
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,141.861952,4.194896,"ops/s",128
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,22432.031980,982.794937,"ops/s",2
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,3065.175752,40.800879,"ops/s",4
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,628.646364,7.093002,"ops/s",8
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,127.711043,3.572157,"ops/s",16
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,30.057644,0.463080,"ops/s",32
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,5.685695,0.246421,"ops/s",64
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,1.579336,0.116543,"ops/s",128
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,150796.908074,1527.348563,"ops/s",2
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,60145.657071,2153.397558,"ops/s",4
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,25767.637041,1330.493737,"ops/s",8
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,13851.684348,216.750698,"ops/s",16
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,6592.322566,303.507154,"ops/s",32
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,2455.269600,38.373447,"ops/s",64
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,1492.020733,21.215706,"ops/s",128

no optimize(add whitespace)

+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,132619.563099,2747.014005,"ops/s",2
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,16931.974282,640.734251,"ops/s",4
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,5802.609738,154.647659,"ops/s",8
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,2323.263571,49.327300,"ops/s",16
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,890.420544,48.637698,"ops/s",32
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,369.755234,27.868797,"ops/s",64
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,92.007267,5.009416,"ops/s",128
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,6264.225232,227.800277,"ops/s",2
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,905.146495,31.420769,"ops/s",4
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,183.503893,3.275698,"ops/s",8
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,36.405825,0.545524,"ops/s",16
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,8.060280,0.205061,"ops/s",32
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,1.801275,0.128898,"ops/s",64
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,0.356215,0.046763,"ops/s",128
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,119033.918128,7822.590611,"ops/s",2
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,52391.733969,1120.222661,"ops/s",4
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,23464.677710,638.255485,"ops/s",8
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,10799.153141,652.494521,"ops/s",16
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,4477.935194,276.564212,"ops/s",32
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,2788.747861,57.186063,"ops/s",64
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,1382.957582,31.359118,"ops/s",128

lexing optimize(add whitespace)

+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,169928.693894,4828.383320,"ops/s",2
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,17080.524603,1153.686539,"ops/s",4
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,5467.070697,490.304521,"ops/s",8
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,2167.957922,174.578779,"ops/s",16
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,904.012384,24.496258,"ops/s",32
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,394.103533,6.246434,"ops/s",64
+"parseback.benchmarks.ArithmeticBenchmarks.gllRun","thrpt",1,20,110.933693,4.439783,"ops/s",128
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,22831.346090,309.065844,"ops/s",2
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,2764.269407,130.351963,"ops/s",4
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,531.611580,49.118742,"ops/s",8
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,123.734994,10.489502,"ops/s",16
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,34.538655,2.236450,"ops/s",32
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,7.693930,0.275773,"ops/s",64
+"parseback.benchmarks.ArithmeticBenchmarks.parsebackRun","thrpt",1,20,1.557181,0.021734,"ops/s",128
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,123540.595553,1857.826577,"ops/s",2
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,54996.680269,1056.356606,"ops/s",4
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,20867.465192,2223.796593,"ops/s",8
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,10111.816131,1135.447739,"ops/s",16
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,5587.119579,253.280717,"ops/s",32
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,2658.565723,56.281985,"ops/s",64
+"parseback.benchmarks.ArithmeticBenchmarks.spcRun","thrpt",1,20,1477.770367,132.052776,"ops/s",128

Lexing before PWD is as fast as non-whitespace input even if the input has whitespaces. I suggest you introduce a lexing phase before parse.