gnames / gnparser

GNparser normalises scientific names and extracts their semantic elements.
MIT License
38 stars 4 forks source link

Use PEG to parse all hybrid characters and include them in the output #176

Closed tobymarsden closed 3 years ago

tobymarsden commented 3 years ago

This is another approach to #174 which retains the verbatim hybrid characters from the input by parsing them entirely within PEG instead of preprocessing them.

Tests are passing but I'm not 100% certain I haven't missed something, or if this is the best way to go...

dimus commented 3 years ago

I think this approach is a "less surprises" one, let me think, and run profiling

dimus commented 3 years ago

It is slighly faster and eats slighlty more memory:

Regex approach (master)

name                                     time/op
Parse/Parse_to_object_once-16            73.0µs ± 1%
Parse/Parse_to_object_once_with_Init-16  83.2µs ± 1%
Parse/Parse_to_object-16                 67.5ms ± 1%
Parse/Parse_to_JSON-16                   71.5ms ± 1%
Parse/Parse_to_JSON_(Details)-16         71.8ms ± 1%
Parse/Parse_to_CSV-16                    69.1ms ± 1%

name                                     alloc/op
Parse/Parse_to_object_once-16            10.9kB ± 0%
Parse/Parse_to_object_once_with_Init-16  23.8kB ± 0%
Parse/Parse_to_object-16                 15.5MB ± 0%
Parse/Parse_to_JSON-16                   17.2MB ± 0%
Parse/Parse_to_JSON_(Details)-16         17.2MB ± 0%
Parse/Parse_to_CSV-16                    16.2MB ± 0%

name                                     allocs/op
Parse/Parse_to_object_once-16               250 ± 0%
Parse/Parse_to_object_once_with_Init-16     409 ± 0%
Parse/Parse_to_object-16                   235k ± 0%
Parse/Parse_to_JSON-16                     242k ± 0%
Parse/Parse_to_JSON_(Details)-16           242k ± 0%
Parse/Parse_to_CSV-16                      240k ± 0%

PEG approach (this branch)

name                                     time/op
Parse/Parse_to_object_once-16            71.7µs ± 1%
Parse/Parse_to_object_once_with_Init-16  81.4µs ± 1%
Parse/Parse_to_object-16                 66.0ms ± 1%
Parse/Parse_to_JSON-16                   69.7ms ± 1%
Parse/Parse_to_JSON_(Details)-16         69.9ms ± 1%
Parse/Parse_to_CSV-16                    67.5ms ± 1%

name                                     alloc/op
Parse/Parse_to_object_once-16            11.4kB ± 0%
Parse/Parse_to_object_once_with_Init-16  24.2kB ± 0%
Parse/Parse_to_object-16                 16.2MB ± 0%
Parse/Parse_to_JSON-16                   17.9MB ± 0%
Parse/Parse_to_JSON_(Details)-16         17.9MB ± 0%
Parse/Parse_to_CSV-16                    16.8MB ± 0%

name                                     allocs/op
Parse/Parse_to_object_once-16               249 ± 0%
Parse/Parse_to_object_once_with_Init-16     408 ± 0%
Parse/Parse_to_object-16                   233k ± 0%
Parse/Parse_to_JSON-16                     241k ± 0%
Parse/Parse_to_JSON_(Details)-16           241k ± 0%
Parse/Parse_to_CSV-16                      239k ± 0%

Adding cultivars did not affect speed that much:

Before cultivars (v1.2.0)

name                                     time/op
Parse/Parse_to_object_once-16            69.4µs ± 2%
Parse/Parse_to_object_once_with_Init-16  79.0µs ± 1%
Parse/Parse_to_object-16                 64.3ms ± 0%
Parse/Parse_to_JSON-16                   68.5ms ± 1%
Parse/Parse_to_JSON_(Details)-16         68.4ms ± 1%
Parse/Parse_to_CSV-16                    65.9ms ± 1%

name                                     alloc/op
Parse/Parse_to_object_once-16            11.4kB ± 0%
Parse/Parse_to_object_once_with_Init-16  23.8kB ± 0%
Parse/Parse_to_object-16                 16.2MB ± 0%
Parse/Parse_to_JSON-16                   17.9MB ± 0%
Parse/Parse_to_JSON_(Details)-16         17.9MB ± 0%
Parse/Parse_to_CSV-16                    16.9MB ± 0%

name                                     allocs/op
Parse/Parse_to_object_once-16               251 ± 0%
Parse/Parse_to_object_once_with_Init-16     403 ± 0%
Parse/Parse_to_object-16                   235k ± 0%
Parse/Parse_to_JSON-16                     242k ± 0%
Parse/Parse_to_JSON_(Details)-16           242k ± 0%
Parse/Parse_to_CSV-16                      240k ± 0%
dimus commented 3 years ago

I would go for it, if you update it to the current master

tobymarsden commented 3 years ago

@dimus welp, sorry I couldn't get to this. Got majorly sidetracked...

dimus commented 3 years ago

No problem, it could wait, and I was in a cleaning up mood :)