gnames / gnparser

GNparser normalises scientific names and extracts their semantic elements.
MIT License

As a user I want to have both parse array and parse stream options #122

Closed dimus closed 3 years ago

dimus commented 3 years ago

Currently, gnparser parses input in chunks instead of parsing a stream. This implementation keeps the output in the same order as the input, but it does not work well when gnparser is used as a "pipe in, pipe out" tool: it collects names into a buffer until they are parsed, whereas the pipe in, pipe out approach sends and receives names one by one.

dimus commented 3 years ago

@LocoDelAssembly, to use pipes from Ruby code, each name sent in has to immediately return one result out. In the old code this was achieved with a stream that took one name at a time instead of an array of names.

It is possible to avoid a streaming method by reducing the batch size to 1, so I suspect we can close this ticket for now. An option for the batch size has already been added.

dimus commented 3 years ago

The v1.0.0 code is located in the 102-clean branch. An example of running 1 name at a time is:

gnparser -b 1 "Pardosa moesta"

dimus commented 3 years ago

It looks like we need a streaming option, because otherwise the pipe solution becomes too hackish.

dimus commented 3 years ago

It seems that https://github.com/gnames/gnparser/commit/b47531d3795616d43091a2d005ca524119fb895c allows us to close the issue again for now.

dimus commented 3 years ago

It looks like we do need a stream. I have an experimental package for ordering unordered streams, and I will try to use it here: https://github.com/gnames/gnlib/blob/master/organizer/organizer.go