arademaker / hs-conllu

CoNLL-U/UD library
GNU Lesser General Public License v3.0

Plans for the library #36

Open odanoburu opened 2 years ago

odanoburu commented 2 years ago

I wrote the text below as an open reply to @arademaker, following up on our conversation in #32 about plans for the library.


I'd like to change the structure of the library a bit: first have a really dumb parser that accepts anything remotely matching the CoNLL-U format, then do light validation on top of it according to a user specification. This would mean not hardcoding deprels and similar values, but reading them from files that list the acceptable entities (such files already exist for the canonical validation script, and users could tweak them if they wanted to).
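To make the two-layer idea concrete, here is a rough sketch using only `base`. It is not the library's current API: every name in it (`RawToken`, `parseLoose`, `readAllowed`, `checkDeprels`, `Issue`) is hypothetical, and the allowed-values file is assumed to hold one deprel per line, which may not match the format the canonical validation script uses.

```haskell
-- Sketch only: all names here are hypothetical, not part of hs-conllu.

-- | A token row kept as raw strings; nothing is interpreted at parse time.
data RawToken = RawToken
  { rtLine   :: Int       -- 1-based line number in the input
  , rtFields :: [String]  -- tab-separated columns, exactly as found
  } deriving (Show, Eq)

-- | A validation finding, tied to the line it came from.
data Issue = Issue
  { issueLine :: Int
  , issueMsg  :: String
  } deriving (Show, Eq)

-- | Split a string on a separator character, keeping empty chunks.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Layer 1, the "dumb" parser: skip blank lines and '#' comments,
-- split everything else on tabs, and never reject anything.
parseLoose :: String -> [RawToken]
parseLoose input =
  [ RawToken n (splitOn '\t' l)
  | (n, l) <- zip [1 ..] (lines input)
  , not (null l)
  , take 1 l /= "#"
  ]

-- | Read allowed values from file contents.
-- Assumption: one value per line, blank lines ignored.
readAllowed :: String -> [String]
readAllowed = filter (not . null) . lines

-- | Layer 2, light validation: DEPREL is the 8th CoNLL-U column.
-- "_" (unspecified) is always accepted; anything else must be listed.
checkDeprels :: [String] -> [RawToken] -> [Issue]
checkDeprels allowed toks =
  [ Issue (rtLine t) msg
  | t <- toks
  , Just msg <- [problem t]
  ]
  where
    problem t = case drop 7 (rtFields t) of
      (deprel : _)
        | deprel == "_" || deprel `elem` allowed -> Nothing
        | otherwise -> Just ("unknown deprel: " ++ deprel)
      [] -> Just "fewer than 8 columns, no DEPREL field"
```

In GHCi, something like `checkDeprels . readAllowed <$> readFile "deprels.txt" <*> (parseLoose <$> readFile "corpus.conllu")` would tie the two layers together (both file names are placeholders). The same pattern would extend to UPOS tags or feature names by adding one checker per column, and swapping the list for a `Data.Set` would be the obvious change if the allowed lists grow large.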

I also think that the megaparsec library might be unnecessary since the CoNLL-U format is so simple, but its performance is not bad and its error-reporting facilities are great (are we using them as well as we could?), so I'd probably leave it be. If a performance need comes up, we can reconsider then.

I don't think it's worth implementing full CoNLL-U validation, for the reasons I gave in #34.

At some point I had plans for a query interface like the one in http://match.grew.fr (see https://github.com/odanoburu/hs-conllu/compare/master...query), but honestly I don't think it's worth implementing, since just loading the data into a graph database would give better-performing queries and visualization facilities for free :)

Finally, I started writing this library a long time ago, when I was first learning Haskell, so I would also change the code quite a bit to reflect some of what I've learned since then.

arademaker commented 2 years ago

Thank you, these ideas can be a good starting point for further work.