cadmiumcr / cadmium

Natural Language Processing (NLP) library for Crystal
https://cadmiumcr.com
MIT License

Proposal: Evaluator and/or Benchmark repositories #33

Open rmarronnier opened 4 years ago

rmarronnier commented 4 years ago

Preface

Evaluating the accuracy of the output of an NLP component is a science in itself.

When a new NLP algorithm, method or tool is published, it is always accompanied by benchmarks against existing systems.

Those benchmarks are produced using standard evaluation techniques and datasets.

These evaluation techniques are not always automatic.

Human judgment is sometimes necessary. In this case, there's nothing Cadmium can do to help.

However, a set of tools already exists, depending on the NLP task to be tested:

To those tools we can add standard datasets and corpora that are already gold-labeled and human-checked.

These are just examples found after a cursory search. The list is bigger and the tools get better fast.

Details

The main idea of this proposal is to :

The point is to give a glimpse of Cadmium's possibilities and to routinely check our tools' accuracy (which crystal spec is not intended to do).

This proposal is mainly a braindump, as I don't intend to start working on this in the short term (I have to finish my POS tagger first!)
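To make "checking accuracy" a bit more concrete, here is a minimal sketch of what an evaluator could compute when comparing a tagger's output with a gold-labeled corpus. It is written in Python only for illustration. The function name `evaluate_tags` and the result layout are hypothetical and are not part of Cadmium.

```python
from collections import Counter


def evaluate_tags(gold, predicted):
    """Compare predicted labels with gold labels, position by position.

    Hypothetical helper: returns overall accuracy plus per-label
    precision, recall and F1.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1      # correct prediction for label g
        else:
            fp[p] += 1      # p was predicted but is wrong here
            fn[g] += 1      # g was expected but was missed

    total = len(gold)
    accuracy = sum(tp.values()) / total if total else 0.0

    per_label = {}
    for label in sorted(set(gold) | set(predicted)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        per_label[label] = {"precision": precision, "recall": recall, "f1": f1}

    return {"accuracy": accuracy, "per_label": per_label}
```

Something like `evaluate_tags(gold_tags, tagger_output)` could then be run on each standard dataset, and the scores tracked over time.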

watzon commented 4 years ago

I love the idea. Maybe we can use GitHub Actions to automate the benchmarks and evaluators whenever a repo gets a push to master.

rmarronnier commented 4 years ago

Yeah! If it can download datasets weighing several hundred megabytes, then I can't see why not! It would be fantastic if GitHub Actions could generate a JSON file to be used by d3.js on the website, or at least produce a nice SVG with the help of Graphviz... OK, I'm a dreamer :smile:
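The JSON part could stay very small. Below is a rough Python sketch of a report a CI job might write for a d3.js chart to read. The field names (`commit`, `generated_at`, `results`) and both function names are assumptions made up for this example, and the scores in the usage comment are placeholders, not real measurements.

```python
import json
from datetime import datetime, timezone


def build_report(results, commit):
    """Build a JSON-serializable report from (task, metric, score) tuples.

    Hypothetical layout: one entry per task/metric pair, plus the commit
    the benchmark ran against and a UTC timestamp.
    """
    return {
        "commit": commit,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "results": [
            {"task": task, "metric": metric, "score": score}
            for task, metric, score in results
        ],
    }


def write_report(report, path):
    """Write the report as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2, sort_keys=True)


# Usage with placeholder values (not real benchmark numbers):
# report = build_report([("pos_tagging", "accuracy", 0.0)], commit="<sha>")
# write_report(report, "benchmarks.json")
```

Keeping one flat list of task/metric/score entries should make the file easy to filter on the website side, whatever charting library ends up being used.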

watzon commented 4 years ago

There's no reason it couldn't. All you need is a Docker container that can do it.

rmarronnier commented 4 years ago

Yeah, you're right. One thing to keep in mind: each job in a workflow can run for up to 6 hours of execution time (see GitHub's "Usage limits" documentation).

watzon commented 4 years ago

6 hours is insane. I doubt we'll even come close.