NASA-IMPACT / evalem

An evaluation framework for your large model pipelines

[feature request] Cache predictions in the evaluation pipeline #14

Open NISH1001 opened 1 year ago

NISH1001 commented 1 year ago

What

Currently, evalem.pipelines.SimpleEvaluationPipeline is stateless: neither forward-pass outputs (predictions) nor evaluation results are cached within the pipeline object. This is fine for inference + evaluation on a small sample size. However, for a larger set, say the full SQuAD v2 training split (~86k samples), re-running inference just to regenerate predictions is time-consuming whenever we want to switch the Evaluator object.

Why

To speed up evaluation without re-running the forward pass on a huge dataset. It would also help with debugging on large sample sets, since catching runtime errors (say, tokenization errors from weird texts) at a late stage in the pipeline is a bummer.

How

Maybe we can have a new CachedSimpleEvaluationPipeline (or something like that) that can load predictions from external files (text, JSON, etc.).
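
A rough sketch of the idea, just to make it concrete. The `CachedPredictions` helper and `predict_with_cache` function below are hypothetical and not part of evalem; the real thing would presumably wrap SimpleEvaluationPipeline rather than a bare callable:

```python
import json
from pathlib import Path


class CachedPredictions:
    """Hypothetical helper that persists predictions to disk so that
    switching evaluators does not require a new forward pass."""

    def __init__(self, cache_path: str) -> None:
        self.cache_path = Path(cache_path)

    def load(self):
        # Return cached predictions if the file exists, else None.
        if self.cache_path.exists():
            return json.loads(self.cache_path.read_text())
        return None

    def save(self, predictions) -> None:
        self.cache_path.write_text(json.dumps(predictions))


def predict_with_cache(model_fn, texts, cache_path="predictions.json"):
    """Run `model_fn` over `texts` only when no cached predictions exist."""
    cache = CachedPredictions(cache_path)
    predictions = cache.load()
    if predictions is None:
        predictions = [model_fn(text) for text in texts]
        cache.save(predictions)
    return predictions
```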


cc: @muthukumaranR

muthukumaranR commented 1 year ago

As far as data consistency goes, would it be possible to enforce checks (for tokenizing) on the DTOs during, say, pipeline.build()? That way you're guaranteed not to hit any in pipeline.run(). Essentially, split the execution of the pipeline into two phases: the first handles all the checks, and the final one runs the pipeline.
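
Something along these lines, as a sketch only (the `TwoPhasePipeline` name and its signature are illustrative, not an existing evalem class):

```python
class TwoPhasePipeline:
    """Illustrative sketch of the build/run split: all validation
    (e.g., tokenization checks on the DTOs) happens in build(), so
    run() only executes already-validated work."""

    def __init__(self, model, tokenizer, evaluators):
        self.model = model
        self.tokenizer = tokenizer
        self.evaluators = evaluators
        self._validated_texts = None

    def build(self, texts):
        # Fail fast: tokenize every text up front so malformed samples
        # surface here instead of deep inside an expensive forward pass.
        for idx, text in enumerate(texts):
            try:
                self.tokenizer(text)
            except Exception as err:
                raise ValueError(f"Sample {idx} failed tokenization") from err
        self._validated_texts = texts
        return self

    def run(self):
        if self._validated_texts is None:
            raise RuntimeError("Call build(texts) before run().")
        predictions = [self.model(text) for text in self._validated_texts]
        return {type(ev).__name__: ev(predictions) for ev in self.evaluators}
```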

As for caching, I think the mechanism will be useful regardless.

NISH1001 commented 1 year ago

I like the build(...) mechanism; will add it to my to-do list. Right now, what we're basically doing is passing the texts as-is to transformers.pipeline(...), which implicitly handles tokenization, the forward pass, etc.
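
For reference, a minimal example of that current flow; the checkpoint name below is just an illustrative SQuAD v2 model, not necessarily what evalem uses:

```python
from transformers import pipeline

# transformers.pipeline handles tokenization + the forward pass internally,
# so a tokenization error only surfaces when the pipeline is called.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

result = qa(
    question="What does caching predictions avoid?",
    context="Caching predictions avoids re-running the forward pass "
            "when switching evaluators.",
)
print(result["answer"], result["score"])
```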