emanjavacas / pie

A fully-fledge PyTorch package for Morphological Analysis, tailored to morphologically rich and historical languages.
MIT License
22 stars 10 forks source link

Accelerate epoch evaluation #29

Closed PonteIneptique closed 4 years ago

PonteIneptique commented 5 years ago

Running Pie on the LASLA corpora, my dev or test are basically few hundred thousands tokens of size. The computation of score is extremely long.

My feeling is that we could probably do something like multi-thread or proc this part : https://github.com/emanjavacas/pie/blob/master/pie/models/scorer.py#L66

emanjavacas commented 5 years ago

mmmh, I can't see right now why that piece of code would slow down evaluation. Have you profiled it a little to see where it spends most computation?

PonteIneptique commented 5 years ago

I have played a little more with your code since then, I'll try to get back to you on this once I finish what I have started.

PonteIneptique commented 5 years ago

I have done some work in this direction. I still think one thing could be improved, ie the fact that we loop over all task one by one. Wondering if a "metascorer" function could be useful, eg iterate over all scorer of each task in a zip() and compute using it. Memory could be an issue but I highly doubt it since trues and preds are already stored.

PonteIneptique commented 5 years ago

if you agree with last comment, I'll do it and replace instances of

        for task in model.evaluate(set1, set2).values():
            task.print_summary()

with

metascore_print_summary(evaluation_dict)

or something like it

PonteIneptique commented 5 years ago

If I am correct, the complexity after the pr #32 is

n{tasks} * n{words} and then n{tasks} * n{type of scores specific to each task}.

I am wondering if a zip could remove n{tasks}

emanjavacas commented 5 years ago

where exactly would you put this zip?

On Thu, Aug 29, 2019 at 11:44 AM Thibault Clérice notifications@github.com wrote:

If I am correct, the complexity after the pr #32 https://github.com/emanjavacas/pie/pull/32 is

n{tasks} n{words} n{type type of scores}.

I am wondering if a zip could remove n{tasks}

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/emanjavacas/pie/issues/29?email_source=notifications&email_token=ABPIPI5MTKJVIDVQKZ4YVGDQG6K6BA5CNFSM4IMR77JKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5N46ZI#issuecomment-526110565, or mute the thread https://github.com/notifications/unsubscribe-auth/ABPIPI36ZEBRTOL4LN44AWDQG6K6BANCNFSM4IMR77JA .

-- Enrique Manjavacas

PonteIneptique commented 5 years ago

I don't know for sure now, I would not reuse individual .get_score() at all, that's for sure.

It's just an idea which I'll investigate after #32 . This already removed quite some computation I believe.

PonteIneptique commented 5 years ago

Let's forget the zip, it would be a very hard (impossible to do) without hardcoded values.

Pretty happy about the situation, even more if you agree with multiprocessing ;)

emanjavacas commented 5 years ago

Ye, multiprocessing should definitely help.

PonteIneptique commented 4 years ago

I think the last PRs did well, we should probably reopen if needed :)