[Closed] PonteIneptique closed this issue 4 years ago.
mmmh, I can't see right now why that piece of code would slow down evaluation. Have you profiled it a little to see where most of the computation goes?
I have played a little more with your code since then, I'll try to get back to you on this once I finish what I have started.
I have done some work in this direction. I still think one thing could be improved, namely the fact that we loop over all tasks one by one. I am wondering if a "metascorer" function could be useful, e.g. one that iterates over the scorers of all tasks together in a zip() and computes from that. Memory could be an issue, but I highly doubt it, since trues and preds are already stored.
If you agree with the last comment, I'll do it and replace instances of

```python
for task in model.evaluate(set1, set2).values():
    task.print_summary()
```

with

```python
metascore_print_summary(evaluation_dict)
```

or something like it.
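A minimal, self-contained sketch of what such a `metascore_print_summary` helper could look like. The `DummyScorer` class here is a stand-in for illustration only, not pie's actual scorer API:

```python
class DummyScorer:
    """Stand-in for pie's per-task scorer, for illustration only."""

    def __init__(self, name):
        self.name = name
        self.hits = 0
        self.total = 0

    def register(self, true, pred):
        # Accumulate a single (gold, predicted) pair.
        self.hits += int(true == pred)
        self.total += 1

    def accuracy(self):
        return self.hits / self.total if self.total else 0.0

    def print_summary(self):
        print("{}: accuracy={:.4f}".format(self.name, self.accuracy()))


def metascore_print_summary(evaluation_dict):
    # Single entry point replacing the loop over
    # model.evaluate(...).values() at each call site.
    for scorer in evaluation_dict.values():
        scorer.print_summary()
```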
If I am correct, the complexity after the PR #32 is `n_tasks * n_words`, and then `n_tasks * n_score_types` (the score types being specific to each task). I am wondering if a zip could remove the `n_tasks` factor.
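For concreteness, a rough, hypothetical illustration of the zip idea: walk the word axis once and feed every task's (true, pred) pair to its scorer in lockstep. All names here are made up for the example, and note that this only fuses the loops rather than removing the `n_tasks` factor:

```python
def score_all_tasks(update_fns, trues_by_task, preds_by_task):
    # update_fns: one callable per task consuming a (true, pred) pair.
    # trues_by_task / preds_by_task: one sequence per task, all of equal
    # length (one item per word).
    for step in zip(*trues_by_task, *preds_by_task):
        # Each step holds n_tasks gold items followed by n_tasks predictions
        # for the same word position.
        n = len(update_fns)
        trues, preds = step[:n], step[n:]
        for fn, true, pred in zip(update_fns, trues, preds):
            fn(true, pred)
```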
Where exactly would you put this zip?
On Thu, Aug 29, 2019 at 11:44 AM Thibault Clérice wrote:

> If I am correct, the complexity after the PR #32 (https://github.com/emanjavacas/pie/pull/32) is `n_tasks * n_words * n_score_types`. I am wondering if a zip could remove `n_tasks`.

-- Enrique Manjavacas
I don't know for sure yet; I would not reuse the individual `.get_score()` calls at all, that's for sure.
It's just an idea, which I'll investigate after #32. That PR already removed quite some computation, I believe.
Let's forget the zip: it would be very hard (impossible, really) to do without hardcoded values.
Pretty happy about the situation, even more so if you agree with multiprocessing ;)
Yeah, multiprocessing should definitely help.
I think the last PRs did well, we should probably reopen if needed :)
Running Pie on the LASLA corpora, my dev and test sets are a few hundred thousand tokens in size, and computing the scores takes extremely long.
My feeling is that we could probably multi-thread or multi-process this part: https://github.com/emanjavacas/pie/blob/master/pie/models/scorer.py#L66
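As a hedged sketch of what multiprocessing that loop might look like: since each task's (trues, preds) pair is independent, the tasks could be scored in parallel with a worker pool. The function names and the accuracy-only scoring below are assumptions for the example, not pie's actual scorer code:

```python
from multiprocessing import Pool


def score_one_task(pair):
    # Score a single task; runs in a worker process. Accuracy is used here
    # as a stand-in for pie's fuller per-task scoring.
    trues, preds = pair
    return sum(t == p for t, p in zip(trues, preds)) / len(trues)


def score_tasks_parallel(trues_by_task, preds_by_task, processes=2):
    # One (trues, preds) pair per task; Pool.map preserves task order.
    with Pool(processes) as pool:
        return pool.map(score_one_task, zip(trues_by_task, preds_by_task))
```

With the default fork start method on POSIX this works as written; with the spawn start method (e.g. on Windows) the calling code would need an `if __name__ == "__main__":` guard.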