Optimize batch size.

AinaIanemahy commented 7 months ago

We should decide whether we want to optimize the batch size parameter. The default value given by pierluigi is 32.

AinaIanemahy commented 7 months ago

Also see previous issue #36

Garrafao commented 7 months ago

Could we run the integration tests with different batch sizes to see whether it has an impact on the performance?

shafqatvirk commented 7 months ago

@Garrafao Is running such an integration test on the testwug dataset enough, or should we run it on all datasets?

Garrafao commented 7 months ago

Mhhh… If we want a realistic test, I would say on one of the larger data sets too, maybe DWUG DE?

shafqatvirk commented 7 months ago

Here are the results with various batch sizes:

Annotator         Data     accuracy  correlation  p-value  batch_size
XL-Lexeme-Binary  dwug_de  0.778     0.516        0.0      8
XL-Lexeme-Binary  dwug_de  0.778     0.516        0.0      16
XL-Lexeme-Binary  dwug_de  0.778     0.516        0.0      32
XL-Lexeme-Binary  dwug_de  0.778     0.516        0.0      64
XL-Lexeme-Binary  dwug_de  0.778     0.516        0.0      128
XL-Lexeme-Binary  dwug_de  0.778     0.516        0.0      256
XL-Lexeme-Binary  dwug_de  0.778     0.516        0.0      512

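Accuracy, correlation, and p-value are identical for every batch size from 8 to 512 on dwug_de, so batch size has no measurable effect on performance. Hence this issue is being closed.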
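
For anyone who wants to repeat this kind of check, here is a minimal sketch of a batch-size sweep. It is not the project's actual integration test: `toy_score_batch` is a hypothetical stand-in for the annotator model (a simple word-overlap score), and the function names and example pairs are made up for illustration. The idea is only that the same inputs are chunked differently and the predictions are then compared; for a deterministic scorer the outputs should not depend on the chunking.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Pair = Tuple[str, str]


def batched(items: Sequence[Pair], batch_size: int) -> Iterable[Sequence[Pair]]:
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def toy_score_batch(batch: Sequence[Pair]) -> List[float]:
    """Hypothetical stand-in for the model: Jaccard word overlap per pair."""
    scores = []
    for left, right in batch:
        a, b = set(left.split()), set(right.split())
        union = a | b
        scores.append(len(a & b) / len(union) if union else 0.0)
    return scores


def predict_in_batches(
    score_batch: Callable[[Sequence[Pair]], List[float]],
    pairs: Sequence[Pair],
    batch_size: int,
) -> List[float]:
    """Score all pairs, feeding them to score_batch one chunk at a time."""
    predictions: List[float] = []
    for batch in batched(pairs, batch_size):
        predictions.extend(score_batch(batch))
    return predictions


def sweep_batch_sizes(
    score_batch: Callable[[Sequence[Pair]], List[float]],
    pairs: Sequence[Pair],
    batch_sizes: Sequence[int],
) -> Dict[int, List[float]]:
    """Return the full prediction list for each batch size."""
    return {b: predict_in_batches(score_batch, pairs, b) for b in batch_sizes}


# Made-up usage pairs, repeated so that small batch sizes produce several chunks.
pairs: List[Pair] = [
    ("the bank of the river", "a river bank"),
    ("she went to the bank", "the bank approved the loan"),
    ("a sharp knife", "a sharp mind"),
] * 10

results = sweep_batch_sizes(toy_score_batch, pairs, [8, 16, 32, 64, 128, 256, 512])

# True when every batch size produced exactly the same predictions.
invariant = len({tuple(preds) for preds in results.values()}) == 1
print("predictions identical across batch sizes:", invariant)
```

With a real model the comparison would presumably be done on the downstream metrics (accuracy, correlation) rather than on raw predictions, since padding within a batch can introduce tiny floating-point differences.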

Garrafao / durel_system_annotators