Checklist

[ ] I have checked the CHANGELOG and the commit log to find out if the bug was already fixed in the main branch.
[ ] I have included in the "Description" section below a traceback from any exceptions related to this bug.
[ ] I have included in the "Related issues or possible duplicates" section below all related issues and possible duplicate issues (If there are none, check this box anyway).
[ ] I have included in the "Environment" section below the name of the operating system and Python version that I was using when I discovered this bug.
[ ] I have included in the "Environment" section below the output of pip freeze.
[ ] I have included in the "Steps to reproduce" section below a minimally reproducible example.
Description
When I perform multi-task learning with AllenNLP, I configure the MultiTaskDataLoader as follows:
I set `instances_per_epoch` to 8000 and `batch_size` to 16, so I expect about 500 steps per epoch. However, when I run my code, the progress bar shows 3000 steps. In reality there aren't that many: the epoch completes after 502 steps.
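The expected step count follows directly from the numbers above:

```python
instances_per_epoch = 8000
batch_size = 16

# One epoch should draw instances_per_epoch instances in total,
# grouped into batches of batch_size.
expected_steps = instances_per_epoch // batch_size
print(expected_steps)  # 500 steps per epoch, not the 3000 the progress bar reports
```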
After checking, I found that the following code in MultiTaskDataLoader is wrong:
From the `__init__` function of MultiTaskDataLoader, we know that when `instances_per_epoch` is set, a sampler will also be provided.
So when we count instances for each dataset, we should take into account the proportion of each dataset given by the sampler. The incorrect code above should therefore be replaced with the following:
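To illustrate the two counting strategies (this is a standalone sketch, not the actual AllenNLP source; the six task names and the uniform sampling proportions are assumptions chosen to reproduce the numbers in the report):

```python
import math

# Assumed setup mirroring the report: 8000 instances per epoch,
# batch size 16, and (hypothetically) six tasks sampled uniformly.
instances_per_epoch = 8000
batch_size = 16
task_names = [f"task_{i}" for i in range(6)]

# Buggy counting: every task is credited with the full
# instances_per_epoch, so the per-task step counts add up.
buggy_counts = {name: instances_per_epoch for name in task_names}
buggy_steps = sum(n // batch_size for n in buggy_counts.values())
print(buggy_steps)  # 6 * (8000 // 16) = 3000, the inflated progress-bar length

# Corrected counting: split instances_per_epoch across tasks
# according to the sampler's proportions (uniform here, i.e. equal weights).
proportions = {name: 1.0 for name in task_names}
proportion_sum = sum(proportions.values())
fixed_counts = {
    name: math.floor(p * instances_per_epoch / proportion_sum)
    for name, p in proportions.items()
}
fixed_steps = sum(n // batch_size for n in fixed_counts.values())
print(fixed_steps)  # 6 * (1333 // 16) = 498, close to the ~500 steps actually run
```

With six uniformly sampled tasks, the buggy count lands exactly on the 3000 steps reported, while the proportion-aware count lands near the ~500 steps the epoch actually takes.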
Here is the code:
``` ```
Python traceback:
``` ```
Related issues or possible duplicates
Environment
OS: Linux
Python version: 3.7.13
AllenNLP version: 2.10.1
Output of `pip freeze`:
``` ```
Steps to reproduce
Example source:
``` ```