QueuQ / CGLB


Reddit Dataset Batch Size Issue #10

Closed: altayunal closed this issue 1 year ago

altayunal commented 1 year ago

Hi @QueuQ,

When reproducing the results on the Reddit dataset, I run into an issue with the last task. For that task, I get the following batch size error:

ValueError: Expected input batch_size (1) to match target batch_size (0).

Have you come across this problem? What could be causing this size mismatch? Thanks in advance.

QueuQ commented 1 year ago

Thanks for your interest!

We did not actually encounter this problem during our experiments. It seems to be caused by the last batch containing only one example, which triggers a size-related error. If that is the case, it depends heavily on the batch size you choose. You can pick a different batch size so that the last batch does not contain a single example. Another possible solution is to set drop_last=True when building the data loader. For example, in the code linked below, changing drop_last to True would drop the last single example and avoid the error. Currently we set it to False, since we think changing the batch size is the better solution: https://github.com/QueuQ/CGLB/blob/3e0debf02e582610d05274b44c4c09fc1c1fe4b2/NCGL/pipeline.py#L1126-L1128
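To check whether a given batch size would leave a single example in the last batch, the arithmetic can be sketched in plain Python. This is only an illustration of the reasoning above, not code from the repository: the helper names (`batch_sizes`, `has_singleton_last_batch`, `safe_batch_sizes`) are hypothetical, and it assumes the loader splits the examples into consecutive fixed-size batches, with `drop_last` discarding an incomplete final batch.

```python
def batch_sizes(num_examples, batch_size, drop_last=False):
    """Sizes of the batches produced when num_examples are split
    into consecutive batches of batch_size (hypothetical helper)."""
    full, remainder = divmod(num_examples, batch_size)
    sizes = [batch_size] * full
    # An incomplete final batch is kept only when drop_last is False.
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


def has_singleton_last_batch(num_examples, batch_size, drop_last=False):
    """True if the final batch would contain exactly one example."""
    sizes = batch_sizes(num_examples, batch_size, drop_last)
    return bool(sizes) and sizes[-1] == 1


def safe_batch_sizes(num_examples, candidates):
    """Candidate batch sizes that avoid a single-example last batch
    without relying on drop_last."""
    return [b for b in candidates
            if not has_singleton_last_batch(num_examples, b)]


if __name__ == "__main__":
    n = 10  # assumed number of examples in the last task, for illustration
    print(batch_sizes(n, 3))                  # last batch holds 1 example
    print(batch_sizes(n, 3, drop_last=True))  # incomplete batch is dropped
    print(safe_batch_sizes(n, [3, 4, 9]))     # sizes with no singleton batch
```

In practice, `num_examples` would be the number of training nodes in the failing task. Note that `drop_last=True` silently discards the leftover example, whereas changing the batch size keeps every example in training.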

altayunal commented 1 year ago

Thank you for your feedback!