IBM / multidoc2dial

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents
Apache License 2.0

fix multiGPU #15

Open ROGERDJQ opened 2 years ago

ROGERDJQ commented 2 years ago

There may be an issue when multiple GPUs are used. Because self.batch_size defaults to 8 at L452, when multiple GPUs are used and the data holds more than 8 examples, domain_batched at line 452 ends up with fewer elements than it should, which leads to the zip error at line 460. (For example, with 3 GPUs, a train batch size of 4, and all domains, line 452 returns only one element, while the other three at line 456 each return 2 elements: one with shape (8,) and the other with shape (4,).) Note that domain_batched is not used afterward, so a straightforward fix may be to delete it.
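The mismatch can be illustrated with a minimal sketch in plain Python. This is not the repository's code: the `chunk` helper, the variable names other than `domain_batched`, and the way `domain_batched` is built here are assumptions made only to reproduce the element counts described above (3 GPUs, train batch size 4, internal batch size 8).

```python
def chunk(items, batch_size):
    """Split a sequence into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


num_gpus, per_gpu_batch, batch_size = 3, 4, 8
total = num_gpus * per_gpu_batch  # 12 examples arrive together

queries = list(range(total))
domains = ["all"] * total

# Chunked as expected: two chunks, of 8 and 4 examples.
queries_batched = chunk(queries, batch_size)

# Hypothetical stand-in for the reported behavior: only one element comes back.
domain_batched = [domains[:batch_size]]

chunk_sizes = [len(c) for c in queries_batched]
print(chunk_sizes, len(domain_batched))  # [8, 4] 1
```

Any code that zips these lists together sees 2 elements on one side and 1 on the other, which matches the shape mismatch reported at line 460. Building `domain_batched` with the same `chunk` call, or dropping it since it is unused, would remove the inconsistency in this sketch.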

songfeng commented 2 years ago

@sivasankalpp could you take a look?