Error when training DPR model on my own dataset (Tutorial 9)

ranieristyaa commented 3 months ago

Describe the issue: I am trying to fine-tune a DPR model on my own dataset. Training used to run without errors; the last successful run was about a week ago. Now, when I run the training again with the same data, the same code, and the same environment, it keeps failing with an error like this:

To Reproduce: Here is my Colab code: https://colab.research.google.com/drive/1bKR4cNkxQwJhmm_gXfhdgHNmKIsgvu-R?usp=sharing and the data I am using: answersDPR.json

Expected behavior: The code is supposed to run correctly, like this:

and the model should be fine-tuned successfully.

What environment did you try to run the tutorial on?:

OS: Windows 11
Browser: Chrome
Haystack version: 1.x

anakin87 commented 3 months ago

Probably something has changed in the latest versions of PyTorch.

I managed to fix the error with the following commands:

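import os

import torch.distributed as dist

# Address and port that the default env:// init method reads for rendezvous
os.environ['MASTER_ADDR'] = '127.0.0.1'
os.environ['MASTER_PORT'] = '29500'

# Create a single-process group (rank 0 of 1) using the gloo backend
dist.init_process_group("gloo", rank=0, world_size=1)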

More information in the PyTorch docs: here and here.
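In a notebook, re-running a cell that calls `init_process_group` a second time raises an error because the default process group already exists. The snippet below is a minimal sketch of a re-runnable variant of the workaround above, not part of the original fix. The helper names `set_rendezvous_env` and `init_single_process_group` are hypothetical. The `torch.distributed` calls used (`is_available`, `is_initialized`, `init_process_group`) are existing PyTorch functions.

```python
import os


def set_rendezvous_env(addr="127.0.0.1", port="29500"):
    """Set MASTER_ADDR / MASTER_PORT unless they are already defined.

    Hypothetical helper: existing values are kept so that a real
    multi-process launcher is not overridden.
    """
    os.environ.setdefault("MASTER_ADDR", addr)
    os.environ.setdefault("MASTER_PORT", str(port))
    return {
        "MASTER_ADDR": os.environ["MASTER_ADDR"],
        "MASTER_PORT": os.environ["MASTER_PORT"],
    }


def init_single_process_group(backend="gloo"):
    """Create a one-process group once; do nothing on later calls.

    Hypothetical helper. Returns True if a group was created by this call.
    """
    # Imported here so set_rendezvous_env can be used without PyTorch installed.
    import torch.distributed as dist

    # Skip if distributed support is missing or a group already exists.
    if not dist.is_available() or dist.is_initialized():
        return False

    set_rendezvous_env()
    # Same call as in the workaround: rank 0 in a world of size 1.
    dist.init_process_group(backend, rank=0, world_size=1)
    return True
```

Calling `init_single_process_group()` once before the training call should have the same effect as the commands above, and repeating the call in the same session is a no-op.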

ranieristyaa commented 3 months ago

It is fixed, thank you.

deepset-ai / haystack-tutorials

Error when training DPR model on my own dataset (Tutorial 9) #309