MaastrichtU-IDS / dsri-documentation

📖 Documentation for the Data Science Research Infrastructure at Maastricht University
https://dsri.maastrichtuniversity.nl
MIT License

Updating YAML configuration for PyTorch to enable multiprocessing dataloaders #10

Closed surajpaib closed 4 years ago

surajpaib commented 4 years ago

Use case: Loading data in multiple worker processes is crucial when data is read dynamically from the filesystem.

Current Scenario: Running a DataLoader with multiple worker processes throws this error:

Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm). 
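
PyTorch DataLoader workers hand batches back to the main process through shared memory, so a small /dev/shm is a likely cause of this error. As a rough way to confirm it, the size of /dev/shm can be checked from inside the container. This is a minimal sketch using only the Python standard library; the function name shm_report is hypothetical and not part of PyTorch or OpenShift.

```python
import shutil


def shm_report(path="/dev/shm"):
    """Return (total, used, free) in MiB for the filesystem mounted at `path`."""
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return usage.total / mib, usage.used / mib, usage.free / mib
```

Calling shm_report() in the pod before and after changing the configuration should show whether the shared memory available to the workers actually grew.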

Current Workaround: Following the OpenShift shared memory guide at https://docs.openshift.com/container-platform/3.7/dev_guide/shared_memory.html, I've added two sections to the current YAML configuration.

Specifically, I added this volume mount:

          volumeMounts:
            - mountPath: /dev/shm
              name: dshm

and

      volumes:
        - emptyDir:
            medium: Memory
          name: dshm

Proposal: Could this be added to the default config? Multiprocess data loading is a commonly expected feature for PyTorch deployments. Of course, this assumes there are no downsides to doing it by default.

Binosha commented 4 years ago

Hi @surajpaib ,

volumeMounts:                 
        - mountPath: /dev/shm
          name: dshm

volumeMounts describes where the volume is mounted inside the container.

volumes:
        - emptyDir:
            medium: Memory
          name: dshm

volumes describes the storage that backs the mount, defined outside the container. According to https://docs.openshift.com/container-platform/3.7/dev_guide/shared_memory.html, volumes should be declared at the spec level and volumeMounts at the spec -> containers level:

spec:
  volumes:                          
    - name: dshm
      emptyDir:
        medium: Memory
  containers:
    - image: kubernetes/pause
      name: hello-container1
      ports:
        - containerPort: 8080
          hostPort: 6061
      volumeMounts:                 
        - mountPath: /dev/shm
          name: dshm
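
The placement rule above (volumes at the spec level, volumeMounts inside each container) can also be sketched as a small transformation on a pod spec loaded as a Python dictionary. This is only an illustration under assumptions: the helper name add_shm_volume is hypothetical, and it is not how the DSRI template itself is generated.

```python
import copy


def add_shm_volume(pod_spec, volume_name="dshm"):
    """Return a copy of `pod_spec` with a memory-backed volume mounted at /dev/shm."""
    spec = copy.deepcopy(pod_spec)

    # The volume itself is declared once, at the spec level.
    volumes = spec.setdefault("volumes", [])
    if not any(v.get("name") == volume_name for v in volumes):
        volumes.append({"name": volume_name, "emptyDir": {"medium": "Memory"}})

    # Each container then mounts that volume at /dev/shm.
    for container in spec.get("containers", []):
        mounts = container.setdefault("volumeMounts", [])
        if not any(m.get("mountPath") == "/dev/shm" for m in mounts):
            mounts.append({"mountPath": "/dev/shm", "name": volume_name})

    return spec
```

The helper leaves the input untouched and is idempotent, so applying it to a spec that already has the mount changes nothing.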

surajpaib commented 4 years ago

Hi @Binosha! Yes, my workaround does exactly what you describe. I was suggesting adding this to the default deployment config if possible. Do you think it makes sense to do so?

Binosha commented 4 years ago

Hi @surajpaib, thank you for the nice suggestion. I have added that part to the template, so you can test it and let us know if there are more improvements to be made.

surajpaib commented 4 years ago

Thanks @Binosha! Will test over the weekend and close the issue!

Binosha commented 4 years ago

Hi @surajpaib, we would like to hear your feedback so we can keep improving our user support and content. Please fill in this quick survey and let us know your thoughts (your answers will be anonymous).