-
Hello,
I'm facing an issue when calling `.compute` in a distributed multi-node setting.
The symptoms are the same as in huggingface/datasets#4420; however, I'm not sure the cause is the same (the co…
-
As mentioned in https://github.com/dask/distributed/issues/7639#issuecomment-1489013077, we are seeing what we think is a bug due to the removal of the [finalizer for a ThreadPoolExecutor](https://g…
-
## Issue description
Use torchrun (inside a virtual environment) to launch a Python script. The script cannot import modules installed in that virtual environment. Changing to use torch.distribute…
-
Model: Qwen-14B-Chat (QWen2)
Dataset: https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese/blob/main/open_qa.jsonl
Environment: 2 A30 GPU
Issue 1:
Error: can't init model correctly. Disab…
-
### Reproducing the behavior
When running "bend run-c" using "large" arrays in the quicksort example in the repo with these changes to the main function:
```py
(GetDataset) = [1,2,3...2^19]
#…
-
# Implement Multi-GPU Support in Anomalib
Depends on:
- [x] https://github.com/openvinotoolkit/anomalib/issues/2257
- [ ] https://github.com/openvinotoolkit/anomalib/issues/2365
- [ ] https://git…
-
This new release brings many improvements, including improved training, SSE4.1 and AVX2 optimizations, and run-time CPU detection (RTCD).
The distributed models are now trained using only publicly …
-
For some reason, the process fails when I run `./tools/uniad_dist_eval.sh ./projects/configs/stage2_e2e/base_e2e.py ./ckpts/uniad_base_e2e.pth 1`
It fails when it is trying to evaluate on class bus, g…
-
Hi, I downloaded the Charades dataset and tried to train on it with the command:
`python tools/run_net.py --cfg configs/Charades/SLOWFAST_16x8_R50_multigrid.yaml DATA.PATH_TO_DATA_DIR ../Ch…
-
Problem
=======
As datasets become larger and larger, storing training samples as individual files becomes impractical and inefficient. This can be addressed using sequential storage formats and s…