-
## User Interface
- [ ] Sail CLI (#245)
- [ ] Sail configuration
## Core Functionalities
- [x] Distributed processing setup (#244)
- [x] Distributed job stages and shuffle (#265)
- [ ] Worker …
-
Because, according to the code:
```python
if distributed:
    sampler = DistributedSampler(dataset)  # shuffling does not seem to be enabled here; shuffle presumably defaults to False
else:
    sampler = RandomSampler(dataset)
```
Also, thank you for your excellent work!
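For reference, in `torch.utils.data.distributed.DistributedSampler` the `shuffle` argument actually defaults to `True`; the piece that is easy to miss is calling `set_epoch` each epoch so the shuffle order changes. A minimal sketch, assuming `torch.distributed` is already initialized and using a hypothetical stand-in dataset:
```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Hypothetical stand-in for the project's real dataset.
dataset = TensorDataset(torch.arange(100).float())

# shuffle=True is already the default; shown explicitly for clarity.
sampler = DistributedSampler(dataset, shuffle=True)
loader = DataLoader(dataset, batch_size=8, sampler=sampler)

for epoch in range(3):
    # Without set_epoch, every epoch replays the same shuffled order.
    sampler.set_epoch(epoch)
    for batch in loader:
        pass  # training step goes here
```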
-
Hello,
It seems that the dataloader is not adapted to the distributed setting (line 881 of train.py).
The same data entries will be loaded and trained on repeatedly by different processes.
Maybe a sampler sho…
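A minimal sketch of the kind of fix being suggested here (hypothetical helper name; assumes the process group has already been initialized, e.g. via `init_process_group`):
```python
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def build_loader(dataset, batch_size):
    # Shard the dataset across ranks so each process sees a
    # disjoint subset instead of the full dataset.
    sampler = DistributedSampler(
        dataset,
        num_replicas=dist.get_world_size(),
        rank=dist.get_rank(),
    )
    # sampler and shuffle are mutually exclusive in DataLoader,
    # so shuffling is delegated to the sampler.
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```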
-
**Describe the issue**:
The `p2p` shuffle algorithm fails when working with frames that contain columns of mixed numeric/string dtypes; this error occurs during computation when Arrow attempts to…
-
**The error message is as follows**
```
Traceback (most recent call last):
  File "/data1/bert4rec/bert4rec-main/scripts/bole/loaddata_run_product.py", line 5, in <module>
    config, model, dataset, train_data, valid_data, test_…
-
Subsequent Blockwise layers are currently fused into a single layer. This reduces the number of tasks and the scheduling overhead, and is generally a good thing to do. Currently, the fused output does not gener…
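To make the mechanism concrete, a small sketch of how blockwise fusion shrinks the task graph (illustrative only; exact task counts depend on the dask version and optimizer settings):
```python
import dask
import dask.array as da

x = da.ones((1000, 1000), chunks=(100, 100))  # 100 chunks
y = (x + 1) * 2  # two elementwise (Blockwise) layers on top of ones()

# Before optimization: roughly one task per chunk per layer.
print(len(dict(y.__dask_graph__())))

# After optimization the consecutive Blockwise layers are fused into
# a single layer, so the count drops to roughly one task per chunk.
(opt,) = dask.optimize(y)
print(len(dict(opt.__dask_graph__())))
```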
-
Occurred on https://github.com/dask/distributed/pull/6400, but I've seen it in other PRs as well:
https://github.com/dask/distributed/runs/6524459060?check_suite_focus=true
```
______________________…
-
### Context:
We are following the [FSDP example](https://github.com/aws-samples/awsome-distributed-training/tree/main/3.test_cases/10.FSDP) and trying to understand the mechanism behind how differe…
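For context, a minimal sketch of the FSDP wrapping that the example is built around (the model here is a hypothetical stand-in; environment variables are assumed to come from a launcher such as `torchrun`):
```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# torchrun supplies RANK/WORLD_SIZE/LOCAL_RANK environment variables.
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# Hypothetical stand-in model.
model = torch.nn.Linear(1024, 1024).cuda()

# FSDP shards parameters, gradients, and optimizer state across ranks,
# all-gathering each parameter shard just in time for forward/backward
# and freeing it again afterwards.
model = FSDP(model)
```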
-
### What is the problem the feature request solves?
I noticed that we execute each query stage with two separate native plans.
For example, here is the first query stage for TPC-H q1:
```
+-…
-
### Describe the bug
The sharding of IterableDatasets with respect to distributed and dataloader worker processes appears problematic, with significant performance traps and inconsistencies w.r.t. d…
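For reference, the conventional pattern for sharding an `IterableDataset` across both ranks and dataloader workers looks roughly like this (a sketch of the general technique, not the library's actual implementation):
```python
import torch.distributed as dist
from torch.utils.data import IterableDataset, get_worker_info

class ShardedIterable(IterableDataset):
    """Hypothetical sketch: shard a stream across ranks and workers."""

    def __init__(self, items):
        self.items = items

    def __iter__(self):
        world = dist.get_world_size() if dist.is_initialized() else 1
        rank = dist.get_rank() if dist.is_initialized() else 0
        info = get_worker_info()
        workers = info.num_workers if info else 1
        worker = info.id if info else 0
        # Global shard index over (rank, worker) pairs, so every
        # example is consumed by exactly one process/worker pair.
        shard, nshards = rank * workers + worker, world * workers
        for i, item in enumerate(self.items):
            if i % nshards == shard:
                yield item
```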