-
Hi,
I'm seeing various errors in MueLu unit tests on an NVIDIA H100 GPU using CUDA 12.4 with the Kokkos UVM flag enabled. I'm not sure which approach is preferred for reporting here, but I've tested using tw…
-
Hello,
Not sure if this is more of a Stack Overflow question, but I'm posting it here for now.
After playing a bit with Dask, I was very surprised by how slow the OneHotEncoder is compared to Scitk…
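For context on the comparison (a minimal sketch of the scikit-learn baseline only; the input data here is made up for illustration), a plain scikit-learn `OneHotEncoder` run looks like:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column with three distinct categories.
X = np.array([["red"], ["green"], ["blue"], ["green"]])

# The encoder returns a sparse matrix by default, one column per category.
enc = OneHotEncoder()
encoded = enc.fit_transform(X)

print(encoded.shape)  # (4, 3): 4 rows, 3 category columns
```

Timing this same workload against the Dask equivalent on identically shaped data is the kind of comparison being described.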
-
@ilan-gold @ivirshup if you have time, it'd be nice to see a groupby-reduce workflow you'd like to see supported natively by flox.
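To make the request concrete (a minimal sketch in plain NumPy, not flox's own API; the data and group labels are invented for illustration), the kind of groupby-reduce workflow in question is:

```python
import numpy as np

# Hypothetical example: values with integer group labels.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
groups = np.array([0, 1, 0, 1, 0])

# Reduce (here: mean) within each group label using bincount weights.
group_sums = np.bincount(groups, weights=values)
group_counts = np.bincount(groups)
group_means = group_sums / group_counts

print(group_means.tolist())  # group 0: (1+3+5)/3 = 3.0, group 1: (2+4)/2 = 3.0
```

flox's value-add is running this pattern efficiently over chunked/Dask arrays rather than in-memory NumPy.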
-
Enhance the "What is SystemML" section on the main page to also cover deep learning. Vijay brought up the idea that we could also show the existing 3-minute SystemML video here - the one that we…
-
Saving a table works, but I could not find any documentation for it:
`JuliaDB.save(db, "path/of/db")`
works, but is not documented anywhere.
```
help?> JuliaDB.save
save(io::IO, val)
Save a value in…
```
-
When I run the following command, an error occurs. Can anyone help?
accelerate launch -m --config_file accelerate_config.yaml --machine_rank 0 --main_process_ip 0.0.0.0 --main_process_port…
-
A Dask worker dies during dask-xgboost classifier training; this is observed while running `test_core.py::test_classifier`.
Configuration used:
```
Dask Version: 2.9.2
Distributed V…
```
-
Here is a meta-issue to track progress on the implementations of Intel's [Parallel Research Kernels](https://github.com/ParRes/Kernels) in Chapel.
## Resources
- [test/studies/prk](https://gith…
-
Hi Vincent,
I have another problem: I use GPUs 0 and 1 for training with numbatches_to_aggregate=0 in the default config standardtrainer.cfg, but I see "Start master session" three times in the log. Is this behavior correct?
…
fanlu updated 5 years ago
-
Hi, I have recently been training DLRM on 8 GPUs. The command I use is `python3 -m torch.distributed.launch --nproc_per_node 4 python3 dlrm_s_pytorch.py --arch-sparse-feature-size=64 --arch-mlp-bot="13-5…