-
This issue is to track the work on benchmarking the new Horovod backend for GPU clusters and getting profiling information for FlowPM.
We want to do the following things:
- [x] : Run profiler on…
EiffL updated
3 years ago
-
Hi,
We are experiencing unexpected `sccache` server shutdowns when building our C++ project ([OpenVINO](https://github.com/openvinotoolkit/openvino)) for RISC-V with [Conan](https://conan.io/) in G…
-
### Description
When building jaxlib with an externally installed copy of CUDA (something required by all package managers and HPC systems), I see the following error:
```
gcc: error: unrecognized …
-
Trying to compile this to run on a Pi Zero 2 W, running Raspbian aarch64.
I removed the following gcc flags:
`-mfloat-abi=hard`
`-mfpu=vfp`
and changed `-march=armv8-a`
But I still get errors..…
-
I ran make.sh successfully, but when I run train.py I hit this problem. Can anyone give me some suggestions? I would appreciate it very much!
Traceback (most recent call last):
File "train.py",…
-
Please see the error I get when running the simple finetune command given in the README:
`done importing
WARNING:jax._src.lib.xla_bridge:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_L…
-
## Description
We are trying to deploy a diffusion model to TensorRT, and when the `refit` option is enabled, the deployment process fails with this error:
```
[E] 10: Could not find any implem…
-
### What happened + What you expected to happen
The [script](https://docs.ray.io/en/latest/train/getting-started-transformers.html) from the official Ray Train docs fails due to an NCCL error.
Proxy Call to rank 1 …
-
Hello
I'm experiencing occasional non-deterministic behaviour when running the script below on multi-device CPU (using flag `--xla_force_host_platform_device_count=8`).
The script runs 10 attemp…
-
### Context
The OpenVINO component responsible for supporting TensorFlow models is called the TensorFlow Frontend (TF FE). TF FE converts a model represented in [TensorFlow opset](https://www.tensorflow…