-
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
No
### Source
source
### TensorFlow version
tf 2.17
### Custom code
Yes
### OS platform and di…
x0w3n updated
1 month ago
-
### Description
I first encountered this error in a multi-node run. Then, after I ran some other code, the single-node run, which had worked before, hit the same problem. I think this error has s…
-
### Issue Type
Performance
### Source
source
### TensorFlow Version
2.6
### Custom Code
No
### OS Platform and Distribution
_No response_
### Mobil…
-
I ran into this problem while building your demo by following the README.
> `(merf) gpuadmin@sg6:~/YBS/merf$ ./train.sh
> 2024-07-10 10:11:11.173861: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc…
-
### Description
I am calling `jax.nn.dot_product_attention` with the following line:
```
dpsa_cudnn = jax.nn.dot_product_attention(query, key, value, implementation='cudnn')
```
However, this t…
-
I'm trying to build xla from source for CPU following the instructions [here](https://github.com/openxla/xla/blob/main/docs/developer_guide.md) and it's failing with:
```
xla/service/gpu/runtime/…
-
I ran a few test jobs based on the recent [llama2-7B fine-tuning blog](https://www.philschmid.de/fine-tune-llama-7b-trainium#3-fine-tune-llama-on-aws-trainium-using-the-neurontrainer) using the latest…
-
### 🚀 The feature, motivation and pitch
# Current situation
Currently, to merge an XLA-breaking PR, we follow this procedure:
1. PyTorch/XLA merges the fix PR, which will break the PyTorch/XLA head C…
-
## 🚀 Description
Pipeline parallelism is a technique used in deep learning model training to improve efficiency and reduce the training time of large neural networks. Here we propose a pipeline paral…
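
To make the general idea concrete, here is a minimal sketch of a forward-only, GPipe-style microbatch schedule. It is a generic illustration, not necessarily the schedule this proposal has in mind. The function names `forward_schedule` and `bubble_fraction` are hypothetical, and the sketch assumes every stage takes one time step per microbatch.

```python
def forward_schedule(num_stages, num_microbatches):
    """Return a list where entry t holds the (stage, microbatch) pairs active at step t.

    Assumption: stage s can start microbatch m only after stage s - 1 has
    finished it, so microbatch m reaches stage s at time step s + m.
    """
    total_steps = num_stages + num_microbatches - 1
    schedule = []
    for t in range(total_steps):
        active = [
            (stage, t - stage)
            for stage in range(num_stages)
            if 0 <= t - stage < num_microbatches
        ]
        schedule.append(active)
    return schedule


def bubble_fraction(num_stages, num_microbatches):
    """Fraction of (stage, step) slots that sit idle in the forward-only schedule."""
    total_slots = num_stages * (num_stages + num_microbatches - 1)
    busy_slots = num_stages * num_microbatches
    return 1 - busy_slots / total_slots


if __name__ == "__main__":
    for step, active in enumerate(forward_schedule(3, 4)):
        print(step, active)
    print(bubble_fraction(3, 4))
```

Under these assumptions, splitting a batch into more microbatches shrinks the idle fraction, which is the main lever pipeline schedules use to keep all stages busy.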
-
### Description of the bug:
When I use `bazel build //... -c opt --jobs 8 --sandbox_debug` to compile a repository, it just says "bazel FAILED: Build did NOT complete successfully". But there are no…