-
Hello, thank you for your great work.
I am trying to write the training code.
So I started by implementing the alpha loss, composition loss, and regression loss, and I used the RAdam optimizer
**RAdam(group_weigh…
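For reference, a minimal sketch of the general idea (per-term losses combined into one objective and stepped with `torch.optim.RAdam`); the tiny stand-in network, dummy tensors, and loss definitions below are illustrative placeholders, not this repo's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def alpha_loss(pred_alpha, gt_alpha):
    # L1 distance between predicted and ground-truth alpha mattes
    return F.l1_loss(pred_alpha, gt_alpha)

def composition_loss(pred_alpha, fg, bg, image):
    # re-composite the image with the predicted alpha and compare to the input
    comp = pred_alpha * fg + (1.0 - pred_alpha) * bg
    return F.l1_loss(comp, image)

# tiny stand-in network so the snippet runs end to end
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),
)
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-4)

image = torch.rand(2, 3, 64, 64)
fg, bg = torch.rand_like(image), torch.rand_like(image)
gt_alpha = torch.rand(2, 1, 64, 64)

pred_alpha = model(image)
loss = alpha_loss(pred_alpha, gt_alpha) + composition_loss(pred_alpha, fg, bg, image)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```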
-
I would like to build a neural network with a tunable number of layers. While I can tune the number of neurons per layer, I’m encountering issues when it comes to dynamically changing the number of la…
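One common way to make the depth itself a hyperparameter is to build the layer stack from a list. A minimal sketch, assuming a plain fully connected PyTorch network (the sizes here are placeholders):

```python
import torch
import torch.nn as nn

class TunableMLP(nn.Module):
    def __init__(self, in_features, hidden_sizes, out_features):
        # hidden_sizes is a list, so both the width and the number of layers are tunable
        super().__init__()
        layers = []
        prev = in_features
        for width in hidden_sizes:
            layers += [nn.Linear(prev, width), nn.ReLU()]
            prev = width
        layers.append(nn.Linear(prev, out_features))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# two hidden layers of 64 and 32 units; change the list to change the depth
model = TunableMLP(in_features=10, hidden_sizes=[64, 32], out_features=1)
print(model(torch.rand(4, 10)).shape)  # torch.Size([4, 1])
```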
-
### 🐛 Describe the bug
I am attempting to train a convolutional autoencoder model with PyTorch. I am using:
torch==2.3.1 + CUDA 12.1 and 4 GPUs.
I have attempted this training with both pyt…
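For reference, a minimal sketch of a convolutional autoencoder of the kind described, wrapped with `torch.nn.DataParallel` for multi-GPU use; the architecture, sizes, and dummy data are placeholders, not the reporter's actual model:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ConvAutoencoder().to(device)
if torch.cuda.device_count() > 1:
    # replicate the model across the available GPUs
    model = nn.DataParallel(model)

x = torch.rand(8, 3, 64, 64, device=device)
recon = model(x)
loss = nn.functional.mse_loss(recon, x)
loss.backward()
```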
-
**Describe the bug**
Loading the llama2 70b model using 4-bit quantization (bitsandbytes) and then distributing the model by calling deepspeed.initialize. I get the following error:
```
------------------------…
```
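For context, a rough sketch of the setup being described; the model id, quantization config, and DeepSpeed config below are illustrative assumptions, not the exact code that produced the error:

```python
import deepspeed
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",   # placeholder model id
    quantization_config=bnb_config,
)

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 3},
}

# distributing the 4-bit model is the step where the error is reported
engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
```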
-
This training loop takes more than a second per epoch using tensorflow-directml, but a fraction of a second with standard TensorFlow.
It actually doesn't work at all (the error is NaN after a couple of ite…
-
Dear developers,
Recently, I have been trying to write code for calculating an MLE via TFP.
I found that TFP will not track the `loc` parameter of a multivariate normal when using `GradientTape`.
Here is an e…
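A minimal sketch of the kind of setup meant here, assuming `tfp.distributions.MultivariateNormalDiag` with a `tf.Variable` `loc` (the data and shapes are placeholders); the report is that the gradient with respect to `loc` is not tracked in a setup like this:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

loc = tf.Variable([0.0, 0.0])                 # the parameter we want the MLE for
data = tf.constant([[1.0, 2.0], [0.5, 1.5]])  # dummy observations

with tf.GradientTape() as tape:
    dist = tfd.MultivariateNormalDiag(loc=loc, scale_diag=[1.0, 1.0])
    nll = -tf.reduce_sum(dist.log_prob(data))

# gradient of the negative log-likelihood with respect to loc
grad = tape.gradient(nll, loc)
print(grad)
```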
-
**Describe the problem**
A model that contains a nested `WideDeepModel` submodel throws an error when saved. The problem goes back to at least TF v2.9. I've tested the latest nightly and the iss…
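For context, a rough sketch of the kind of nesting described (the layer sizes, inputs, and save path are placeholders, not the exact model that triggers the error):

```python
import tensorflow as tf

# inner wide-and-deep submodel
linear = tf.keras.experimental.LinearModel()
dnn = tf.keras.Sequential([tf.keras.layers.Dense(8, activation="relu"),
                           tf.keras.layers.Dense(1)])
wide_deep = tf.keras.experimental.WideDeepModel(linear, dnn)

# outer model that nests the WideDeepModel as a submodel
inputs = tf.keras.Input(shape=(4,))
outputs = wide_deep([inputs, inputs])   # same features for the wide and deep parts
model = tf.keras.Model(inputs, outputs)

# saving the outer model is the step the report says fails
model.save("nested_wide_deep")
```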
-
LeNet model used:
```
Traceback (most recent call last):
  File "main.py", line 155, in
    optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)
  File "/usr/local/lib/python…
```
-
It seems that when I am training with multiple training objectives, the Transformer model is loaded into GPU memory once per objective, despite being shared by all the losses. This is quickly …
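For illustration, a plain-PyTorch sketch of the intended setup (one shared encoder referenced by several objective-specific heads); the model, sizes, and data are placeholders, not the library's training code, but it shows that sharing should keep only a single copy of the encoder's parameters on the GPU:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# one shared encoder: its parameters should exist on the GPU exactly once
encoder_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
shared_encoder = nn.TransformerEncoder(encoder_layer, num_layers=2).to(device)

class Objective(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder          # a reference, not a copy
        self.head = nn.Linear(32, 1)    # only the head is objective-specific

    def forward(self, x, target):
        pooled = self.encoder(x).mean(dim=1)
        return nn.functional.mse_loss(self.head(pooled), target)

objectives = [Objective(shared_encoder).to(device) for _ in range(3)]

# all three objectives point at the same parameter tensors
assert all(o.encoder is shared_encoder for o in objectives)

x = torch.rand(4, 10, 32, device=device)
target = torch.rand(4, 1, device=device)
total = sum(obj(x, target) for obj in objectives)
total.backward()
```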
-
```
tensorflow.python.framework.errors_impl.FailedPreconditionError: tfrecords; Is a directory
  [[{{node pascalvoc_2007_data_provider/parallel_read/ReaderReadV2_1}}]]
```
This is my .sh:
DATASET_DIR=./tfr…