-
Adding `tf.keras.layers.Dropout` to the model results in the **following error**:
```
tensorflow.python.framework.errors_impl.AbortedError: Compute: Operation received an exception: Compute: No MLCTraining…
```
-
When trying out the keras autoquant notebook, the error shown in the title appears. It seems to be an issue related to the quantization op library.
Full error message:
```
AttributeError: in user code:
…
```
-
**Describe the bug**
I implement multiple transformer layers that share a single layer's parameters (e.g., one layer is applied recursively six times to construct a 6-layer transformer). When I use activation checkp…
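A minimal sketch of the setup described above, assuming PyTorch's `torch.utils.checkpoint` is the checkpointing mechanism in use (the model sizes and depth are illustrative, not the author's exact configuration):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class RecursiveTransformer(nn.Module):
    """One TransformerEncoderLayer whose parameters are reused six times,
    with activation checkpointing applied to each (shared) application."""

    def __init__(self, d_model=64, nhead=4, depth=6):
        super().__init__()
        # A single layer holds the only parameters; it is applied `depth` times.
        self.shared_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):
            # Non-reentrant checkpointing is the currently recommended mode.
            x = checkpoint(self.shared_layer, x, use_reentrant=False)
        return x

model = RecursiveTransformer()
out = model(torch.randn(2, 10, 64))
out.sum().backward()  # gradients accumulate across all six reuses of the layer
```

Because the layer is shared, its parameter gradients are the sum of contributions from every application, which is worth keeping in mind when comparing against a true 6-layer model.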
-
I was attempting to view the feature maps of a pretrained VGG model in PyTorch.
Instead of saving the features in the `forward` method of the model, I registered a forward hook on the layer(s) wher…
-
This is an idea for a module to calculate the activation time in a few different ways, including min dV/dt, max gradient, and Matthijs' method. Some thought needs to be put into the methods that are t…
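Two of the definitions mentioned above can be sketched with numpy; the function name, signal, and sampling grid below are illustrative assumptions, not the module's actual interface:

```python
import numpy as np

def activation_time(t, v, method="max_gradient"):
    """Return the time at which the signal `v` 'activates' under the chosen rule."""
    dvdt = np.gradient(v, t)  # numerical dV/dt
    if method == "max_gradient":
        idx = np.argmax(dvdt)  # time of steepest upstroke
    elif method == "min_dvdt":
        idx = np.argmin(dvdt)  # time of steepest downstroke
    else:
        raise ValueError(f"unknown method: {method}")
    return t[idx]

t = np.linspace(0, 10, 1001)
v = np.tanh(5 * (t - 4))  # synthetic upstroke centred at t = 4
print(activation_time(t, v))  # ≈ 4.0
```

Each method reduces to picking an index from the numerical derivative, so new definitions (e.g. threshold crossings) slot in as extra branches.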
-
Hi @ismailuddin
I looked at your implementation of Grad-CAM, and it seems to me that the heatmaps are calculated using gradients of post-softmax outputs rather than logits (pre-softmax). The last lay…
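One common way to obtain pre-softmax gradients in Keras is to swap the final layer's activation for a linear one before differentiating; a hedged sketch (the toy model below is illustrative, not the repository's architecture):

```python
import tensorflow as tf

# Toy classifier with a softmax output, standing in for the real model.
inp = tf.keras.Input(shape=(8,))
x = tf.keras.layers.Dense(16, activation="relu")(inp)
out = tf.keras.layers.Dense(3, activation="softmax")(x)
model = tf.keras.Model(inp, out)

# Replace the softmax with a linear activation so the model emits logits.
model.layers[-1].activation = tf.keras.activations.linear

inputs = tf.random.normal((1, 8))
with tf.GradientTape() as tape:
    tape.watch(inputs)
    logits = model(inputs)   # now pre-softmax scores
    score = logits[:, 0]     # class score to attribute
grads = tape.gradient(score, inputs)
```

Gradients of the softmax output saturate when the predicted probability is near 1, which is why Grad-CAM is normally computed on logits.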
-
Hi Matthias,
I am using GCNConv to solve a prediction task with linear layers at the output of the GNN. The model is trained on graphs of 10K nodes with ~20K-40K edges. The gradient value d…
-
**Describe the bug**
I launch DeepSpeed training for a 600M-parameter diffusion model and vary only `reduce_bucket_size`.
I tried the following values:
- `reduce_bucket_size: 500_000_000` — conve…
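For reference, `reduce_bucket_size` lives under `zero_optimization` in the DeepSpeed config JSON; a minimal hedged fragment (all other values are illustrative, not the author's configuration):

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "zero_optimization": {
    "stage": 2,
    "reduce_bucket_size": 500000000
  }
}
```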
-
Dear @1Konny,
Thanks for your implementation!
I have noticed that line 168 in `gradcam.py`:
`alpha_denom = gradients.pow(2).mul(2) + \
activations.mul(gradients.pow(3)).view(b…
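For context, the quoted lines compute the denominator of the Grad-CAM++ alpha weights; a hedged numpy sketch of that computation (variable names follow the snippet, shapes are assumptions):

```python
import numpy as np

# Toy tensors standing in for the real gradients and activations:
# (batch, channels, height, width).
b, k, u, v = 1, 4, 7, 7
rng = np.random.default_rng(0)
gradients = rng.standard_normal((b, k, u, v))
activations = rng.standard_normal((b, k, u, v))

alpha_num = gradients ** 2
# Grad-CAM++: the third-order term is summed over each map's spatial dimensions.
alpha_denom = 2 * gradients ** 2 + np.sum(
    activations * gradients ** 3, axis=(2, 3), keepdims=True
)
# Guard against division by zero, as the original code does.
alpha_denom = np.where(alpha_denom != 0.0, alpha_denom, np.ones_like(alpha_denom))
alpha = alpha_num / alpha_denom
```

This mirrors the paper's alpha formula, grad squared over (2 times grad squared plus the spatial sum of activation times grad cubed), so any discrepancy in the reduction axes of that sum would change the heatmaps.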
-
Thank you for this wonderful example, which helped me understand the gradient descent implementation.
I just noticed a minor mistake:
- dW_curr = np.dot(dZ_curr, A_prev.T) / m
- db_curr = np…
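For context, the lines above come from a standard fully-connected backward step; a hedged sketch of that step (function name and shapes are illustrative, variable names mirror the snippet):

```python
import numpy as np

def single_layer_backward(dZ_curr, W_curr, A_prev):
    """Backward pass through one dense layer, features-by-examples layout."""
    m = A_prev.shape[1]  # number of examples
    dW_curr = np.dot(dZ_curr, A_prev.T) / m                # weight gradient
    db_curr = np.sum(dZ_curr, axis=1, keepdims=True) / m   # bias gradient
    dA_prev = np.dot(W_curr.T, dZ_curr)                    # gradient to pass back
    return dA_prev, dW_curr, db_curr

dZ = np.ones((3, 5))            # upstream gradient for a 3-unit layer, 5 examples
W = np.random.randn(3, 4)
A_prev = np.random.randn(4, 5)
dA_prev, dW, db = single_layer_backward(dZ, W, A_prev)
```

Both `dW_curr` and `db_curr` must be averaged over the batch (the division by `m`), and `db_curr` is reduced over the example axis so its shape matches the bias.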