-
I cloned this project and imported it into my own Python code. I updated my code as below:
siamese_net.compile(loss="AM-softmax", optimizer=optimizer, metrics=['accuracy'])
However, I got the below erro…
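One likely cause (assuming the error is Keras rejecting an unknown loss identifier): `"AM-softmax"` is not a built-in Keras loss string, so `compile` cannot resolve it. A minimal sketch of passing an additive-margin softmax loss as a callable instead, with hypothetical `scale`/`margin` values:

```python
import tensorflow as tf

def am_softmax_loss(scale=30.0, margin=0.35):
    # Hedged sketch of AM-Softmax: y_pred is assumed to be cosine
    # similarities and y_true one-hot labels; the margin is subtracted
    # only from the target-class logit before a scaled cross-entropy.
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        logits = scale * (y_pred - margin * y_true)
        return tf.nn.softmax_cross_entropy_with_logits(labels=y_true,
                                                       logits=logits)
    return loss

# Pass the callable rather than the unrecognised string "AM-softmax":
# siamese_net.compile(loss=am_softmax_loss(), optimizer=optimizer,
#                     metrics=['accuracy'])
```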
-
I want to launch a primitive in a function and return immediately. I am getting a segfault (SIGSEGV) that looks like memory corruption. If I wait for completion before returning, it works. I think the problem is …
-
So I am studying attention networks, one of the toughest topics to understand, and the book so far explains it well. However, in the scaled dot-product example, it shows how scaling the product of the ks and q by 100 skews the result. I extended this…
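The skew described above is easy to reproduce numerically: multiplying the query–key scores by a large factor saturates the softmax into a near one-hot distribution, which is exactly why scaled dot-product attention divides by √d_k. A small sketch with illustrative score values:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

# Illustrative q·k scores for four keys (not from the book's example):
scores = np.array([0.31, 0.12, -0.45, 0.20])

print(softmax(scores))        # fairly even distribution
print(softmax(scores * 100))  # ~one-hot: the largest score dominates

# Dividing by sqrt(d_k) keeps the logits in a moderate range, so the
# softmax stays smooth and gradients do not vanish for non-max keys.
```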
-
Hi,
Could you please provide details on how you test the model? Specifically, I am interested in understanding how the model is tested when using AAM Softmax for training.
When testing, are labe…
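For context on the question above: a common convention (an assumption here, not necessarily this repository's method) is that the AAM-Softmax margin and scale are used only during training; at test time, pairs are scored by plain cosine similarity between L2-normalised embeddings, with no labels or margin involved. A minimal sketch:

```python
import numpy as np

def cosine_score(emb1, emb2):
    # Score a trial by cosine similarity of L2-normalised embeddings;
    # the AAM-Softmax margin/scale play no role at inference.
    a = emb1 / np.linalg.norm(emb1)
    b = emb2 / np.linalg.norm(emb2)
    return float(a @ b)

# Hypothetical embeddings from the trained encoder:
enroll = np.array([0.2, 0.9, -0.1])
probe = np.array([0.25, 0.85, 0.0])
same_class = cosine_score(enroll, probe) > 0.5  # threshold tuned on a dev set
```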
-
Hi @RetroCirce, I want to thank you for your great work!
I am playing around with the model checkpoint you provided (`HTSAT_AudioSet_Saved_6.ckpt`).
I can successfully load the model checkpoint and m…
-
### 🚀 The feature, motivation and pitch
Hello,
I am working on editing knowledge in transformers, which, in the case of knowledge deletion, requires minimizing the likelihood of targeted…
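One way the likelihood-minimization step can be realised (a hedged sketch, not the poster's actual method) is gradient ascent on the targeted tokens' cross-entropy, i.e. ordinary descent on the negated loss. `logits` below is a random stand-in for a transformer's output:

```python
import torch
import torch.nn.functional as F

vocab = 100
logits = torch.randn(1, 5, vocab, requires_grad=True)  # (batch, seq, vocab)
targets = torch.randint(0, vocab, (1, 5))              # targeted token ids

# Negating the NLL turns minimization into likelihood *reduction*:
nll = F.cross_entropy(logits.view(-1, vocab), targets.view(-1))
(-nll).backward()  # a subsequent optimizer step then raises the NLL
```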
-
Hi, while studying the load importance loss, I found that the parameters passed to the function `load_importance_loss` are softmax-normalized scores and logits with noise (see [moe_layer.py L281](https://gi…
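For reference, the classic importance loss (in the style of Shazeer et al., 2017; a sketch, not necessarily this repository's exact formula) sums each expert's softmax-normalized gate scores over the batch and penalises the squared coefficient of variation, pushing experts toward equal importance:

```python
import numpy as np

def importance_loss(gate_scores):
    # gate_scores: (tokens, experts) softmax-normalised routing scores.
    # An expert's importance is the total score it receives; the loss is
    # the squared coefficient of variation of those importances.
    importance = gate_scores.sum(axis=0)
    return (importance.std() / importance.mean()) ** 2

skewed = np.array([[0.7, 0.3],
                   [0.6, 0.4]])
balanced = np.array([[0.5, 0.5],
                     [0.5, 0.5]])
assert importance_loss(balanced) < importance_loss(skewed)
```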
-
Thank you for this amazing work!
I was wondering if the fp8 implementation of FlashAttention-3 will be available for the public to use? My main concern is accuracy (block quant may have alleviated thi…
-
Hello,
thank you for this great work and the provided resources.
I have a question regarding how to get a scene graph from the model's predictions.
I am using the config file `configs/config_3D…
-
I am running the llama3 model on an RTX 4090 with fp8 quantization. In the [result](https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/include/tensorrt_llm/executor/executor.h#L323), `outputTokenIds` see…