-
Thank you for your outstanding work. I have attempted to apply your methods as described in the paper to RT-4DGS, specifically in the following segment:
`
render_pkg = render(viewpoint_cam, gaussian…
xi-zc updated
1 month ago
-
I am having trouble with the `--methy-extract` command.
I succeeded in generating *bmm files, but I do not understand how to pass them properly, since the option is not documented. Judging …
-
### 🚀 The feature, motivation and pitch
3D fp8 matrix multiplication can be useful for fp8 models with 3D matmul (it can also be used to improve the accuracy of models with 2D fp8 quantized matrix multi…
-
I have a Seq2Seq network with attention, and when training with Apex/O1 optimization I notice that mixed precision is more than 3x slower. It seems that BMM is the culprit. Any ideas why this is happen…
-
> [...] but this is something we can certainly tune in `torch.compile` with max-autotune. cc @ptrblck @csarofeen @xwang233 @ezyang @msaroufim @bdhirsh @anijain2305 @zou3519 @voznesenskym @penguinwu @…
-
Meanwhile, the request txs only show -0.01 BTC each.
Also, the 0.01 BTC has already been subtracted from the immature balance.
-
In models.networks.py, `energy = torch.bmm(proj_query.permute, proj_key)`
RuntimeError: CUDA out of memory. Tried to allocate 268.21 GiB (GPU 4; 10.92 GiB total capacity; 1.80 GiB already allocated; 8.…
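
For anyone hitting a similar allocation size: the result of a batched matmul of shapes `(B, N, K)` and `(B, K, M)` is `(B, N, M)`, so an attention `energy` over N flattened spatial positions grows as N squared. A rough sketch of that arithmetic follows. The shapes here are hypothetical (the post above does not state them), and the helper names are made up for illustration.

```python
def bmm_output_bytes(batch, n, m, bytes_per_element=4):
    """Bytes needed for the (batch, n, m) result of (batch, n, k) @ (batch, k, m).

    bytes_per_element=4 assumes float32; use 2 for float16.
    """
    return batch * n * m * bytes_per_element


def to_gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Hypothetical example: batch of 1, a 256x256 feature map flattened to
# N = 65536 positions, so the energy matrix is 1 x 65536 x 65536.
n_positions = 256 * 256
energy_gib = to_gib(bmm_output_bytes(1, n_positions, n_positions))
print(f"energy tensor: {energy_gib} GiB")  # 16.0 GiB in float32
```

If the estimate is far beyond the GPU capacity, reducing the spatial resolution before the attention layer or lowering the batch size shrinks the tensor much faster than switching dtype does.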
-
Hi!
I am trying to optimize a custom model. Previously, I was facing a not-implemented error, so I tried the solution provided [here](https://github.com/NVIDIA-AI-IOT/torch2trt/issues/367). However, now I am f…
-
### 🐛 Describe the bug
**Summary**
When attempting to compile a math SDP operation using `torch.compile` in AMP mode, the script crashes. This issue does not occur with a single Flash Atten…
-
### 🚀 The feature, motivation and pitch
Hi,
I want to perform a sparse-dense BMM and compute gradients for the sparse matrix. Is there an operation in torch that does this efficiently? According to…