-
UPDATE: See this comment below:
https://github.com/RobotLocomotion/drake/issues/12775#issuecomment-591637572
I don't have a clear picture of the full issue, but there are some odd things going on.
…
-
I encountered a problem when using int8 gemm cutlass kernel: https://github.com/NVIDIA/TensorRT-LLM/issues/2351
For shape [16,6144,4096], I measured `14us` in my unit-test benchmark, but in real mo…
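To put the `14us` figure in context, here is a rough back-of-the-envelope calculation. It assumes the shape is read as [M, N, K] and counts one multiply and one add per inner-product term; both are assumptions, not something the issue text confirms.

```python
# Assumption: the shape [16, 6144, 4096] is interpreted as [M, N, K].
M, N, K = 16, 6144, 4096
latency_s = 14e-6  # the 14 us reported for the unit-test benchmark

# A GEMM performs M * N * K multiply-accumulates, i.e. 2 * M * N * K operations.
ops = 2 * M * N * K

# Implied throughput in tera-operations per second.
tops = ops / latency_s / 1e12

print(f"{ops} integer ops, roughly {tops:.1f} TOPS at 14 us")
```

Comparing this implied throughput against the device's nominal int8 peak is one quick way to judge whether a micro-benchmark number is plausible.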
-
Hi. Thanks for the awesome work.
I am not able to extract visual features efficiently. It is taking too much time, even on GPUs. I am extracting visual features on 278 videos of 4-5 minutes duratio…
-
Thanks for your work!
I am confused about Eq. 3 in your paper, on the computation of theta. Assuming we have the weights from the base model (w0), and 3 finetuned models (w1,w2,w3), and then…
-
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
Yes
### Source
binary
### TensorFlow version
v2.17.0
### Custom code
Yes
### OS platform and distribution
Linux M…
-
Related to #10383 -- but that proposal is about doing more efficient computation within one timestep (e.g. we want to run ICP refinement in parallel from many different initial guesses).
Another co…
-
## Classification: Feature Request
## Summary
It would be nice to have an option to pass a "progress bar tick" function as an argument to the `profile()` function (and other computationally expe…
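A minimal sketch of what such a callback parameter could look like. The function `profile_with_ticks`, the `on_tick` parameter, and its `(done, total)` signature are hypothetical names chosen for illustration; they are not the library's actual `profile()` API.

```python
from typing import Callable, Iterable, Optional


def profile_with_ticks(
    values: Iterable[int],
    on_tick: Optional[Callable[[int, int], None]] = None,
) -> int:
    """Hypothetical stand-in for an expensive routine that reports progress."""
    items = list(values)
    total = 0
    for done, value in enumerate(items, start=1):
        total += value * value  # placeholder for the expensive per-step work
        if on_tick is not None:
            on_tick(done, len(items))  # caller decides how to display progress
    return total


# Usage: collect the ticks in a list; a real caller might advance a progress bar.
ticks = []
result = profile_with_ticks(range(5), on_tick=lambda done, n: ticks.append((done, n)))
```

Keeping the callback optional means existing callers are unaffected, and the routine stays independent of any particular progress-bar library.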
-
After testing the functionality from https://github.com/OpenwaterHealth/OpenLIFU-python/pull/140 and comparing with the expected solution analysis output from Matlab (only `mainlobe_pnp_MPa` at the moment), th…
-
**TL;DR**: The quotient $(f_i(X) - f_i(z))/(X - z)$ can be moved **entirely** over the base field when the polynomial $f_i$ is defined over the base field, with a small overhead in the proof size and v…
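For readers unfamiliar with the quotient itself, here is a small sketch of how $(f(X) - f(z))/(X - z)$ can be computed by synthetic division. It works over a single toy prime field (the modulus 97 is an arbitrary assumption) and does not model the base-field versus extension-field distinction the post is about.

```python
P = 97  # small prime modulus, chosen only for illustration


def evaluate(coeffs, x, p=P):
    """Evaluate a polynomial (coefficients ordered low to high) with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def quotient(coeffs, z, p=P):
    """Return q such that f(X) - f(z) = q(X) * (X - z), via synthetic division."""
    q = [0] * (len(coeffs) - 1)
    carry = 0
    # Walk from the leading coefficient down; each step yields one coefficient of q.
    for k in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[k] + z * carry) % p
        q[k - 1] = carry
    return q


f = [3, 0, 2, 5]  # f(X) = 3 + 2 X^2 + 5 X^3
z = 4
q = quotient(f, z)
```

The division is exact because $z$ is a root of $f(X) - f(z)$, so the remainder of the synthetic division is zero by construction.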
-
**What is your question?**
This is the computation requirement using a native way to execute a GEMM with custom input:
```py
x = torch.randn([4096, 4096])
y = torch.randn([4094, 4094])
y_pad = torch…