I am trying to convert FSDv2 to ONNX (and then to TensorRT), but I get an error:
RuntimeError: ONNX export failed on an operator with unrecognized namespace torch_scatter::scatter_max. If you ar…
### Feature request
Using [Flex-attention](https://pytorch.org/blog/flexattention/) (and [Paged attention](https://github.com/pytorch/pytorch/pull/121845/files)) to speed up transformers models and p…
Noting down some areas where significant speedups may be achieved:
- `vcat` in `ProductNode`s leads to a lot of copying
- data deduplication in leaves may lead to lower memory requirements and als…
In the last few days I've been playing around trying to see how fast I can get a 19M model training on a single 4090. My somewhat arbitrary goal is 1 hour, down from about 24 hours (just on `humanoid-…
About 80 small tasks
- [ ] issues with modifying old ring reports in Sulka
- [ ] import from Loydos speedups
- [ ] ring distribution
- [ ] validations
- [ ] bug fixes
https://www.pivotaltracker.com/n…
As @olaugh has confirmed, reordering the distribution by frequency gives a 4% speedup:
```
?,?,2,0,0
E,e,12,1,1
A,a,9,1,1
I,i,9,1,1
O,o,8,1,1
U,u,4,1,1
S,s,4,1,0
R,r,6,1,0
N,n,6,1,0
T,t,6…
```
There's a lot to gain from speeding up pip's startup time.
For one, pip takes around 600ms just to print the completion text, which is laggy (as mentioned in #4755). Further, faster startup time m…
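One way to put a number on startup latency is to time a near-no-op invocation several times and compare it with bare interpreter startup. The following is a minimal sketch, not pip's own benchmarking method; the helper name `time_command` and the choice of `pip --version` as the probe command are assumptions made for illustration.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Return the median wall-clock time (seconds) over `runs` executions of `cmd`."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; a non-zero exit status does not raise.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Baseline: interpreter startup with no imports beyond the defaults.
    baseline = time_command([sys.executable, "-c", "pass"])
    # Probe (assumed): a pip invocation that does almost no real work.
    pip_time = time_command([sys.executable, "-m", "pip", "--version"])
    print(f"interpreter: {baseline * 1000:.0f} ms")
    print(f"pip --version: {pip_time * 1000:.0f} ms")
```

The difference between the two numbers approximates the cost of pip's own imports and setup. For a per-module breakdown, CPython's `-X importtime` option (for example `python -X importtime -m pip --version`) writes import timings to stderr.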
The Python code probably shouldn't be translated directly into a different language as is. It should first be optimized for efficiency and then ported if still necessary.
* For example, if the doc…
## Feature Request
### Summary
Peer gossip uses an exponential moving average to choose which channels to gossip on. It is currently set to decrement the tracker by 20% every 2 seconds.
…
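To make the current decay schedule concrete: removing 20% every 2 seconds means an idle tracker value is multiplied by 0.8 on each tick. The sketch below is a simplified model under that assumption, not the actual gossip code; `decay` and `seconds_until_below` are hypothetical names, and only the 20% and 2-second figures come from the description above.

```python
DECAY_FRACTION = 0.20  # share of the tracker removed per tick (from the description)
TICK_SECONDS = 2.0     # interval between decrements (from the description)


def decay(value, elapsed_seconds, fraction=DECAY_FRACTION, tick=TICK_SECONDS):
    """Tracker value after `elapsed_seconds` of inactivity (whole ticks only)."""
    ticks = int(elapsed_seconds // tick)
    return value * (1.0 - fraction) ** ticks


def seconds_until_below(value, threshold, fraction=DECAY_FRACTION, tick=TICK_SECONDS):
    """Seconds of inactivity before `value` first drops strictly below `threshold`."""
    if threshold <= 0 or not 0.0 < fraction < 1.0:
        raise ValueError("threshold must be > 0 and fraction must be in (0, 1)")
    ticks = 0
    while value >= threshold:
        value *= 1.0 - fraction
        ticks += 1
    return ticks * tick


if __name__ == "__main__":
    # A tracker at 100 that receives no traffic for 6 seconds (3 ticks).
    print(decay(100.0, 6.0))
    # How long until the same tracker falls below half its value.
    print(seconds_until_below(100.0, 50.0))
```

Varying `fraction` and `tick` in this model shows how quickly a channel's weight fades under alternative settings, which is the trade-off the request is about.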
```
Python: 3.11.8
Pandas: 2.2.1
Numpy: 1.26.4
```
There seems to be marginal benefit in terms of speed when converting kdb tables into pandas using the `.pd()` method when using short ints instead …