-
## Description
Slower cleanup methods do not run to completion when restarting the kernel.
## Reproduce
1. Create a new IPython notebook
2. Create and execute a new cell wit…
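For reference, a cell of the kind step 2 presumably describes could register a deliberately slow cleanup hook; the `atexit`/`time.sleep` approach below is a hypothetical sketch, not the original reproducer:

```python
# Hypothetical reproduction cell: register a cleanup hook that takes several
# seconds and leaves a file behind, then restart the kernel and check whether
# the file was ever written.
import atexit
import time

def slow_cleanup():
    time.sleep(10)                            # simulate slow cleanup work
    with open("cleanup_done.txt", "w") as f:  # observable side effect
        f.write("cleanup finished\n")

atexit.register(slow_cleanup)
```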
-
With either torch.compile or Triton, the forward / backward operations accumulate too many activations, which are probably bottlenecking training.
For some reason, I got about a 30% speedup at 1B scale, but it does not seem …
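As a rough way to check whether activations really dominate, one could compare peak CUDA memory for a forward + backward pass in eager mode versus under torch.compile. The toy model below is purely illustrative and not the 1B-parameter setup:

```python
# Illustrative sketch: compare peak CUDA memory for a forward + backward pass
# in eager mode vs. under torch.compile (toy model, rough measurement only).
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 4096)
).cuda()
x = torch.randn(8, 4096, device="cuda")

for label, m in [("eager", model), ("compiled", torch.compile(model))]:
    torch.cuda.reset_peak_memory_stats()
    m(x).sum().backward()
    torch.cuda.synchronize()
    print(f"{label}: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB peak")
```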
-
### 🚀 The feature, motivation and pitch
DTW is a crucial algorithm for measuring similarity between temporal sequences, but its computational complexity can be a bottleneck, particularly with large…
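For context on the quadratic cost, a textbook O(N·M) dynamic-programming DTW looks roughly like the plain NumPy sketch below; it is not a proposed implementation, just an illustration of why long sequences become expensive:

```python
# Naive O(N*M) DTW between two 1-D sequences, illustrating the quadratic
# time and memory cost that motivates a faster implementation.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])                 # local distance
            cost[i, j] = d + min(cost[i - 1, j],         # insertion
                                 cost[i, j - 1],         # deletion
                                 cost[i - 1, j - 1])     # match
    return float(cost[n, m])

print(dtw_distance(np.array([1.0, 2.0, 3.0]), np.array([1.0, 3.0, 3.0, 4.0])))
```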
-
An OCR recognition model trained with ch_PP-OCRv4_rec_svtr_large.yml trains normally,
and evaluating it with python tools/eval.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_svtr_large.yml also works correctly,
but using python tools/export_model.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4…
-
Actually, the SYCL specification does not allow removing dead/unreachable kernels, at least because of free functions such as `get_kernel_bundle` (which could be called indirectly from another translat…
-
**Describe the bug**
This was discovered when porting shoc_energy_integrals to small kernels. I was getting large differences in the outputs of the view_reductions when num_threads>1. I suspect the p…
-
### 🐛 Describe the bug
After #134373, I started getting the error "RuntimeError: CUDA error: operation not supported" when trying to run PyTorch.
A fresh build from source succeeds before #134373 and f…
-
It would be great to allow the user to supply all needed temporary buffers for 2D and 3D transforms. Currently, the internal 1D transforms allocate their own temporary buffers. This is a problem when baki…
-
## 🐛 Bug
This is a lengthy issue/post detailing my observations with our distributed and bucketing performance. Some of these are actionable items and some are just observations to be aware of.
…
-
Pandas may sometimes choose to convert large integers (close to the 64-bit integer limit) to floats. This is very common with the register dataset on 64-bit architectures and is fixed by #41. A similar situation…
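A small illustration of the kind of silent upcast involved (hypothetical values, assuming pandas' default NumPy-backed dtypes): once a missing value forces an int64 column to float64, integers near the 64-bit limit can no longer be represented exactly.

```python
# Hypothetical illustration: a missing value forces the column to float64,
# and a large integer no longer round-trips exactly.
import pandas as pd

big = 2**63 - 1                  # close to the signed 64-bit integer limit
s = pd.Series([big, None])       # pandas upcasts to float64 to hold the NaN
print(s.dtype)                   # float64
print(int(s.iloc[0]), big)       # the two values differ after the float round-trip
```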