-
_The problem:_ Not all residual spectrum outliers originate solely from line strength mismatches. In general, line _width_ mismatches or line center shifts will be present in the residual spectrum as…
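To make the point concrete (an illustrative numpy sketch, not from the original issue): for a Gaussian line, a small center shift leaves an antisymmetric, derivative-shaped residual, while a width mismatch leaves a symmetric one, so both produce structured outliers even when the line strength is matched perfectly.

```python
import numpy as np

def gaussian(x, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

x = np.linspace(-5, 5, 201)
data = gaussian(x)                            # "true" line
resid_shift = data - gaussian(x, mu=0.1)      # center shift mismatch
resid_width = data - gaussian(x, sigma=1.1)   # width mismatch

# To first order, a center shift delta leaves a residual ~ delta * g'(x),
# an antisymmetric "S" shape; a width mismatch leaves an even residual.
print(np.allclose(resid_shift, -0.1 * x * data, atol=0.01))  # True: derivative shape
print(np.allclose(resid_width, resid_width[::-1]))           # True: exactly symmetric
```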
-
As mentioned in https://github.com/WoosukKwon/cacheflow/pull/81#issuecomment-1546980281, the current PyTorch-based top-k and top-p implementation is memory-inefficient. This can be improved by introdu…
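For context, here is roughly what the memory-hungry PyTorch path looks like (a minimal sketch with my own names, not CacheFlow's actual code): it materializes a full sort, softmax, and cumulative sum over the entire vocabulary for every sequence in the batch, which is the inefficiency being discussed.

```python
import torch

def top_k_top_p_filter(logits: torch.Tensor, k: int = 50, p: float = 0.9) -> torch.Tensor:
    """Mask logits outside top-k and the top-p nucleus; logits: (batch, vocab)."""
    # Top-k: drop everything below the k-th largest logit.
    kth = torch.topk(logits, k, dim=-1).values[..., -1:]
    logits = logits.masked_fill(logits < kth, float("-inf"))

    # Top-p: full sort + softmax + cumsum over the vocab -- the memory-heavy part.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
    probs = torch.softmax(sorted_logits, dim=-1)
    # Drop a token once the mass *before* it already exceeds p, so the first
    # token that crosses the threshold is still kept.
    drop = probs.cumsum(dim=-1) - probs > p
    sorted_logits = sorted_logits.masked_fill(drop, float("-inf"))
    # Unsort back to the original vocab order.
    return torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, sorted_logits)

logits = torch.randn(2, 32000)
next_tok = torch.multinomial(torch.softmax(top_k_top_p_filter(logits), -1), 1)
```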
-
GPUs are hungry pieces of hardware and want a steady supply of commands. Many practical algorithms involve many iterations, where each iteration launches one or more kernels that are by themselves no…
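One common mitigation for this launch-bound pattern (not necessarily what this post goes on to propose, since it is truncated) is to capture the per-iteration kernel sequence into a CUDA graph and replay it, paying the launch overhead once per iteration instead of once per kernel. A minimal PyTorch sketch of the capture/replay pattern:

```python
import torch

x = torch.randn(1 << 10, device="cuda")

# Warm-up on a side stream is required before graph capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        y = (x * 2).sin().cos()
torch.cuda.current_stream().wait_stream(s)

# Capture the iteration body: several small kernels become one graph.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    y = (x * 2).sin().cos()

for _ in range(1000):
    g.replay()  # a single launch replays the whole sequence; the result lands in y
```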
-
I have rebuilt `ginkgo` from the latest commit in master, same results as from the last release:
```
---> Testing ginkgo
Executing: cd "/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_math…
```
-
## Integrating DeepSpeed with PyTorch Lightning
Integrating DeepSpeed with PyTorch Lightning can significantly enhance training efficiency and scalability, especially for large models and distribut…
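As a concrete starting point (a minimal sketch; the tiny module below is a placeholder, and any existing LightningModule works unchanged), the integration mostly comes down to the Trainer's `strategy` argument:

```python
import torch
import pytorch_lightning as pl

class TinyModel(pl.LightningModule):
    """Hypothetical stand-in model; nothing DeepSpeed-specific in here."""
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(512, 512)

    def training_step(self, batch, batch_idx):
        return self.layer(batch).pow(2).mean()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    strategy="deepspeed_stage_2",  # ZeRO stage 2: shards optimizer state + gradients
    precision="16-mixed",
)
# trainer.fit(TinyModel(), train_dataloaders=...)  # supply your DataLoader
```

Heavier variants ("deepspeed_stage_3", the offload strategies) trade extra communication for lower per-GPU memory.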
-
**Describe the bug**
In https://github.com/pixie-io/pixie/pull/1795, we introduced the ability for kernels that support the 1M BPF program limit to raise certain tunables used to restrict program siz…
-
For distributing large computations, but also for applications such as position-dependent PSF kernels, it would be really useful to have a split-and-merge scheme for `Map` objects. There are many things to be c…
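As a rough shape for the idea (a generic numpy sketch with hypothetical names, not a proposed gammapy API): split the map data into tiles, process each tile independently (e.g. with its own PSF kernel), then stitch them back.

```python
import numpy as np

def split_map(data: np.ndarray, tiles: int = 2):
    """Split a 2D array into tiles x tiles blocks (axes must divide evenly here)."""
    rows = np.split(data, tiles, axis=0)
    return [np.split(r, tiles, axis=1) for r in rows]

def merge_map(blocks) -> np.ndarray:
    """Inverse of split_map: stitch the blocks back together."""
    return np.block(blocks)

data = np.arange(64.0).reshape(8, 8)
blocks = split_map(data, tiles=2)
# ... apply a position-dependent PSF per block here ...
assert np.array_equal(merge_map(blocks), data)
```

A real scheme would also have to carry WCS/geometry information and handle overlap regions for kernels that straddle tile boundaries.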
-
From Etienne @EtienneBachmann:
Another idea to improve adjoint run speed is to merge the GPU kernels in the compute_kernels routine, where rho kernels and other kernels are separated. It should n…
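To illustrate the kind of merge being suggested (an illustrative CuPy sketch with made-up arithmetic, not SPECFEM's actual kernels): a single fused kernel computes both contributions per thread, so the shared fields are loaded from global memory once and only one launch is needed instead of two.

```python
import cupy as cp

fused = cp.RawKernel(r'''
extern "C" __global__
void fused_kernels(const float* accel, const float* displ,
                   float* rho_kl, float* kappa_kl, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        float a = accel[i];   // shared loads happen once for both updates
        float d = displ[i];
        rho_kl[i]   += a * d;          // was a separate "rho" kernel
        kappa_kl[i] += a * a + d * d;  // was a second kernel (placeholder math)
    }
}
''', 'fused_kernels')

n = 1 << 20
accel = cp.random.rand(n, dtype=cp.float32)
displ = cp.random.rand(n, dtype=cp.float32)
rho_kl = cp.zeros(n, dtype=cp.float32)
kappa_kl = cp.zeros(n, dtype=cp.float32)
fused(((n + 255) // 256,), (256,), (accel, displ, rho_kl, kappa_kl, cp.int32(n)))
```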
-
### Feature description
I find myself limited by nebari with respect to working on larger analyses (multiple notebooks spread across a directory tree). Locally, I would either:
- start JupyterLab with…
-
https://www.microsoft.com/en-us/research/blog/deepspeed-accelerating-large-scale-model-inference-and-training-via-system-optimizations-and-compression/
> High-performance INT8 inference kernels are …
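For a sense of what those kernels implement, the core idea is symmetric quantization: store values as int8 plus a float scale, run the heavy math in integer arithmetic, and dequantize at the end. A minimal numpy sketch of the round trip (illustrative only, not DeepSpeed's implementation):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: int8 values plus one float scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
print(f"max abs error: {np.abs(dequantize(q, s) - w).max():.4f}")  # bounded by ~scale/2
```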