-
Hi @siller174
I initially thought of forking this, but I thought it may benefit others, so I'm raising a PR instead.
Basically, it's linked to https://github.com/ozontech/cute/issues/70
I a…
-
### Description
I'm trying to scale up some transformer training (currently at ~400M params), and as such I've been playing around with various ways to save memory and improve performance. On a whi…
-
Currently, requesting dozens of prompts at the same time can clog the queue.
I want to make it so that each job running in parallel increases the kudos cost.
The calculation I'm thinki…
-
Hi,
I am curious if it would be possible to implement Megatron-style sequence parallelism in the repository.
Sequence parallelism is crucial for reducing activation memory, which is otherwise ch…
-
### Summary
I created a container node from the image registry.dp.tech/dptech/deepmd-kit:3.0.0b3-cuda12.1 on the Bohrium platform. Then I installed the devel branch of DeePMD-kit with:
`conda create -n deepmd-d…
-
Ben, great work here, appreciate the investment of your time.
I'm seeing what appear to be client-server serviceability issues when stress testing this package as a way to process separate …
-
I am trying to set up a dynamic kernel wherein a KA kernel launches a CUDA kernel. The final objective would be to have dynamic parallelism using only kernel abstractions. This is an MWE showing the c…
-
I'm not sure if this question is appropriate to ask here. I'm not a professional programmer, so if anyone is willing to offer help and guidance, I would be very grateful.
Two weeks ago, I started usin…
-
## Abstract
Currently we use gevent. It creates greenlets for the different jobs that we perform in the app. This is neither multithreaded nor multiprocess, so performance takes a hit. We need to …