-
The parallel cluster functionality does not work for `pace-python` at the moment. However, the configuration insists on checking its instance before it lets you change any settings. The problem…
-
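Hello, I would really like to know how to modify the code for single-machine, single-GPU training. I tried running it directly on a single GPU, and it fails at this point: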
Traceback (most recent call last):
  File "train.py", line 429, in <module>
    train()
  File "train.py", line 292, in train
    out, out16, out32, detail8 = net(…
-
There are three things we have learned so far:
* using `zmq` simplifies the communication between the primary process and the subprocess.
* with `__getattribute__()` we can override the functions of a…
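
To make the `__getattribute__()` point concrete, here is a minimal sketch of intercepting method lookups on an object. `CallLogger` and `greet` are hypothetical names invented for this illustration, not part of the project, and the `zmq` transport is left out entirely; a real implementation would presumably forward the call to the subprocess instead of just recording it.

```python
class CallLogger:
    """Hypothetical example: record every public method call before running it."""

    def __init__(self):
        self.calls = []

    def greet(self, name):
        return f"hello {name}"

    def __getattribute__(self, attr):
        # Use object.__getattribute__ directly to avoid infinite recursion.
        value = object.__getattribute__(self, attr)
        if callable(value) and not attr.startswith("_"):
            calls = object.__getattribute__(self, "calls")

            def wrapper(*args, **kwargs):
                # This is where a call could be redirected elsewhere
                # (for example, sent to another process).
                calls.append(attr)
                return value(*args, **kwargs)

            return wrapper
        return value


obj = CallLogger()
result = obj.greet("world")
```

Because `__getattribute__()` runs on every attribute access, the lookup of internal state has to go through `object.__getattribute__`, otherwise the method would call itself recursively.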
-
Sorry to be burying you in issues. I just want to get these reported so you know about them/so you can help if I'm just doing something wrong.
Our backup script runs parsyncfp twice on the only camp…
-
Hello,
I tried to run a fast tuning of GEMM with float16:
```python
from bitblas.base.roller.policy import TensorCorePolicy, DefaultPolicy
from bitblas.base.arch import CUDA
from bitblas.base.uti…
```
-
Currently, pytest fixtures seem to always be executed in the main thread. This prevents initializing thread-local configuration to run a particular test in isolation from concurrently running tests with…
-
- [x] I have visited the [source website], and in particular
read the [known issues]
- [x] I have searched through the [issue tracker] for duplicates
- [x] I have mentioned version numbers, opera…
-
Not working after updating to the main branch of TE in Megatron-LM.
-
Hey!
Sometimes I have to debug my code in sequential mode. But duplicating code (parallel and sequential execution) looks ugly.
It seems to me that introducing `DummyWorkerPool` is a good solution fo…
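
A rough sketch of what I have in mind. `DummyWorkerPool` here is hypothetical and only implements `map`; `ThreadPool` from the standard library stands in for the real parallel pool, so the interface shown is an assumption rather than the library's actual API.

```python
from multiprocessing.pool import ThreadPool  # stand-in for a parallel pool


class DummyWorkerPool:
    """Hypothetical sequential pool with a pool-like interface."""

    def __init__(self, n_jobs=None):
        self.n_jobs = n_jobs  # accepted for compatibility, otherwise unused

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False  # do not swallow exceptions

    def map(self, func, iterable):
        # Runs in the calling thread, so breakpoints and tracebacks
        # behave exactly as in ordinary sequential code.
        return [func(item) for item in iterable]


def square(x):
    return x * x


def run(debug):
    # Only the pool class changes; the calling code is not duplicated.
    pool_cls = DummyWorkerPool if debug else ThreadPool
    with pool_cls(2) as pool:
        return pool.map(square, range(5))


sequential = run(debug=True)
parallel = run(debug=False)
```

With this, switching to sequential mode for debugging is a one-line change in how the pool is chosen.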