-
I trained a model on multiple A800 GPUs with TP + PP + ZeRO Stage 1, but `ncclInternalError` or `ncclRemoteError` occurs occasionally (I cannot always reproduce it). Here are som…
-
The Sigfuz testcase [1] is causing the following TM Bad Thing:
```
[ 5901.444301] Unexpected TM Bad Thing exception at c00000000000e9ec (msr 0x8000000302a03031)
cpu 0x3: Vector: 700 (Program Check) …
-
Some of the functions in slb.c are not safe for tracing. Move these out into `slb_low.c`, and mark that whole file as not traceable.
-
Thanks for the excellent work. Following the comment in #59, I am trying to train `dmoe_760m` on 16 GPUs (2 nodes) by changing the distributed arguments to set up two nodes, but it is very slow in t…
-
Can we have a package that maintains WebSocket subscriptions and makes fetching data over WebSockets simpler for multiple securities, with something similar for the REST APIs?
I've been trying to do the sam…
-
> [!NOTE]
> This issue may _not_ be related to Rails, but I've been unable to track it down. Curious if anyone else has hit it. If it's deemed not to be Rails-related, we can close this issue.
…
-
This project seems to be the best and most complete one on GitHub. Thank you!
The new TWS API supports P&L data requests but it looks like this was never implemented. I was trying to do it myself bu…
-
Occasional kernel message seen when running tests in CI.
Azure VM with an Ubuntu host and KVM hypervisor. The problem is seen with Ubuntu 18.04, 20.04 and 22.04.
Running the same tests on a Windows 10 Hyper-V host …
sv641 updated
2 years ago
-
sambamba version: 0.7.1, binary downloaded from the releases page and uncompressed for use.
```shell
LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-…
-
When creating a vpool, an upstart file is generated. In this upstart file a volumedriver parameter seems to be missing: `-o async_dio`
After the change:
```
exec /usr/bin/volumedriver_fs.sh -f --confi…