-
Error message:
Process Process-6:
Traceback (most recent call last):
File "/home/yanjiaxing/anaconda3/envs/paddle_env/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
…
-
I use dmoe with DeepSpeed or FSDP. At the beginning, memory usage is about 33 GB. As the number of training steps increases, GPU memory usage grows gradually and eventually exceeds 80 GB…
-
Currently we need a unique id to create static shared memory in a kernel:
```C++
Type& var = declareSharedVar(acc);
```
With C++20 we could auto-generate this id:
```C++
#include
#includ…
-
### Description
A common issue in Gluten is "killed by yarn". The root cause is that some memory allocations, usually from std::vector, bypass memory pool tracking. Some of the std::vecto…
-
For the following Triton kernels generated by PyTorch, Triton generated shared memory stores and loads in the LLVM IR and PTX just before the atomic add operation.
```python
from ctypes import…
-
Prompted by https://community.k6.io/t/ways-to-transfer-files-in-tests/1552
After https://github.com/k6io/k6/pull/1841, can we add an optional argument (e.g. `r`) to [`open()`](https://k6.io/docs/ja…
-
### What happened + What you expected to happen
[Microbenchmark](https://github.com/ray-project/ray/blob/master/python/ray/_private/ray_experimental_perf.py#L150) results for a single-actor acceler…
-
We have identified that deploying multiple modules for data transmission on each server within the cluster leads to second-level latency tails in RDMA cluster data transfers, as detailed in [issue 997…
-
The MPI-3 standard introduced functions which enable multiple ranks on the same node / NUMA-domain to use shared local memory to communicate.
+ Can be used, for example, to share input data.
+ Faste…
-
- [x] Set up Git and understand the existing approach
- [x] Brainstorm the architecture for shared memory
- [x] Design a dummy API for this module