-
https://www.upsolver.com/blog/batch-stream-a-cheat-sheet
_Originally posted by @christianfelicite in https://github.com/TheFeloDevTeam/FeloFamilySite/issues/57#issuecomment-655962286_
What is stream…
-
### What happened + What you expected to happen
Reopening this issue: https://github.com/ray-project/ray/issues/41973. I have included additional comparative scripts where quite literally the only …
-
### 🚀 Feature
NeMo's NeVA (LLaVA) is a multimodal language model.
Initial `examine`:
`Found 49 distinct operations, of which 39 (79.6%) are supported`
### Work items
- #145 (but looks like #…
-
For custom grad_samplers or functorch-based implementation
-
While finetuning RWKV, I use this script (with the demo dataset generated by `make_data.py`, and with `demo.bin` and `demo.idx` placed in `./data`):
```
#!/bin/bash
BASE_NAME="model/demo"
N_LAYER="12"
N_EMBD="768"
M…
-
I use 2 nodes with 4 GPUs per node. The same training batch size works on a single node, but when I run on multiple nodes this error is printed:
![0529dcfca48bf574eb1276045c17ed34](https://user-images.…
-
initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/1
Enabling DeepSpeed BF16.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
Traceback (most recent call last):
File "./train.py", lin…
-
Hi @kevinwallimann
First, I'm aware of the [issue](https://github.com/AbsaOSS/ABRiS/issues/176).
I used ABRiS to stream Avro to a Delta table. As mentioned in the issue, the schema evolution is not wor…
-
Given a model size and a number of GPUs, how can we calculate the throughput the interconnect network needs in order to handle ZeRO-3 traffic? Is 100 Gbps enough, or does one need 1,000 Gbps?
Use…
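
To make the question concrete, here is a rough back-of-envelope sketch, not a definitive answer. It assumes the ZeRO paper's estimate that stage 3 moves about 3x the parameter count per GPU per step (all-gather in forward, all-gather in backward, reduce-scatter of gradients), that each collective moves an (N-1)/N fraction of the data, and that communication does not overlap with compute. The function name and the `step_time_s` input are hypothetical, chosen only for illustration.

```python
def zero3_min_bandwidth_gbps(num_params, step_time_s, bytes_per_param=2, num_gpus=8):
    """Estimate per-GPU bandwidth (Gbit/s) needed to move ZeRO-3 traffic in one step.

    Assumptions (not from DeepSpeed itself):
      - ~3 * num_params elements are communicated per GPU per step
      - each collective transfers (N-1)/N of the data per GPU
      - no overlap between communication and compute
    """
    if num_gpus < 2:
        return 0.0  # a single GPU has nothing to exchange
    scale = (num_gpus - 1) / num_gpus
    bytes_per_step = 3 * num_params * bytes_per_param * scale
    bits_per_step = bytes_per_step * 8
    return bits_per_step / step_time_s / 1e9


# Hypothetical example: 1B parameters in bf16, 8 GPUs, 1 second per step
print(zero3_min_bandwidth_gbps(1e9, 1.0, bytes_per_param=2, num_gpus=8))
```

Under these assumptions the required bandwidth scales linearly with model size and inversely with step time, so a target step time has to be fixed before asking whether 100 Gbps is enough.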
-
### Is this your first time submitting a feature request?
- [X] I have searched the existing issues, and I could not find an existing issue for this feature
### Describe the feature
When one has a …