-
Hi,
I am trying to reproduce the results of ZeRO on GPT2 and BERT pretraining.
I followed this tutorial:
https://github.com/microsoft/DeepSpeedExamples/blob/master/Megatron-LM-v1.1.5-ZeRO3/READM…
-
Hi,
Thanks for open-sourcing this work. When I tried to train the teacher network of LiDAR and fusion, I wasn't able to start it with multiple GPUs. Single-GPU training works. Multi-GPU training…
-
### Checklist
- [ ] The issue exists after disabling all extensions
- [X] The issue exists on a clean installation of webui
- [X] The issue is caused by an extension, but I believe it is caused by a …
-
I'm trying to call `problem.impute()` on a solved (linear) spatial mapping problem of dimensions `n_source=17806` (spatial data) by `n_target=13298` (single-cell data) for `n_genes=2039`. This is just…
-
Hi!
I was trying to build an application on Streamline when I ran into this problem. At first it just seemed that because of some error on the frontend the changes I made to an Aggregate processor …
-
### Expected behavior
The following Relay program should be successfully compiled:
```
def @main(%x0: Tensor[(1, 2), float32] /* ty=Tensor[(1, 2), float32] */, %x1: Tensor[(1, 2), float32] /* t…
-
Hello,
I'm not sure at what stage STAR-fusion fails - but it seems to have generated the star-fusion.fusion_predictions.abridged.coding_effect.tsv but might be missing some of the output files fro…
-
**Describe the bug**
When running `merge_lora_weights/merge.py` with TP and PP set to 1 on a fine-tuned minitron checkpoint, I run into the following error:
```sh
raise RuntimeError(f"world_size ({w…
-
Unit tests should go under the [tests/](https://github.com/Storia-AI/sage/tree/main/tests) folder.
The interactions with Pinecone and Marqo will have to be mocked out.
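
One way to mock out a vector-store client in a unit test is sketched below, using only `unittest.mock` from the standard library. The helper `upsert_chunks` and the `upsert(vectors=...)` call on the fake index are hypothetical stand-ins for illustration; they are not taken from the sage codebase and are not meant to mirror the exact Pinecone or Marqo client APIs.

```python
from unittest.mock import MagicMock


def upsert_chunks(index, chunks):
    """Hypothetical helper: send (id, embedding) pairs to a vector-store index."""
    vectors = [(chunk_id, embedding) for chunk_id, embedding in chunks]
    # `upsert` is an assumed method name on the injected index object.
    index.upsert(vectors=vectors)
    return len(vectors)


def test_upsert_chunks_calls_index_once():
    # MagicMock stands in for the real client, so no network call is made.
    fake_index = MagicMock()
    chunks = [("a", [0.1, 0.2]), ("b", [0.3, 0.4])]

    count = upsert_chunks(fake_index, chunks)

    assert count == 2
    fake_index.upsert.assert_called_once_with(vectors=chunks)
```

Passing the index in as an argument keeps the test free of patching. If the real code constructs its client internally, `unittest.mock.patch` on that constructor would be the analogous approach.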
-
I am trying DeepSpeed. I read the docs and modified one project to use it.
And I get a strange result:
1) Original code without any speed-up. 1 Docker container. 1 GPU. 10 epochs.
Time: 5 min 50 sec. On…