-
Running into issues when serving Mixtral 8x7B on 4 x H100 (TP=4) with deepspeed-mii v0.2.3, with all other arguments left at their defaults, in the NVIDIA base image `nvidia/cuda:12.3.1-devel-ubuntu22.04`.
The …
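For reference, a minimal sketch of the serving setup described above, following the DeepSpeed-MII persistent-deployment pattern; the checkpoint name and the generation call are assumptions, not taken from the report:
```python
# Minimal sketch of serving a model with DeepSpeed-MII across 4 GPUs (TP=4).
# The model name and generation arguments are illustrative assumptions.
import mii

client = mii.serve(
    "mistralai/Mixtral-8x7B-v0.1",  # assumed Hugging Face checkpoint name
    tensor_parallel=4,              # TP=4, matching the 4 x H100 setup
)
response = client.generate(["DeepSpeed-MII is"], max_new_tokens=64)
print(response)
```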
-
**Describe the bug**
I can use my script to fine-tune the model with ZeRO stage 2 and stage 3. However, when I use ZeRO-Infinity to offload parameters, the following error occurs:
python: /opt/conda/lib/python3.10/site-pack…
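For context, a minimal sketch of a ZeRO-Infinity parameter-offload configuration; the key names follow the DeepSpeed config schema, and all values are illustrative rather than the reporter's actual settings:
```python
# Sketch of a ZeRO stage 3 config with parameter and optimizer offloading
# (ZeRO-Infinity). Values are illustrative, not the reporter's config.
ds_config = {
    "train_batch_size": 16,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "cpu",   # or "nvme" together with an "nvme_path"
            "pin_memory": True,
        },
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": True,
        },
    },
}
```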
-
**What is your question?**
I have successfully run localcolabfold on our workstation with an A4000 GPU. I tried to install it on a cluster with an A100 80GB GPU to get more GPU memory. The installation wa…
-
**Describe the bug**
I am trying to pretrain an [OLMo](https://github.com/allenai/OLMo) 1B model on 8 MI250 GPUs with the Docker image rocm/pytorch:latest (ROCm 6.1). I'm using a small subset of Dolma …
-
I am getting errors while building the DeepSpeed wheel. Beforehand I set a whole bunch of build options to 0 on the command line, since they were also throwing errors; listing them: DS_BUILD_GDS, DS_BUILD_FP_QUANTIZER, …
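For reference, a sketch of disabling optional ops before a source build, written in Python for consistency with the other snippets; the two flag names come from the report, and the rest of the invocation is an assumption:
```python
# Sketch: disable optional DeepSpeed ops before building, equivalent to
# exporting DS_BUILD_GDS=0 etc. in the shell. The pip invocation is an
# illustrative assumption.
import os
import subprocess

for flag in ("DS_BUILD_GDS", "DS_BUILD_FP_QUANTIZER"):
    os.environ[flag] = "0"  # skip compiling these ops

subprocess.run(["pip", "install", "deepspeed", "--no-cache-dir"], check=True)
```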
-
**Describe the bug**
while converting a sharded ZeRO-3 checkpoint of a LLaVA-style multimodal model, I got the following error:
"""
Traceback (most recent call last):
File "/scratch/hongshal/co…
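For context, a minimal sketch of the usual ZeRO-3 consolidation path via DeepSpeed's `zero_to_fp32` utility; the paths are placeholders, and the exact signature varies between DeepSpeed versions:
```python
# Sketch: consolidate a sharded ZeRO-3 checkpoint into an fp32 state dict.
# Paths are placeholders; the second argument is an output file in older
# DeepSpeed releases and an output directory in recent ones.
from deepspeed.utils.zero_to_fp32 import (
    convert_zero_checkpoint_to_fp32_state_dict,
)

convert_zero_checkpoint_to_fp32_state_dict(
    "checkpoints/llava-run",          # directory with the sharded checkpoint
    "checkpoints/pytorch_model.bin",  # consolidated output
)
```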
-
**Describe the bug**
reduce-scatter cannot be overlapped when using ZeRO
**To Reproduce**
DeepSpeed Configs:
```
json = {
"train_batch_size": 64,
"train_micro_batch_size_per_gpu": 1,
…
```
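The config above is truncated; for reference, a sketch of the ZeRO keys that govern this behavior (key names from the DeepSpeed config schema, values illustrative):
```python
# Sketch: ZeRO settings controlling reduce-scatter and communication overlap.
# Key names follow the DeepSpeed config schema; values are illustrative.
zero_config = {
    "zero_optimization": {
        "stage": 2,
        "reduce_scatter": True,  # use reduce-scatter instead of all-reduce
        "overlap_comm": True,    # overlap gradient comm with the backward pass
    }
}
```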
-
Starting from the code
`pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")`
it does not work (on an A100 with Python 3.10 and CUDA 12.1):
`ImportError: torch_extensions/py310_cu121/ragged_device_ops/ragged_…`
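For reference, the full non-persistent pipeline pattern the snippet is aiming at, following the MII README; the prompt and generation arguments are illustrative:
```python
# Sketch of the intended non-persistent MII pipeline usage; the model name
# is taken from the report, the prompt and arguments are illustrative.
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
outputs = pipe(["DeepSpeed is"], max_new_tokens=64)
print(outputs)
```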
-
## Expected Behavior
Batch is able to run through all the queries in a CSV file
## Current Behavior
Stops running at certain sequences that cause an internal issue.
Input which caused the failur…
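As a hypothetical sketch of the workflow being described, reading queries from a CSV file and running them as one batch; the file name, column name, and pipeline choice are all assumptions:
```python
# Hypothetical sketch of the batch workflow: read queries from a CSV file
# and run them through a pipeline in one batch. File layout, column name,
# and model are all assumptions.
import csv
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")  # assumed model

with open("queries.csv", newline="") as f:
    queries = [row["query"] for row in csv.DictReader(f)]

responses = pipe(queries, max_new_tokens=128)
for query, response in zip(queries, responses):
    print(query, "->", response)
```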
-
**Describe the bug**
I have 4 GPUs, but when I set mp_size=3, it fails.
**To Reproduce**
Steps to reproduce the behavior:
```
model_name = "/data/share/rwq/Qwen-7B-Chat"
payload = "你好"
tokeni…
```
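For context, a sketch of the same setup with a tensor-parallel degree that divides the GPU count (and typically the model's attention head count) evenly, which is the usual requirement; all values besides the model path are illustrative:
```python
# Sketch: tensor-parallel inference where mp_size evenly divides the number
# of GPUs (and the model's attention heads). mp_size=3 with 4 GPUs, or with
# a head count not divisible by 3, is a common cause of this failure.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/data/share/rwq/Qwen-7B-Chat"  # path from the report
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

engine = deepspeed.init_inference(model, mp_size=4, dtype=torch.float16)
```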