-
### Problem Description
On the Llama3 70B Proxy Model, training stalls and GPU core dumps. The core dumps are 41 GB per GPU, so I am unable to send them. It is probably easier for you all to reproduce this er…
-
### 🐛 Describe the bug
Invoking a compiled model under a FlopCounterMode context results in a slower compiled model.
If we run our benchmark _before_ the model is instrumented with FlopCounterMode, …
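As a minimal sketch of the reproduction shape (a toy `nn.Linear` stands in for the real model; sizes and iteration counts are illustrative, not from the report):
```python
import time
import torch
from torch.utils.flop_counter import FlopCounterMode

model = torch.nn.Linear(1024, 1024)
compiled = torch.compile(model)
x = torch.randn(64, 1024)

def bench(fn, iters=50):
    # Warm-up so compilation cost is excluded from the timing.
    for _ in range(3):
        fn(x)
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return (time.perf_counter() - start) / iters

t_before = bench(compiled)            # benchmark before instrumenting
with FlopCounterMode(display=False):  # instrument once, as the report describes
    compiled(x)
t_after = bench(compiled)             # benchmark again after instrumenting
print(f"before: {t_before*1e3:.3f} ms, after: {t_after*1e3:.3f} ms")
```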
-
### Issue type
Performance
### Have you reproduced the bug with TensorFlow Nightly?
No
### Source
source
### TensorFlow version
tf 2.4.1
### Custom code
Yes
### OS platfo…
-
In modeling_qwen2_vl.py https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L343
The attention_mask is set for each frame; when it is not set, the f…
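For context, the code around that line builds a block-diagonal additive mask from `cu_seqlens` so that tokens attend only within their own frame. A minimal standalone sketch of that construction (shapes and the helper name are illustrative, not the file's exact code):
```python
import torch

def build_frame_attention_mask(cu_seqlens: torch.Tensor, seq_length: int,
                               dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Additive mask: 0 inside each [cu_seqlens[i-1], cu_seqlens[i]) block
    (the tokens of one frame), -inf everywhere else."""
    mask = torch.full((1, seq_length, seq_length), torch.finfo(dtype).min, dtype=dtype)
    for i in range(1, len(cu_seqlens)):
        start, end = int(cu_seqlens[i - 1]), int(cu_seqlens[i])
        mask[..., start:end, start:end] = 0
    return mask

# Two frames of 4 and 3 tokens: attention is confined to each 4x4 / 3x3 block.
print(build_frame_attention_mask(torch.tensor([0, 4, 7]), seq_length=7))
```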
-
Hi,
I have been running some tests, and your model reports a FLOP count of around 4 G.
The original paper and the Keras implementation report 3.8 G.
Any idea why the difference?
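For illustration, FLOP counters commonly disagree by a few percent because of convention: some report multiply-accumulates (MACs) as "FLOPs", and they differ on whether elementwise ops, normalization, and the classifier head are counted. A toy sketch with hypothetical layer sizes (not taken from this model):
```python
def conv_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulate count of a single k x k convolution layer."""
    return c_in * c_out * k * k * h_out * w_out

# Hypothetical layer: 64 -> 128 channels, 3x3 kernel, 56x56 output map.
macs = conv_macs(64, 128, 3, 56, 56)
print(f"MACs:  {macs:,}")      # some tools report this number as 'FLOPs'
print(f"FLOPs: {2 * macs:,}")  # others count the multiply and the add separately
```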
-
When I use `calculate_flops` to calculate the FLOPs of a local model (e.g. `openai/clip-vit-large-patch14-336` downloaded locally), the result is smaller than the FLOPs calculated manually (using the flops …
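For reference, a minimal sketch of the measurement side, assuming the `calflops` package's `calculate_flops` and only the CLIP vision tower, so that a single `pixel_values` tensor is the whole input (the manual count being compared against is not reproduced here):
```python
from calflops import calculate_flops
from transformers import CLIPVisionModel

# Load the locally downloaded checkpoint (the name stands in for a local path).
model = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")

flops, macs, params = calculate_flops(
    model=model,
    input_shape=(1, 3, 336, 336),  # batch, channels, height, width
)
print(flops, macs, params)
```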
-
I have a trained PointRend model implemented in MMSeg, but when I use get_flops.py to calculate FLOPs, I get the following error.
![image](https://github.com/open-mmlab/mmsegmentation/assets/761…
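For reference, `get_flops.py` wraps mmcv's hook-based complexity counter; a minimal sketch of that path with a stand-in module, assuming the older `mmcv.cnn.get_model_complexity_info` entry point (the PointRend config itself is not reproduced here, and custom ops the hooks don't recognize are a common failure mode):
```python
import torch
from mmcv.cnn import get_model_complexity_info

# Any nn.Module stands in for the MMSeg model here.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(16, 19, 1),  # 19 output classes, as in Cityscapes
)
flops, params = get_model_complexity_info(
    model, (3, 512, 512), as_strings=True, print_per_layer_stat=False
)
print(flops, params)
```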
-
### Problem Description
Even with `NVTE_USE_HIPBLASLT=1` and installing TE inside the container instead of through the `Dockerfile`, as suggested by https://github.com/ROCm/TransformerEngine/issues/…
-
I am trying to train Llama-7B on 8xH100-80GB (HBM3).
### Baseline
When running _without_ activation checkpointing and _without_ fp8, everything runs smoothly:
```yaml
distributed:
fsdp_type:…
-
### Problem Description
Llama3 8B FP8 OOMs at the same batch size as BF16. I need to decrease the batch size to `2` for it not to OOM. At batch size 2, TE FP8 is **21% slower** than torch compile B…
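For reference, the TE FP8 path under comparison looks roughly like this, assuming Transformer Engine's documented `fp8_autocast` API with a single illustrative layer (FP8 execution needs Hopper-class or newer hardware; layer sizes are made up):
```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID: E4M3 for forward tensors, E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(2, 4096, device="cuda", requires_grad=True)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
y.sum().backward()
```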