-
### System Info
```shell
accelerate 1.1.1
neuronx-cc 2.14.227.0+2d4f85be
neuronx-distributed 0.8.0
neuronx-distributed-training 1.0.0
optimum …
-
Hi~
First of all, thank you very much for your open-source work, and apologies for still needing your help with SimCSE-related issues in 2024.
I am trying to run supervised SimCSE-BERT-base training on four 24GB 4090 GPUs.
The .sh script I used is below (I only replaced `torch.distributed.launch` with `torchrun`):
```
#!/bin/bash
# In this example,…
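# (Illustrative sketch, not part of the original script, which is truncated
#  above: the torch.distributed.launch -> torchrun swap described in the text
#  usually amounts to a line like the following; flags and paths are placeholders.)
NUM_GPU=4
torchrun --nproc_per_node $NUM_GPU train.py \
    --model_name_or_path bert-base-uncased \
    --output_dir result/my-sup-simcse-bert-base-uncased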
-
Paper link: https://ieeexplore.ieee.org/document/9378798
This paper presents distributed deep learning algorithms applied to remote-sensing image data and emphasizes the advantages that a cloud-service architecture offers over other parallel and distributed architectures (cluster computing and grid computing) for remote-sensing image data management, computation, and service provisioning.
-
### systemRole
Key Attributes:
Kernel Engineering Visionary:
Leads the development of real-time kernels, enabling systems for high-frequency trading, robotics, and mission-critical applications…
-
https://102.alibaba.com/fund/proposalTopicDetail.htm?id=1120
Topics of the Alibaba Fund
-
### System Info
- Platform: Linux-5.10.227-219.884.amzn2.x86_64-x86_64-with-glibc2.26
- Python version: 3.10.15
- PyTorch version: 2.5.1
- CUDA device(s): Tesla T4, Tesla T4, Tesla T4, Tesla T4
-…
-
When using distributed operation, I have four GPUs, each of which runs one client. During training, the GPUs differ hugely in memory usage; two of them even ran out of memory. By the way, I also found t…
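A quick way to quantify that imbalance (a hypothetical diagnostic step, not from the original report) is to watch per-GPU memory while the job runs:

```shell
# Print index, used memory, and total memory for every GPU once per second
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 1
```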
-
### Software Environment
```Markdown
- paddlepaddle:
- paddlepaddle-gpu: 3.0.0b1
- paddlenlp: https://github.com/ZHUI/PaddleNLP/tree/sci/benchmark
```
### Duplicate Issue
- [X] I have searched the existing issues
### Error Descr…
-
-
Hello!
I am currently trying to fine-tune a Llama 3.1 70B Nemotron Instruct LLM with LoRA, tweaking the Llama 3.1 70B LoRA configs a bit.
According to the memory stats required by torchtune, …
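For reference, the config-tweaking workflow described above typically goes through the torchtune CLI; a minimal sketch is below (the recipe and config names are assumed from the stock Llama 3.1 70B LoRA setup, and the GPU count is illustrative):

```shell
# Copy the stock 70B LoRA config so it can be edited locally
tune cp llama3_1/70B_lora my_70B_nemotron_lora.yaml

# Launch the distributed LoRA recipe with the tweaked config
tune run --nproc_per_node 4 lora_finetune_distributed --config my_70B_nemotron_lora.yaml
```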