-
Hi there,
When I try to run your code on the Semantic-KITTI dataset, I get the following error:
```
Reusing positional embeddings.
Traceback (most recent call last):
File "main.py", line 334,…
-
### System Info
- `transformers` version: 4.42.0.dev0
- Platform: Linux-5.4.0-172-generic-x86_64-with-glibc2.17
- Python version: 3.8.19
- Huggingface_hub version: 0.23.1
- Safetensors version:…
-
Hi,
I tried to integrate the TTLayer into transformerXL;
however, I found that it consumes much more memory than usual.
Did you experience such problems? Do you know any way around this?
(BTW I a…
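For reference, a rough back-of-the-envelope sketch of the parameter counts involved. The factorization shapes and rank below are made up for illustration and are not taken from the actual TTLayer in question:

```python
def dense_params(d_in, d_out):
    # A standard dense layer stores d_in * d_out weights (ignoring bias).
    return d_in * d_out

def tt_params(in_modes, out_modes, rank):
    """Parameters of a TT-factorized matrix whose input/output dims are
    factored as products of in_modes / out_modes, with a uniform TT rank.
    Core k has shape (r_{k-1}, in_k, out_k, r_k), boundary ranks = 1."""
    ranks = [1] + [rank] * (len(in_modes) - 1) + [1]
    return sum(
        ranks[k] * i * o * ranks[k + 1]
        for k, (i, o) in enumerate(zip(in_modes, out_modes))
    )

# Hypothetical example: a 1024x1024 layer factored as (4,8,8,4) x (4,8,8,4),
# TT rank 8.
print(dense_params(1024, 1024))  # 1024 * 1024 = 1,048,576 weights
print(tt_params((4, 8, 8, 4), (4, 8, 8, 4), rank=8))
```

The parameter count drops sharply, so if memory still goes *up*, one plausible culprit is the chain of core contractions at runtime, which materializes intermediate activations that a single dense matmul would not.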
-
I am surveying different packages for hyperparameter optimization, and HpBandSter seems promising, especially because of its support for distributed training. But one thing I haven't figured out is how…
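For context, HpBandSter's BOHB builds on the successive-halving idea of allocating budget to promising configurations. A minimal pure-Python sketch of that idea follows; the function names and toy objective are illustrative, not HpBandSter's actual API (which wraps this logic in its Worker/optimizer classes):

```python
import random

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta configs at each rung, multiplying the budget
    by eta for the survivors. `configs` is a list of hyperparameter dicts;
    `evaluate(config, budget)` returns a loss (lower is better)."""
    budget = min_budget
    while len(configs) > 1:
        scored = sorted(configs, key=lambda c: evaluate(c, budget))
        configs = scored[: max(1, len(configs) // eta)]  # keep top 1/eta
        budget *= eta  # survivors are re-evaluated with a larger budget
    return configs[0]

# Toy objective: the optimum is lr = 0.1, and extra budget reduces noise.
def toy_eval(config, budget):
    return abs(config["lr"] - 0.1) + 1.0 / budget

random.seed(0)
candidates = [{"lr": random.uniform(0.0, 1.0)} for _ in range(9)]
print(successive_halving(candidates, toy_eval))
```

Distributing this is then mostly a matter of farming the `evaluate` calls out to workers, which is the part HpBandSter handles for you.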
-
I'm working on the C version of the code in preparation for (#40).
Running llm.c with **no** code modifications, I observe the following:
- `test_gpt2` works successfully and the loss matches
- `train_g…
-
Hi Devs,
I really like the clean API of sparkflow for distributed training. Is it possible to run Keras code using sparkflow?
-
I'm trying to pretrain blip2 on a Slurm cluster, but it seems that the current program does not support distributed training on Slurm by default. Any advice on this?
| distributed init (rank 0, world 1)…
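I don't know this codebase's launcher, but a common workaround is to derive the `torch.distributed` rendezvous settings from Slurm's standard per-task environment variables before calling `init_process_group`. A minimal sketch (the helper name is mine; `MASTER_ADDR`/`MASTER_PORT` would typically be exported in the sbatch script, e.g. from `scontrol show hostnames`):

```python
import os

def slurm_dist_env(env=os.environ):
    """Map Slurm's per-task environment to the fields torch.distributed
    needs. Returns None when not running under Slurm (e.g. local debug)."""
    if "SLURM_PROCID" not in env:
        return None
    return {
        "rank": int(env["SLURM_PROCID"]),         # global rank of this task
        "world_size": int(env["SLURM_NTASKS"]),   # total tasks in the job
        "local_rank": int(env["SLURM_LOCALID"]),  # rank within this node
    }

cfg = slurm_dist_env({"SLURM_PROCID": "3", "SLURM_NTASKS": "8",
                      "SLURM_LOCALID": "3"})
print(cfg)

# In the real training script you would then do something like:
#   torch.cuda.set_device(cfg["local_rank"])
#   torch.distributed.init_process_group("nccl", rank=cfg["rank"],
#                                        world_size=cfg["world_size"])
```

Launching with `srun --ntasks=<world_size>` then gives every process the right rank without any launcher-specific flags.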
-
LLaMa 2 is highly requested by customers. Can we ensure LLaMa 2 fine-tuning works with neuronx-distributed, including sample code and tutorials for the 7B, 13B, and 70B models?
-
```bash
python3 -m torch.distributed.launch \
--nproc_per_node 8 \
--master_port 9527 \
train.py \
--workers 8 \
--device 0,1,2,3,4,5,6,7 \
--syn…
-
Hello, as you might know, I admire your work (all of you, all the contributors) and love our community.
With that said, here is my simple question:
Is there any plan to make it …