-
### Search before asking
- [X] I searched the [issues](https://github.com/ray-project/kuberay/issues) and found no similar issues.
### KubeRay Component
ray-operator
### What happened + …
-
Running
```
mace_run_train --foundation_model='small' --ema_decay=0.995 --energy_weight=1.0 --forces_weight=1.0 --stress_weight=1.0 --max_num_epochs=2 --scheduler_patience=5 --patience=40 --clip_gra…
```
-
## Detailed description
I processed the Gait3D dataset following the [Data Pretreatment](https://github.com/ShiqiYu/OpenGait/blob/master/datasets/Gait3D/README.md) instructions and set dataset_root to "gait3d-merged-pkl", w…
-
Hi!
Thank you for your work on this project!
I'm training the s-model on a custom dataset, and I’ve encountered an issue after several successful epochs. Up until the 12th epoch, training seems to…
-
I am fine-tuning a Longformer Encoder-Decoder model for multi-document text summarization. When I run the forward pass, it fails with the error "index out of range in self". The input shap…
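
For context, "index out of range in self" is the message PyTorch raises when an embedding lookup receives an index outside the embedding table. In encoder-decoder models this often comes from either a token id that is not smaller than the vocabulary size, or an input longer than the position-embedding table. The sketch below is a plain-Python pre-check for both cases, not a confirmed diagnosis of this report. The function name `find_out_of_range` and its parameters are hypothetical; in practice the two limits would be read from the model's configuration.

```python
from typing import List, Sequence, Tuple


def find_out_of_range(
    input_ids: Sequence[Sequence[int]],
    vocab_size: int,
    max_positions: int,
) -> List[Tuple[int, str]]:
    """Return (row, reason) pairs for rows that would overflow an embedding table.

    Hypothetical helper: vocab_size and max_positions are assumed to be the
    sizes of the model's word-embedding and position-embedding tables.
    """
    problems = []
    for row, ids in enumerate(input_ids):
        # Token ids index the word-embedding table, so they must lie in [0, vocab_size).
        bad = [t for t in ids if t < 0 or t >= vocab_size]
        if bad:
            problems.append((row, f"token ids outside [0, {vocab_size}): {bad[:5]}"))
        # Positions index the position-embedding table, so the length must not exceed it.
        if len(ids) > max_positions:
            problems.append((row, f"length {len(ids)} exceeds {max_positions} positions"))
    return problems
```

Running the check on the tokenized batch before the forward pass (for example after converting the input tensor to nested lists) shows which rows, if any, would trigger the lookup error. An empty result suggests the cause lies elsewhere.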
-
### Describe the bug
It seems this long-standing issue (see #52, #132, #159, and #601) is still unresolved.
I have configured a daily job running at 3:00, which processes a batch of entities rel…
-
### Describe the bug
I was trying to test different schedulers with DDPMPipeline, and an error occurred when I used PNDMScheduler. Beforehand, I had found that PNDMScheduler should be compatible with DD…
-
Kicking off the idea/convo of including an API within Backburner for optimally batching work within a rAF scheduler for reads/writes. Addons obviously exist to leverage this; however, it's so fundament…
-
**Reproduction**
I am trying to fine-tune the Qwen2-0.5B model on some training data using a multi-GPU setup. The same code (given further below) seems to work in a single-GPU setting (when I set CUDA_V…
-
import logging
import os
import json
import torch
from datasets import load_from_disk
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel…