tensor-train Search Results

1000+ results for tensor-train


WoodwindHu/RangeLDM #3

Correct vae_checkpoint to run the ldm nuScenes training

The vae_checkpoint you provided is diffusion_pytorch_model.safetensors, not a .ckpt or .pth file. But in ldm/train_unconditional.py, vae_checkpoint = torch.load(args.vae_checkpoint, map_location='cpu'…

JackieHuJunyi updated 2 weeks ago
1
Ucas-HaoranWei/GOT-OCR2.0 #135

Stage-3 training, error at inference: ValueError: Trying to set a tensor of shape …

As the title says: I trained stage-3, and training ran normally, but inference fails with this error: Traceback (most recent call last): File "/root/GOT-OCR2.0/GOT-OCR-2.0-master/GOT/demo/run_ocr_2.0.py", line 245, in eval_model(args) File "/root/GOT-OCR2.0/…

yangy272 updated 1 month ago
6
Maelic/SGG-Benchmark #36

Error during training validation: AttributeError: 'NoneType'…

**Dear author, thank you very much for your excellent work on this project. When I train my own SGDet model, I encounter two errors during the validation phase. No.1 is as follows:** `Traceback (m…

Young-Loser updated 1 month ago
1
ZhangXInFD/SpeechTokenizer #22

Question about size when training

RuntimeError: Caught RuntimeError in DataLoader worker process 2. Original Traceback (most recent call last): File "/wiz-llm-storage/anaconda3/envs/h2_nemo/lib/python3.10/site-packages/torch/utils…

LiuMY13 updated 1 week ago
1
huggingface/trl #1913

DataCollatorForCompletionOnlyLM not working with MPS

Hi there, amazing work :) I just encountered an error while trying to run the library on an Apple M3 Max. Below is a MWE to reproduce the error. The example itself doesn't make sense but at leas…

giuliabaldini updated 1 month ago
5
ostris/ai-toolkit #219

The parameter type of batch in TrainSliderProcess.py does no…

## This is for bugs only Did you already ask [in the discord](https://discord.gg/VXmU2f5WEU)? No You verified that this is a bug and not a feature request or question by asking [in the discor…

Ando-Lin updated 1 week ago
1
NVIDIA/Megatron-LM #1125

[QUESTION] tensor_parallel.broadcast_data and train_valid_te…

In my understanding, in pretrain code, it broadcasts the data from tp rank 0 to the rest tp rank gpus. However, if i activate the option `train_valid_test_datasets_provider.is_distributed = True` wh…

KookHoiKim updated 3 weeks ago
1
CyberAgentAILab/SuperNormal #10

Training error on Custom dataset

Hi, I'm using XHumans Dataset which provide mesh. I rendered normal with camera poses and got error as following while training the model. Any insight would be helpful. ``` true_normal torch.Siz…

gwang-kim updated 2 months ago
1
huggingface/transformers #34695

RuntimeError: Expected all tensors to be on the same device,…

**Reproduction** I am trying to finetune Qwen2-0.5B model on some training data using a multi-GPU setup. The same code (given further below) seems to work in a single-GPU setting (when i set CUDA_V…

ra-MANUJ-an updated 1 week ago
5
Lightning-AI/litdata #408

training hangs with lightning ddp and cloud dir?

## 🐛 Bug Hi, we are using lightning with litdata on our local machine and aws s3 system. However, training would hang randomly during the very first iterations with ddp and remote cloud directory. …

rxqy updated 1 week ago
8
