LambdaLabsML / examples

Deep Learning Examples
MIT License

Invalid --gpus argument #27

Open yu-rp opened 1 year ago

yu-rp commented 1 year ago

Dear author,

I am running pokemon_finetune.ipynb with the following settings:

# 2xA6000:
BATCH_SIZE = 4
N_GPUS = 1
ACCUMULATE_BATCHES = 1

gpu_list = ",".join((str(x) for x in range(N_GPUS))) + ","
print(f"Using GPUs: {gpu_list}")

I then run the python main.py code block:

# Run training
!(python main.py \
    -t \
    --base configs/stable-diffusion/pokemon.yaml \
    --gpus "$gpu_list" \
    --scale_lr False \
    --num_nodes 1 \
    --check_val_every_n_epoch 10 \
    --finetune_from "$ckpt_path" \
    data.params.batch_size="$BATCH_SIZE" \
    lightning.trainer.accumulate_grad_batches="$ACCUMULATE_BATCHES" \
    data.params.validation.params.n_gpus="$NUM_GPUS" \
)

and I get the following error:

main.py: error: argument --gpus: invalid _gpus_allowed_type value: ''

Could you please let me know why?

devonbrackbill commented 1 year ago

You're running with N_GPUS = 1, which creates the string gpu_list='0,', but you want it to be gpu_list='0' (without the trailing comma). You can replace the final two lines of the settings with:

if N_GPUS > 1:
  gpu_list = ",".join((str(x) for x in range(N_GPUS))) + ","
else:
  gpu_list = "0"
print(f"Using GPUs: {gpu_list}")
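
To see which values of gpu_list a parser of this kind accepts, here is a minimal sketch. It assumes the --gpus type function keeps any value containing a comma as a string and converts everything else with int(); that is a simplified stand-in written for illustration, not the actual code used by main.py or its trainer library. make_gpu_list is a hypothetical helper name.

```python
import argparse


def make_gpu_list(n_gpus: int) -> str:
    """Build a comma-terminated device string, e.g. 2 -> "0,1,"."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return ",".join(str(i) for i in range(n_gpus)) + ","


def _gpus_allowed_type(value: str):
    """Simplified stand-in for the --gpus type function (an assumption)."""
    if "," in value:
        return value   # kept as a string of device indices
    return int(value)  # int("") raises ValueError, which argparse reports


parser = argparse.ArgumentParser(prog="main.py")
parser.add_argument("--gpus", type=_gpus_allowed_type)

for candidate in (make_gpu_list(1), "0", ""):
    try:
        parsed = parser.parse_args(["--gpus", candidate]).gpus
        print(repr(candidate), "->", repr(parsed))
    except SystemExit:
        # argparse prints the error to stderr and exits with status 2
        print(repr(candidate), "-> rejected by argparse")
```

Under this assumption, only the empty string is rejected, which matches the '' in the reported error and suggests that "$gpu_list" reached main.py empty. Note also that a bare "0" would be parsed as an integer here; whether the trainer reads that as "device 0" or "zero GPUs" depends on the installed version, so it is worth checking before relying on it.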

mingyao743 commented 1 year ago

Maybe it's a mistake in the code: the command uses NUM_GPUS, but the variable defined in the settings is N_GPUS.
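
A name mismatch like this can be caught before launching training. The sketch below is one possible check, not part of the notebook: find_undefined_vars is a hypothetical helper, and the command string is abbreviated from the one in the issue. A plausible explanation for the empty --gpus value (not confirmed in this thread) is that the notebook leaves the whole command unexpanded when any $NAME is undefined, so the shell then sees an unset $gpu_list.

```python
import re


def find_undefined_vars(command: str, namespace: dict) -> list:
    """Return the $NAME references in `command` that are missing from `namespace`."""
    names = re.findall(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)\}?", command)
    missing = []
    for name in names:
        if name not in namespace and name not in missing:
            missing.append(name)
    return missing


# Abbreviated version of the command from the issue.
command = (
    'python main.py --gpus "$gpu_list" '
    'data.params.batch_size="$BATCH_SIZE" '
    'data.params.validation.params.n_gpus="$NUM_GPUS"'
)

# The variables the settings cell actually defines.
settings = {"BATCH_SIZE": 4, "N_GPUS": 1, "ACCUMULATE_BATCHES": 1, "gpu_list": "0,"}

print(find_undefined_vars(command, settings))  # prints ['NUM_GPUS']
```

In a notebook, passing globals() as the namespace would check the command against the variables that are actually defined in the session.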

Terkwood commented 1 year ago

Yes, there's a mistake there, as pointed out by @mingyao743. I will raise a PR if there isn't one already.

Terkwood commented 1 year ago

Raised #34 to resolve this. This change let me make progress in my notebook.

yu-rp commented 1 year ago

Thank you all. This also works for me. Should I close this issue?

MesutUnutur commented 1 year ago

Hello, I got the same error: main.py: error: argument --gpus: invalid _gpus_allowed_type value: ''. I am running this code block on RunPod with a cloud GPU. How can I fix this error in that environment? Thanks in advance for any answer.

megatran commented 1 year ago

Hopefully the author will look at the pending PR, but here's a potential fix: https://github.com/LambdaLabsML/examples/pull/65