bghira / SimpleTuner

A general fine-tuning kit geared toward diffusion models.
GNU Affero General Public License v3.0

CUDA runtime not found despite correct CUDA installation #845

Closed · magicwang1111 closed this issue 1 month ago

magicwang1111 commented 1 month ago

I'm encountering an issue where CUDA is installed correctly and `nvcc --version` confirms that CUDA 12.4 is available, but my training script fails with the following message:

```
No CUDA runtime is found, using CUDA_HOME='/mnt/data/wangxi/cuda-12.4/'
```

Environment Details:

- CUDA version: 12.4 (verified with `nvcc --version`)
- Script/command: `CUDA_VISIBLE_DEVICES=0 sh train.sh`
- Error log:

```
VALIDATION_NEGATIVE_PROMPT not set, defaulting to empty.
No CUDA runtime is found, using CUDA_HOME='/mnt/data/wangxi/cuda-12.4/'
```

I updated to the latest code this morning. There was no problem with CUDA detection in the previous code.
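
(Editor's note: this wording matches the warning `torch.utils.cpp_extension` emits when `torch.cuda.is_available()` returns False at the moment the module probes for a CUDA runtime, so it points at the launch environment rather than the CUDA install itself. A quick check from the same shell that runs `train.sh`, as a sketch:)

```bash
# If this prints "12.4 False", the environment handed to the training
# process is the problem, not the CUDA toolkit installation.
CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```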

magicwang1111 commented 1 month ago

```
(flux) [wangxi@v100-4 SimpleTuner]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
(flux) [wangxi@v100-4 SimpleTuner]$ ^C
(flux) [wangxi@v100-4 SimpleTuner]$ CUDA_VISIBLE_DEVICES=0 sh train.sh
find: ‘/mnt/data1/wangxi/SimpleTuner/.venv’: No such file or directory
/lib
VALIDATION_NEGATIVE_PROMPT not set, defaulting to empty.
No CUDA runtime is found, using CUDA_HOME='/mnt/data/wangxi/cuda-12.4/'
```

magicwang1111 commented 1 month ago

```
2024-08-22 15:16:17,220 [INFO] (main) Moving the diffusion transformer to GPU in int4-quanto precision.
2024-08-22 15:16:17,398 [INFO] (main) Running training
```

magicwang1111 commented 1 month ago

```
(flux) [wangxi@v100-4 SimpleTuner]$ python
Python 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> print(torch.cuda.device_count())
8
>>> print(torch.cuda.current_device())
0
>>> print(torch.cuda.get_device_name(torch.cuda.current_device()))
Tesla V100-SXM2-32GB
```

bghira commented 1 month ago

Not sure a V100 works. It might be too old.

bghira commented 1 month ago

For example, there's a warning earlier that pauses for ten seconds before continuing, stating that int4-quanto is only supported on A100 and H100 NVIDIA devices. But your run here is using int4-quanto anyway.

You must pay attention to errors and warnings! In fact, the V100 does not support bfloat16, so it cannot be used with SimpleTuner.

magicwang1111 commented 1 month ago

> For example, there's a warning earlier that pauses for ten seconds before continuing, stating that int4-quanto is only supported on A100 and H100 NVIDIA devices. But your run here is using int4-quanto anyway.
>
> You must pay attention to errors and warnings! In fact, the V100 does not support bfloat16, so it cannot be used with SimpleTuner.

The version from two weeks ago could train normally, but the current one doesn't.

bghira commented 1 month ago

I did not have a proper check for bf16 support that excluded the emulation NVIDIA provides on older devices, but this is corrected now.
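
(Editor's note: as a sketch of what such a check involves: native bf16 kernels arrive with compute capability 8.0, i.e. Ampere, while a V100 reports 7.0, so gating on the capability rather than on a trial bf16 allocation rules out the emulated path. Illustrative only; not necessarily the exact check SimpleTuner ships.)

```bash
python - <<'PY'
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")        # 7.0 on a Tesla V100
print("torch bf16 verdict:", torch.cuda.is_bf16_supported())
print("native bf16 kernels:", major >= 8)            # Ampere (8.0) or newer only
PY
```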

bghira commented 1 month ago

It is because the int4-quanto precision level specifically requires bf16 kernels, and the V100 has none. Additionally, we do not use autocast or support fp16, due to the requirement for a grad scaler and the rest of the complex pieces.
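
(Editor's note: for context on the "grad scaler and the rest of the complex pieces": fp16 gradients underflow easily, so fp16 training conventionally wraps every step in autocast plus dynamic loss scaling. A minimal sketch of the machinery that bf16 training avoids, using an illustrative toy model, not SimpleTuner code:)

```bash
python - <<'PY'
import torch

# Toy model and optimizer, just to show the fp16 plumbing.
model = torch.nn.Linear(8, 1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    x = torch.randn(4, 8, device="cuda")
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()  # scale the loss so fp16 grads don't underflow
    scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
    scaler.update()                # retunes the scale factor for the next step
print("fp16 step loop completed")
PY
```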

magicwang1111 commented 1 month ago

> It is because the int4-quanto precision level specifically requires bf16 kernels, and the V100 has none. Additionally, we do not use autocast or support fp16, due to the requirement for a grad scaler and the rest of the complex pieces.

Updated to the latest code and switched to int8-quanto:

```
(flux) [wangxi@v100-4 SimpleTuner]$ CUDA_VISIBLE_DEVICES=0 sh train.sh
find: ‘/mnt/data1/wangxi/SimpleTuner/.venv’: No such file or directory
/lib
VALIDATION_NEGATIVE_PROMPT not set, defaulting to empty.
No CUDA runtime is found, using CUDA_HOME='/mnt/data/wangxi/cuda-12.4/'
2024-08-23 11:35:06,484 [ERROR] (bitsandbytes.cextension) Could not load bitsandbytes native library: /home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 109, in <module>
    lib = get_native_library()
  File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 96, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
  File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
optimizer: {'precision': 'bf16', 'default_settings': {'betas': (0.9, 0.999), 'weight_decay': 0.01, 'eps': 1e-06}, 'class': <class 'helpers.training.adam_bfloat16.AdamWBF16'>}
2024-08-23 11:35:06,606 [WARNING] (ArgsParser) The VAE model madebyollin/sdxl-vae-fp16-fix is not compatible. Please use a compatible VAE to eliminate this warning. The baked-in VAE will be used, instead.
2024-08-23 11:35:06,606 [INFO] (ArgsParser) VAE Model: black-forest-labs/FLUX.1-dev
2024-08-23 11:35:06,606 [INFO] (ArgsParser) Default VAE Cache location:
2024-08-23 11:35:06,606 [INFO] (ArgsParser) Text Cache location: cache
2024-08-23 11:35:06,607 [WARNING] (ArgsParser) Updating T5 XXL tokeniser max length to 512 for Flux.
2024-08-23 11:35:06,631 [WARNING] (accelerate.utils.other) Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
2024-08-23 11:35:10,567 [INFO] (main) Logged into Hugging Face Hub as 'wang1111magic'
2024-08-23 11:35:10,567 [INFO] (main) Load tokenizers
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
2024-08-23 11:35:12,139 [INFO] (helpers.training.text_encoding) Loading OpenAI CLIP-L text encoder from black-forest-labs/FLUX.1-dev/text_encoder..
2024-08-23 11:35:12,624 [INFO] (helpers.training.text_encoding) Loading T5 XXL v1.1 text encoder from black-forest-labs/FLUX.1-dev/text_encoder_2..
Downloading shards: 100%|██████████| 2/2 [00:00<00:00, 10591.68it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 3.96it/s]
2024-08-23 11:35:18,456 [INFO] (main) Load VAE: black-forest-labs/FLUX.1-dev
2024-08-23 11:35:19,040 [INFO] (main) Moving text encoder to GPU.
2024-08-23 11:35:19,042 [INFO] (main) Moving text encoder 2 to GPU.
2024-08-23 11:35:19,047 [INFO] (main) Loading VAE onto accelerator, converting from torch.float32 to torch.bfloat16
2024-08-23 11:35:19,066 [INFO] (DataBackendFactory) Loading data backend config from config/multidatabackend.json
2024-08-23 11:35:19,066 [INFO] (DataBackendFactory) Configuring text embed backend: dataset_captions
Loading pipeline components...:   0%|          | 0/5 [00:00<?, ?it/s]
Loaded scheduler as FlowMatchEulerDiscreteScheduler from scheduler subfolder of black-forest-labs/FLUX.1-dev.
Loading pipeline components...: 100%|██████████| 5/5 [00:00<00:00, 628.45it/s]
2024-08-23 11:35:19,546 [INFO] (TextEmbeddingCache) (Rank: 0) (id=dataset_captions) Listing all text embed cache entries
2024-08-23 11:35:19,555 [INFO] (DataBackendFactory) Pre-computing null embedding
2024-08-23 11:35:24,559 [INFO] (DataBackendFactory) Completed loading text embed services.
2024-08-23 11:35:24,560 [INFO] (DataBackendFactory) Configuring data backend: qxxy style
2024-08-23 11:35:24,560 [INFO] (DataBackendFactory) (id=qxxy style) Loading bucket manager.
2024-08-23 11:35:24,563 [INFO] (JsonMetadataBackend) Checking for cache file: /mnt/data1/dataset/20240820_Your_name/aspect_ratio_bucket_indices_square.json
2024-08-23 11:35:24,563 [INFO] (JsonMetadataBackend) Pulling cache file from storage
2024-08-23 11:35:24,563 [INFO] (DataBackendFactory) (id=qxxy style) Refreshing aspect buckets on main process.
2024-08-23 11:35:24,563 [INFO] (BaseMetadataBackend) Discovering new files...
2024-08-23 11:35:24,607 [INFO] (BaseMetadataBackend) Compressed 488 existing files from 1.
2024-08-23 11:35:24,608 [INFO] (BaseMetadataBackend) No new files discovered. Doing nothing.
2024-08-23 11:35:24,608 [INFO] (BaseMetadataBackend) Statistics: {'total_processed': 0, 'skipped': {'already_exists': 488, 'metadata_missing': 0, 'not_found': 0, 'too_small': 0, 'other': 0}}
2024-08-23 11:35:24,608 [WARNING] (DataBackendFactory) Key config_version not found in the current backend config, using the existing value '2'.
2024-08-23 11:35:24,608 [WARNING] (DataBackendFactory) Key hash_filenames not found in the current backend config, using the existing value 'True'.
2024-08-23 11:35:24,608 [INFO] (DataBackendFactory) Configured backend: {'id': 'qxxy style', 'config': {'probability': 1.0, 'repeats': 10, 'crop': False, 'crop_aspect': 'square', 'crop_aspect_buckets': None, 'crop_style': 'center', 'disable_validation': False, 'resolution': 1024, 'resolution_type': 'pixel', 'caption_strategy': 'textfile', 'instance_data_dir': '/mnt/data1/dataset/20240820_Your_name/', 'maximum_image_size': None, 'target_downsample_size': None, 'config_version': 2, 'hash_filenames': True}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x7f11fafb70d0>, 'instance_data_dir': '/mnt/data1/dataset/20240820_Your_name', 'metadata_backend': <helpers.metadata.backends.json.JsonMetadataBackend object at 0x7f11fafb6b00>}
(Rank: 0) | Bucket | Image Count (per-GPU)
(Rank: 0) | 1.75   | 488
2024-08-23 11:35:24,609 [INFO] (DataBackendFactory) (id=qxxy style) Collecting captions.
2024-08-23 11:35:24,622 [INFO] (DataBackendFactory) (id=qxxy style) Initialise text embed pre-computation using the textfile caption strategy. We have 488 captions to process.
2024-08-23 11:35:24,635 [INFO] (DataBackendFactory) (id=qxxy style) Completed processing 488 captions.
2024-08-23 11:35:24,635 [INFO] (DataBackendFactory) (id=qxxy style) Creating VAE latent cache.
2024-08-23 11:35:24,637 [INFO] (DataBackendFactory) (id=qxxy style) Discovering cache objects..
2024-08-23 11:35:24,655 [INFO] (DataBackendFactory) Configured backend: {'id': 'qxxy style', 'config': {'probability': 1.0, 'repeats': 10, 'crop': False, 'crop_aspect': 'square', 'crop_aspect_buckets': None, 'crop_style': 'center', 'disable_validation': False, 'resolution': 1024, 'resolution_type': 'pixel', 'caption_strategy': 'textfile', 'instance_data_dir': '/mnt/data1/dataset/20240820_Your_name/', 'maximum_image_size': None, 'target_downsample_size': None, 'config_version': 2, 'hash_filenames': True}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x7f11fafb70d0>, 'instance_data_dir': '/mnt/data1/dataset/20240820_Your_name', 'metadata_backend': <helpers.metadata.backends.json.JsonMetadataBackend object at 0x7f11fafb6b00>, 'train_dataset': <helpers.multiaspect.dataset.MultiAspectDataset object at 0x7f11fafb6500>, 'sampler': <helpers.multiaspect.sampler.MultiAspectSampler object at 0x7f1200de06a0>, 'train_dataloader': <torch.utils.data.dataloader.DataLoader object at 0x7f1200de05e0>, 'text_embed_cache': <helpers.caching.text_embeds.Text
```
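
(Editor's note: the `libbitsandbytes_cpu.so` failure above indicates bitsandbytes could not match a CUDA build and fell back to a CPU library that isn't present either. Recent bitsandbytes releases ship a self-check entrypoint that reports which native library variant it tried to load and why; hedged, since availability depends on the installed version:)

```bash
# bitsandbytes diagnostic module; prints the CUDA setup it detected
python -m bitsandbytes
```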

bghira commented 1 month ago

I think it doesn't even support CUDA 12.4?

magicwang1111 commented 1 month ago

> I think it doesn't even support CUDA 12.4?

ai-toolkit can detect CUDA 12.4 (screenshot attached).

magicwang1111 commented 1 month ago

```
(flux) [wangxi@v100-4 SimpleTuner]$ echo $CUDA_HOME
/mnt/data/wangxi/cuda-12.4/
(flux) [wangxi@v100-4 SimpleTuner]$ echo $PATH
/mnt/data/wangxi/cmake/cmake/bin:/mnt/data/wangxi/cuda-12.4/bin:/home/wangxi/temp/gcc_11.3.0/bin:/mnt/data/wangxi/cmake/cmake/bin:/mnt/data/wangxi/cuda-12.4/bin:/home/wangxi/temp/gcc_11.3.0/bin:/home/wangxi/miniconda3/envs/flux/bin:/home/wangxi/miniconda3/condabin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/mnt/data/shared_data/gurobi/gurobi1102/linux64/bin:/home/wangxi/.local/bin:/home/wangxi/bin:/mnt/data/shared_data/gurobi/gurobi1102/linux64/bin:/home/wangxi/.local/bin:/home/wangxi/bin
(flux) [wangxi@v100-4 SimpleTuner]$ echo $LD_LIBRARY_PATH
/mnt/data/wangxi/cuda-12.4//lib64:/home/wangxi/temp/gcc_11.3.0/lib64:/mnt/data/wangxi/cuda-12.4//lib64:/home/wangxi/temp/gcc_11.3.0/lib64::/mnt/data/shared_data/gurobi/gurobi1102/linux64/lib:/mnt/data/wangxi/cuda-12.4/targets/x86_64-linux/lib:/mnt/data/shared_data/gurobi/gurobi1102/linux64/lib:/mnt/data/wangxi/cuda-12.4/targets/x86_64-linux/lib
(flux) [wangxi@v100-4 SimpleTuner]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
(flux) [wangxi@v100-4 SimpleTuner]$
```
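
(Editor's note: both `PATH` and `LD_LIBRARY_PATH` in this dump contain each entry twice, and `LD_LIBRARY_PATH` has an empty `::` element, which the loader treats as the current directory. Printing one entry per line makes such problems easy to spot:)

```bash
echo "$LD_LIBRARY_PATH" | tr ':' '\n'   # a blank output line reveals the empty '::' entry
echo "$PATH" | tr ':' '\n'
```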

magicwang1111 commented 1 month ago

```
(flux) [wangxi@v100-4 SimpleTuner]$ python
Python 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.version.cuda)
12.4
>>> print(torch.cuda.is_available())
True

(flux) [wangxi@v100-4 SimpleTuner]$
```

magicwang1111 commented 1 month ago

> I think it doesn't even support CUDA 12.4?

```
(flux) [wangxi@v100-4 SimpleTuner]$ CUDA_VISIBLE_DEVICES=6 python train.py
2024-08-23 14:56:10,043 [WARNING] (bitsandbytes.cextension) WARNING: BNB_CUDA_VERSION=124 environment variable detected; loading libbitsandbytes_cuda124_nocublaslt124.so.
This can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
usage: train.py [-h] [--snr_gamma SNR_GAMMA] [--use_soft_min_snr] [--soft_min_snr_sigma_data SOFT_MIN_SNR_SIGMA_DATA]
                [--model_type {full,lora,deepfloyd-full,deepfloyd-lora,deepfloyd-stage2,deepfloyd-stage2-lora}] [--legacy] [--kolors] [--flux]
                [--flux_lora_target {mmdit,context,context+ffs,all,all+ffs,ai-toolkit}] [--flow_matching_sigmoid_scale FLOW_MATCHING_SIGMOID_SCALE]
                [--flux_fast_schedule] [--flux_guidance_mode {constant,random-range}] [--flux_guidance_value FLUX_GUIDANCE_VALUE]
                [--flux_guidance_min FLUX_GUIDANCE_MIN] [--flux_guidance_max FLUX_GUIDANCE_MAX] [--flux_attention_masked_training]
                [--smoldit] [--smoldit_config {smoldit-small,smoldit-swiglu,smoldit-base,smoldit-large,smoldit-huge}]
                [--flow_matching_loss {diffusers,compatible,diffusion}] [--pixart_sigma] [--sd3] [--sd3_t5_mask_behaviour {do-nothing,mask}]
                [--lora_type {standard,lycoris}] [--lora_init_type {default,gaussian,loftq,olora,pissa}] [--init_lora INIT_LORA]
                [--lora_rank LORA_RANK] [--lora_alpha LORA_ALPHA] [--lora_dropout LORA_DROPOUT] [--lycoris_config LYCORIS_CONFIG]
                [--controlnet] [--controlnet_model_name_or_path] --pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH
                [--pretrained_transformer_model_name_or_path PRETRAINED_TRANSFORMER_MODEL_NAME_OR_PATH]
                [--pretrained_transformer_subfolder PRETRAINED_TRANSFORMER_SUBFOLDER]
                [--pretrained_unet_model_name_or_path PRETRAINED_UNET_MODEL_NAME_OR_PATH] [--pretrained_unet_subfolder PRETRAINED_UNET_SUBFOLDER]
                [--pretrained_vae_model_name_or_path PRETRAINED_VAE_MODEL_NAME_OR_PATH] [--pretrained_t5_model_name_or_path PRETRAINED_T5_MODEL_NAME_OR_PATH]
                [--prediction_type {epsilon,v_prediction,sample}] [--snr_weight SNR_WEIGHT]
                [--training_scheduler_timestep_spacing {leading,linspace,trailing}] [--inference_scheduler_timestep_spacing {leading,linspace,trailing}]
                [--refiner_training] [--refiner_training_invert_schedule] [--refiner_training_strength REFINER_TRAINING_STRENGTH]
                [--timestep_bias_strategy {earlier,later,range,none}] [--timestep_bias_multiplier TIMESTEP_BIAS_MULTIPLIER]
                [--timestep_bias_begin TIMESTEP_BIAS_BEGIN] [--timestep_bias_end TIMESTEP_BIAS_END] [--timestep_bias_portion TIMESTEP_BIAS_PORTION]
                [--disable_segmented_timestep_sampling] [--rescale_betas_zero_snr] [--vae_dtype {default,fp16,fp32,bf16}]
                [--vae_batch_size VAE_BATCH_SIZE] [--vae_cache_scan_behaviour {recreate,sync}] [--vae_cache_preprocess] [--vae_cache_ondemand]
                [--compress_disk_cache] [--aspect_bucket_disable_rebuild] [--keep_vae_loaded] [--skip_file_discovery SKIP_FILE_DISCOVERY]
                [--revision REVISION] [--variant VARIANT] [--preserve_data_backend_cache] [--use_dora] [--override_dataset_config]
                [--cache_dir_text CACHE_DIR_TEXT] [--cache_dir_vae CACHE_DIR_VAE] --data_backend_config DATA_BACKEND_CONFIG
                [--data_backend_sampling {uniform,auto-weighting}] [--write_batch_size WRITE_BATCH_SIZE] [--read_batch_size READ_BATCH_SIZE]
                [--image_processing_batch_size IMAGE_PROCESSING_BATCH_SIZE] [--enable_multiprocessing] [--max_workers MAX_WORKERS]
                [--aws_max_pool_connections AWS_MAX_POOL_CONNECTIONS] [--torch_num_threads TORCH_NUM_THREADS] [--dataloader_prefetch]
                [--dataloader_prefetch_qlen DATALOADER_PREFETCH_QLEN] [--aspect_bucket_worker_count ASPECT_BUCKET_WORKER_COUNT]
                [--cache_dir CACHE_DIR] [--cache_clear_validation_prompts] [--caption_strategy {filename,textfile,instance_prompt,parquet}]
                [--parquet_caption_column PARQUET_CAPTION_COLUMN] [--parquet_filename_column PARQUET_FILENAME_COLUMN]
                [--instance_prompt INSTANCE_PROMPT] [--output_dir OUTPUT_DIR] [--seed SEED] [--seed_for_each_device SEED_FOR_EACH_DEVICE]
                [--resolution RESOLUTION] [--resolution_type {pixel,area,pixel_area}] [--aspect_bucket_rounding {1,2,3,4,5,6,7,8,9}]
                [--aspect_bucket_alignment {8,64}] [--minimum_image_size MINIMUM_IMAGE_SIZE] [--maximum_image_size MAXIMUM_IMAGE_SIZE]
                [--target_downsample_size TARGET_DOWNSAMPLE_SIZE] [--train_text_encoder] [--tokenizer_max_length TOKENIZER_MAX_LENGTH]
                [--train_batch_size TRAIN_BATCH_SIZE] [--num_train_epochs NUM_TRAIN_EPOCHS] [--max_train_steps MAX_TRAIN_STEPS]
                [--checkpointing_steps CHECKPOINTING_STEPS] [--checkpoints_total_limit CHECKPOINTS_TOTAL_LIMIT]
                [--resume_from_checkpoint RESUME_FROM_CHECKPOINT] [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
                [--gradient_checkpointing] [--learning_rate LEARNING_RATE] [--text_encoder_lr TEXT_ENCODER_LR] [--lr_scale]
                [--lr_scheduler {linear,sine,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup}]
                [--lr_warmup_steps LR_WARMUP_STEPS] [--lr_num_cycles LR_NUM_CYCLES] [--lr_power LR_POWER] [--use_ema]
                [--ema_device {cpu,accelerator}] [--ema_cpu_only] [--ema_foreach_disable] [--ema_update_interval EMA_UPDATE_INTERVAL]
                [--ema_decay EMA_DECAY] [--non_ema_revision NON_EMA_REVISION] [--offload_param_path OFFLOAD_PARAM_PATH]
                --optimizer {adamw_bf16,optimi-stableadamw,optimi-adamw,optimi-lion,optimi-radam,optimi-ranger,optimi-adan,optimi-adam,optimi-sgd}
                [--optimizer_config OPTIMIZER_CONFIG] [--optimizer_beta1 OPTIMIZER_BETA1] [--optimizer_beta2 OPTIMIZER_BETA2]
                [--optimizer_release_gradients] [--use_8bit_adam] [--use_adafactor_optimizer] [--use_prodigy_optimizer] [--use_dadapt_optimizer]
                [--adam_beta1 ADAM_BETA1] [--adam_beta2 ADAM_BETA2] [--adam_weight_decay ADAM_WEIGHT_DECAY] [--adam_epsilon ADAM_EPSILON]
                [--adam_bfloat16] [--max_grad_norm MAX_GRAD_NORM] [--push_to_hub] [--push_checkpoints_to_hub] [--hub_model_id HUB_MODEL_ID]
                [--model_card_note MODEL_CARD_NOTE] [--logging_dir LOGGING_DIR] [--validation_on_startup] [--validation_seed_source {gpu,cpu}]
                [--validation_torch_compile VALIDATION_TORCH_COMPILE] [--validation_torch_compile_mode {max-autotune,reduce-overhead,default}]
                [--allow_tf32] [--validation_using_datasets] [--webhook_config WEBHOOK_CONFIG] [--report_to REPORT_TO]
                [--tracker_run_name TRACKER_RUN_NAME] [--tracker_project_name TRACKER_PROJECT_NAME] [--validation_prompt VALIDATION_PROMPT]
                [--validation_prompt_library] [--user_prompt_library USER_PROMPT_LIBRARY] [--validation_negative_prompt VALIDATION_NEGATIVE_PROMPT]
                [--num_validation_images NUM_VALIDATION_IMAGES] [--validation_steps VALIDATION_STEPS] [--num_eval_images NUM_EVAL_IMAGES]
                [--eval_dataset_id EVAL_DATASET_ID] [--validation_num_inference_steps VALIDATION_NUM_INFERENCE_STEPS]
                [--validation_resolution VALIDATION_RESOLUTION] [--validation_noise_scheduler {ddim,ddpm,euler,euler-a,unipc}]
                [--validation_disable_unconditional] [--disable_compel] [--enable_watermark] [--mixed_precision {bf16,no}]
                [--gradient_precision {unmodified,fp32}] [--base_model_precision {no_change,fp8-quanto,int8-quanto,int4-quanto,int2-quanto}]
                [--base_model_default_dtype {bf16,fp32}] [--text_encoder_1_precision {no_change,fp8-quanto,int8-quanto,int4-quanto,int2-quanto}]
                [--text_encoder_2_precision {no_change,fp8-quanto,int8-quanto,int4-quanto,int2-quanto}]
                [--text_encoder_3_precision {no_change,fp8-quanto,int8-quanto,int4-quanto,int2-quanto}] [--local_rank LOCAL_RANK]
                [--enable_xformers_memory_efficient_attention] [--set_grads_to_none] [--noise_offset NOISE_OFFSET]
                [--noise_offset_probability NOISE_OFFSET_PROBABILITY] [--validation_guidance VALIDATION_GUIDANCE]
                [--validation_guidance_real VALIDATION_GUIDANCE_REAL] [--validation_no_cfg_until_timestep VALIDATION_NO_CFG_UNTIL_TIMESTEP]
                [--validation_guidance_rescale VALIDATION_GUIDANCE_RESCALE] [--validation_randomize] [--validation_seed VALIDATION_SEED]
                [--fully_unload_text_encoder] [--freeze_encoder_before FREEZE_ENCODER_BEFORE] [--freeze_encoder_after FREEZE_ENCODER_AFTER]
                [--freeze_encoder_strategy FREEZE_ENCODER_STRATEGY] [--layer_freeze_strategy {none,bitfit}] [--unet_attention_slice]
                [--print_filenames] [--print_sampler_statistics] [--metadata_update_interval METADATA_UPDATE_INTERVAL]
                [--debug_aspect_buckets] [--debug_dataset_loader] [--freeze_encoder FREEZE_ENCODER] [--save_text_encoder]
                [--text_encoder_limit TEXT_ENCODER_LIMIT] [--prepend_instance_prompt] [--only_instance_prompt]
                [--data_aesthetic_score DATA_AESTHETIC_SCORE] [--sdxl_refiner_uses_full_range] [--caption_dropout_probability CAPTION_DROPOUT_PROBABILITY]
                [--delete_unwanted_images] [--delete_problematic_images] [--offset_noise] [--input_perturbation INPUT_PERTURBATION]
                [--input_perturbation_steps INPUT_PERTURBATION_STEPS] [--lr_end LR_END] [--i_know_what_i_am_doing]
                [--accelerator_cache_clear_interval ACCELERATOR_CACHE_CLEAR_INTERVAL]
train.py: error: the following arguments are required: --pretrained_model_name_or_path, --data_backend_config, --optimizer
(flux) [wangxi@v100-4 SimpleTuner]$
```
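
(Editor's note: two fixes fall straight out of this output. The bitsandbytes warning itself says how to clear the manual override, and the argparse error means `train.py` was invoked without the arguments `train.sh` normally assembles from the config. A hedged sketch, reusing values that appear elsewhere in this thread rather than recommended settings:)

```bash
# Clear the manual bitsandbytes override, as the warning suggests.
export BNB_CUDA_VERSION=

# train.py requires these three arguments; train.sh usually supplies them.
# The model, backend config, and optimizer values below are taken from this
# thread's own logs and option list, purely for illustration.
CUDA_VISIBLE_DEVICES=6 python train.py \
  --pretrained_model_name_or_path black-forest-labs/FLUX.1-dev \
  --data_backend_config config/multidatabackend.json \
  --optimizer adamw_bf16
```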

magicwang1111 commented 1 month ago

> I think it doesn't even support CUDA 12.4?

I think something may be wrong with train.sh.

bghira commented 1 month ago

I guess just use ai-toolkit.

magicwang1111 commented 1 month ago

> I guess just use ai-toolkit.

But when I use `python train.py`, it can detect CUDA 12.4. Something must be wrong with train.sh.
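
(Editor's note: since the interactive interpreter sees CUDA but `sh train.sh` does not, tracing the script should show where the environments diverge; the earlier `find: ‘/mnt/data1/wangxi/SimpleTuner/.venv’: No such file or directory` error hints that the script expects a repo-local virtualenv that was never created. A hedged way to compare:)

```bash
# Trace train.sh to see which interpreter and environment it actually sets up.
sh -x train.sh 2>&1 | head -50

# The find error in the earlier logs suggests the script probes for this path:
ls -d /mnt/data1/wangxi/SimpleTuner/.venv   # missing, per the error above
```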