Closed unknownentity123 closed 1 year ago
You're running this just in normal Windows?
Thanks for the reply. I am running Windows 11 with the latest update, installed via an elevated conda prompt. The ST conda env gets created, and all dependencies appear to install judging by the installation script's feedback. I then activate the ST conda environment in that same elevated prompt and run the command to open the StableTuner GUI. The GUI opens fine and all sections appear to work without issue. The error posted above comes after starting training. I selected the Fine Tuning Template button on the right, pointed the GUI to the location of the VAE, and selected the 1.5 model. I created a single-concept run, a second attempt with two concepts, and a final one with four concepts, since multi-concept training is this tool's claim to fame over Dreambooth and I really want to utilize that. I can provide the entire prompt output once I get back to the house this evening from work.
Can't help with your issue, but I was wondering about your P40 setup. There's indeed a lot out there about the M40 and K80 but very little on the P40. What kind of computer are you running it in, and was it difficult to get working? Thanks.
No problem @Echolink50. I got a nice deal on the P40 on eBay. I was running two 1070 Ti cards with 8GB VRAM each and needed a VRAM upgrade; I saw that the P40 was similar in performance to the 1070 Ti but with 24GB of VRAM, and lucked out finding one on eBay. I have it slotted in my PCIe x16 slot and one 1070 Ti slotted in the PCIe x8 slot tied to my main display. I also have an AMD Ryzen 5700G with integrated Vega graphics, but the motherboard won't support using the integrated GPU alongside another display-capable card in a PCIe slot, so I had to disable it. As far as drivers go, I have both the datacenter drivers and the GeForce drivers installed. The system treats the P40 as GPU 0 and the 1070 Ti as GPU 1 per the nvidia-smi command. The other good thing is that, unlike my 1070 Ti cards where Windows 11 reserved a significant chunk of VRAM, the datacenter card gets to use the entire 24GB. All of my previously used programs, such as Simswap, Sbr-Swap, Automatic1111, InvokeAI, WhisperAI, and tortoiseTTL, have run with no issues. It's like I'm still using the second 1070 Ti card but with more memory.
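For anyone wanting to confirm the same GPU ordering on their own machine, the enumeration described above can be checked by parsing `nvidia-smi` CSV output. Here's a rough Python sketch; the hard-coded sample string is illustrative (standing in for the real command output on this system), not actual captured output:

```python
import csv
import io

# Illustrative sample in the format produced by:
#   nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader
# (values below mirror the setup described in this thread)
sample = (
    "0, Tesla P40, 24576 MiB\n"
    "1, NVIDIA GeForce GTX 1070 Ti, 8192 MiB\n"
)

# Parse each CSV row and strip the padding nvidia-smi puts after commas.
gpus = [[field.strip() for field in row] for row in csv.reader(io.StringIO(sample))]

for index, name, memory in gpus:
    print(f"GPU {index}: {name} ({memory})")
```

On a live system you would replace `sample` with the output of the `nvidia-smi` command via `subprocess.run`.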
There were a few caveats to the P40 install.
The first is that it comes with only a passive heatsink, and the bracket supports mounting a cooling fan. Due to the length of the card, even in my large Corsair chassis, the cooling options quickly narrowed down to a 3D-printed friction-fit shroud with a small high-RPM server fan attached to it. Here is the link for it: https://www.ebay.com/itm/313527520178
The second caveat is that it takes an 8-pin CPU (EPS) power connector to feed its 250W demand. (I already had a 1200W power supply, so capacity wasn't an issue for me, but it is certainly something to keep in mind if you're considering this card.) The CPU connector looks almost identical to the PCIe 8-pin connector but is keyed differently. If you don't have a spare CPU cable harness from your power supply, Dell and others sell a Y harness that takes two separate PCIe connectors and merges them into the needed 8-pin CPU connector. I bought a compatible one on Amazon here: https://www.amazon.com/dp/B07M9X68DS Don't forget that each input to that harness must come from its own dedicated PCIe output on the power supply. Most PCIe cable branches coming out of a power supply carry two male PCIe connectors, but both share the total power rating of that single cable, so you cannot feed both harness inputs from one branch. The P40's power draw requires one Y-harness input on one PCIe branch and the other input on a completely different branch, doubling the current-carrying capacity to the card. My power supply has four of these PCIe branches, so I used one to power the 1070 Ti and two of the remaining three to power the P40 via the harness.
The last caveat: I had to do some tinkering in the BIOS to get the system to boot past POST, setting the following options where available. My motherboard is a GIGABYTE B550M AORUS PRO AM4, for reference.
- Initial Display Output: PCIe 2 Slot (this is my 1070 Ti's slot)
- Integrated Graphics: Disabled
- PCIEX16 Bifurcation: Auto
- Above 4G Decoding: Enabled (this was the crucial setting to allow addressing all 24GB of the VRAM)
Stable Tuner is the first program I have run into issues with, but I'm not necessarily convinced it's related to the card, other than the note on the main GitHub page about targeting 3080 and 3090 cards. I have tried the install from scratch three times now and set up the concept permutations several ways, but I always get the error posted above. After some digging I came across a Reddit post discussing another program that threw this same error, which turned out to be tied to that program supporting only the RTX series of cards.
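For what it's worth, that Reddit finding matches how this error is usually explained: the bitsandbytes build bundled with some trainers ships CUDA kernels compiled only for newer GPU architectures, so an older card finds "no kernel image" to execute. Below is a minimal sketch of the check under the assumption of a Turing-era (7.5) minimum; on a live system the capability tuple would come from `torch.cuda.get_device_capability(0)`:

```python
# Hedged sketch: compare a GPU's CUDA compute capability against the
# minimum the shipped kernels were (assumed) compiled for.
def kernels_available(capability, minimum=(7, 5)):
    """True if a (major, minor) compute capability meets the assumed minimum."""
    return capability >= minimum

# Tesla P40 (Pascal) is compute capability 6.1; an RTX 3090 (Ampere) is 8.6.
print(kernels_available((6, 1)))  # False -> consistent with the error above
print(kernels_available((8, 6)))  # True
```

The 7.5 cutoff here is an assumption for illustration; the actual architectures a given bitsandbytes binary supports depend on how it was built.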
Thanks so much for the lengthy post. I currently have a 3060 Ti with a Ryzen 2600 in this desktop. I plan to put my 1050 Ti back in with the P40, then move the 3060 Ti to my other desktop. I don't want to clog up your issue report, but thanks for the help.
Getting the following after latents are cached and training attempts to begin:
```
Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu
Traceback (most recent call last):
  File "C:\Users\unres\miniconda3\envs\ST\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\unres\miniconda3\envs\ST\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\unres\miniconda3\envs\ST\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\unres\miniconda3\envs\ST\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Users\unres\miniconda3\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\Users\unres\miniconda3\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\unres\miniconda3\envs\ST\python.exe', 'scripts/trainer.py', '--attention=xformers', '--model_variant=base', '--disable_cudnn_benchmark', '--sample_step_interval=500', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--pretrained_vae_name_or_path=', '--output_dir=models/new_model', '--seed=3434554', '--resolution=512', '--train_batch_size=24', '--num_train_epochs=100', '--mixed_precision=fp16', '--use_bucketing', '--aspect_mode=dynamic', '--aspect_mode_action_preference=add', '--use_8bit_adam', '--gradient_checkpointing', '--gradient_accumulation_steps=1', '--learning_rate=3e-6', '--lr_warmup_steps=0', '--lr_scheduler=constant', '--train_text_encoder', '--concepts_list=stabletune_concept_list.json', '--num_class_images=4420', '--save_every_n_epoch=5', '--n_save_sample=1', '--sample_height=512', '--sample_width=512', '--dataset_repeats=1']' returned non-zero exit status 1.
```
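Since this system has two dissimilar GPUs, one debugging step (a suggestion, not a known fix, since both cards here are Pascal and the kernel issue may persist either way) is to pin the process to a single device before launching, which at least rules out multi-GPU enumeration as a factor:

```python
import os

# Restrict CUDA device visibility before any CUDA library initializes;
# "1" would be the 1070 Ti in the setup described in this thread.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# Any CUDA-using process launched from here (e.g. running accelerate
# via subprocess) inherits this environment variable and will see
# only that one device, reported to it as device 0.
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

The same effect can be had from the prompt with `set CUDA_VISIBLE_DEVICES=1` before running `accelerate launch`.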
I've only seen the 3080 and 3090 mentioned on the main page. Does this repo not support the older 24GB Tesla datacenter cards the way Automatic1111 and InvokeAI do?