bghira / SimpleTuner

A general fine-tuning kit geared toward diffusion models.
GNU Affero General Public License v3.0

torch._dynamo.exc.BackendCompilerFailed #1101

Closed · roblaughter closed this issue 2 weeks ago

roblaughter commented 2 weeks ago

I have absolutely no idea where to start here...

Long story short, I'm trying to train an SD 3.5 LoRA on a Mac M1 with 64GB memory. As soon as training starts, I get the error below.

torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
LoweringException: TypeError: 'NoneType' object is not callable

Have you encountered this, and can you help point me in the right direction for getting past this?

Set custom env vars permanently in config/config.env:
TRAINING_NUM_PROCESSES not set, defaulting to 1.
TRAINING_NUM_MACHINES not set, defaulting to 1.
TRAINING_DYNAMO_BACKEND not set, defaulting to no.
ENV not set, defaulting to default.
Using json backend: config/config.json
Updating dependencies. Set DISABLE_UPDATES to prevent this.
Installing dependencies from lock file

No dependencies to install or update
Accelerate config file not found: /Users/rlaughter/.cache/huggingface/accelerate/default_config.yaml. Using values from config.env.
INFO:root:lm_eval is not installed, GPTQ may not be usable
2024-10-25 15:10:22,717 [INFO] Using json configuration backend.
2024-10-25 15:10:22,717 [INFO] [CONFIG.JSON] Loaded configuration from config/config.json
2024-10-25 15:10:22,717 [WARNING] Skipping false argument: --disable_benchmark
2024-10-25 15:10:22,717 [WARNING] Skipping false argument: --validation_torch_compile
--resume_from_checkpoint=latest
--data_backend_config=config/multidatabackend.json
--aspect_bucket_rounding=2
--seed=42
--minimum_image_size=0
--output_dir=output/models
--lora_type=standard
--lora_rank=64
--max_train_steps=1000
--num_train_epochs=0
--checkpointing_steps=250
--checkpoints_total_limit=5
--tracker_project_name=yearbook-lora
--tracker_run_name=test-1
--report_to=none
--model_type=lora
--pretrained_model_name_or_path=stabilityai/stable-diffusion-3.5-large
--model_family=sd3
--train_batch_size=1
--gradient_checkpointing
--caption_dropout_probability=0.0
--resolution_type=pixel_area
--resolution=1024
--validation_seed=42
--validation_steps=100
--validation_resolution=1024x1024
--validation_guidance=4
--validation_guidance_rescale=0.0
--validation_num_inference_steps=20
--validation_prompt=an awkward yearbook photo of Princess Leia
--mixed_precision=no
--optimizer=ao-adamw8bit
--learning_rate=1e-5
--lr_scheduler=polynomial
--lr_warmup_steps=100
--base_model_precision=int8-quanto
--text_encoder_1_precision=no_change
--text_encoder_2_precision=no_change
--text_encoder_3_precision=no_change
2024-10-25 15:10:22,720 [ERROR] MPS does not benefit from models being quantized on the accelerator device. Overriding --quantize_via to 'cpu'.
2024-10-25 15:10:22,720 [INFO] VAE Model: madebyollin/sdxl-vae-fp16-fix
2024-10-25 15:10:22,720 [INFO] Default VAE Cache location: 
2024-10-25 15:10:22,720 [INFO] Text Cache location: cache
2024-10-25 15:10:22,720 [WARNING] MM-DiT requires an alignment value of 64px. Overriding the value of --aspect_bucket_alignment.
2024-10-25 15:10:22,720 [INFO] SD3 embeds for unconditional captions: t5=empty_string, clip=empty_string
2024-10-25 15:10:22,720 [WARNING] SD3 requires the use of the 'sd35' flow matching loss. Overriding the value of --flow_matching_loss.
2024-10-25 15:10:22,720 [WARNING] Updating T5 XXL tokeniser max length to 256 for SD3.
2024-10-25 15:10:22,720 [WARNING] Stable Diffusion 3 requires --max_grad_norm=0.01 to prevent model collapse. Overriding value. Set this value manually to disable this warning.
2024-10-25 15:10:22,722 [INFO] Load CLIP text encoder..
2024-10-25 15:10:46,337 [INFO] Load VAE: stabilityai/stable-diffusion-3.5-large
2024-10-25 15:10:46,529 [INFO] Loading VAE onto accelerator, converting from torch.float32 to torch.bfloat16
2024-10-25 15:10:46,628 [INFO] Load tokenizers
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
2024-10-25 15:10:47,052 [INFO] Loading CLIP text encoder from stabilityai/stable-diffusion-3.5-large/text_encoder..
2024-10-25 15:10:47,310 [INFO] Loading LAION OpenCLIP-G/14 text encoder..
2024-10-25 15:10:47,834 [INFO] Loading T5-XXL v1.1 text encoder..
Downloading shards: 100%|██████████████████████| 2/2 [00:00<00:00, 13706.88it/s]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:01<00:00,  1.23it/s]
2024-10-25 15:10:51,068 [INFO] Moving text encoder to GPU.
2024-10-25 15:10:51,178 [INFO] Moving text encoder 2 to GPU.
2024-10-25 15:10:51,407 [INFO] Moving text encoder 3 to GPU.
2024-10-25 15:10:52,156 [INFO] Loading data backend config from config/multidatabackend.json
2024-10-25 15:10:52,248 [INFO] Configuring text embed backend: text-embed-cache
2024-10-25 15:10:52,249 [INFO] (Rank: 0) (id=text-embed-cache) Listing all text embed cache entries
2024-10-25 15:10:52,252 [INFO] Pre-computing null embedding
2024-10-25 15:10:57,258 [WARNING] Not using caption dropout will potentially lead to overfitting on captions, eg. CFG will not work very well. Set --caption-dropout_probability=0.1 as a recommended value.
2024-10-25 15:10:57,259 [INFO] Completed loading text embed services.
2024-10-25 15:10:57,259 [INFO] Configuring data backend: yearbook-512
2024-10-25 15:10:57,262 [INFO] (id=yearbook-512) Loading bucket manager.
2024-10-25 15:10:57,265 [INFO] (id=yearbook-512) Refreshing aspect buckets on main process.
2024-10-25 15:10:57,265 [INFO] Discovering new files...
2024-10-25 15:10:57,276 [INFO] Compressed 81 existing files from 5.
2024-10-25 15:10:57,276 [INFO] No new files discovered. Doing nothing.
2024-10-25 15:10:57,276 [INFO] Statistics: {'total_processed': 0, 'skipped': {'already_exists': 81, 'metadata_missing': 0, 'not_found': 0, 'too_small': 0, 'other': 0}}
2024-10-25 15:10:57,277 [WARNING] Key crop_aspect not found in the current backend config, using the existing value 'square'.
2024-10-25 15:10:57,277 [WARNING] Key disable_validation not found in the current backend config, using the existing value 'False'.
2024-10-25 15:10:57,277 [WARNING] Key config_version not found in the current backend config, using the existing value '2'.
2024-10-25 15:10:57,277 [WARNING] Key hash_filenames not found in the current backend config, using the existing value 'True'.
2024-10-25 15:10:57,277 [INFO] Configured backend: {'id': 'yearbook-512', 'config': {'repeats': 5, 'crop': False, 'crop_aspect': 'square', 'crop_style': 'random', 'disable_validation': False, 'resolution': 0.262144, 'resolution_type': 'area', 'caption_strategy': 'textfile', 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'maximum_image_size': None, 'target_downsample_size': None, 'config_version': 2, 'hash_filenames': True}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x12d6c7250>, 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'metadata_backend': <helpers.metadata.backends.discovery.DiscoveryMetadataBackend object at 0x146fb5210>}
(Rank: 0)  | Bucket     | Image Count (per-GPU)
------------------------------
(Rank: 0)  | 0.78       | 27          
(Rank: 0)  | 0.7        | 50          
(Rank: 0)  | 1.0        | 2           
(Rank: 0)  | 1.83       | 1           
(Rank: 0)  | 1.29       | 1           
2024-10-25 15:10:57,278 [INFO] (id=yearbook-512) Collecting captions.
Loading captions:   0%|
2024-10-25 15:10:57,289 [INFO] (id=yearbook-512) Initialise text embed pre-computation using the textfile caption strategy. We have 81 captions to process.
2024-10-25 15:10:57,292 [INFO] (id=yearbook-512) Completed processing 81 captions.
2024-10-25 15:10:57,292 [INFO] (id=yearbook-512) Creating VAE latent cache.
2024-10-25 15:10:57,293 [INFO] (id=yearbook-512) Discovering cache objects..
2024-10-25 15:10:57,296 [INFO] Configured backend: {'id': 'yearbook-512', 'config': {'repeats': 5, 'crop': False, 'crop_aspect': 'square', 'crop_style': 'random', 'disable_validation': False, 'resolution': 0.262144, 'resolution_type': 'area', 'caption_strategy': 'textfile', 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'maximum_image_size': None, 'target_downsample_size': None, 'config_version': 2, 'hash_filenames': True}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x12d6c7250>, 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'metadata_backend': <helpers.metadata.backends.discovery.DiscoveryMetadataBackend object at 0x146fb5210>, 'train_dataset': <helpers.multiaspect.dataset.MultiAspectDataset object at 0x12e012d10>, 'sampler': <helpers.multiaspect.sampler.MultiAspectSampler object at 0x146fb6610>, 'train_dataloader': <torch.utils.data.dataloader.DataLoader object at 0x146fc0050>, 'text_embed_cache': <helpers.caching.text_embeds.TextEmbeddingCache object at 0x146fb4b10>, 'vaecache': <helpers.caching.vae.VAECache object at 0x12d420e90>}
2024-10-25 15:10:57,296 [INFO] Configuring data backend: yearbook-1024
2024-10-25 15:10:57,297 [INFO] (id=yearbook-1024) Loading bucket manager.
2024-10-25 15:10:57,297 [INFO] (id=yearbook-1024) Refreshing aspect buckets on main process.
2024-10-25 15:10:57,297 [INFO] Discovering new files...
2024-10-25 15:10:57,302 [INFO] Compressed 81 existing files from 9.
2024-10-25 15:10:57,302 [INFO] No new files discovered. Doing nothing.
2024-10-25 15:10:57,302 [INFO] Statistics: {'total_processed': 0, 'skipped': {'already_exists': 81, 'metadata_missing': 0, 'not_found': 0, 'too_small': 0, 'other': 0}}
2024-10-25 15:10:57,302 [WARNING] Key crop_aspect not found in the current backend config, using the existing value 'square'.
2024-10-25 15:10:57,302 [WARNING] Key disable_validation not found in the current backend config, using the existing value 'False'.
2024-10-25 15:10:57,302 [WARNING] Key config_version not found in the current backend config, using the existing value '2'.
2024-10-25 15:10:57,302 [WARNING] Key hash_filenames not found in the current backend config, using the existing value 'True'.
2024-10-25 15:10:57,303 [INFO] Configured backend: {'id': 'yearbook-1024', 'config': {'repeats': 5, 'crop': False, 'crop_aspect': 'square', 'crop_style': 'random', 'disable_validation': False, 'resolution': 1.048576, 'resolution_type': 'area', 'caption_strategy': 'textfile', 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'maximum_image_size': None, 'target_downsample_size': None, 'config_version': 2, 'hash_filenames': True}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x146fb5290>, 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'metadata_backend': <helpers.metadata.backends.discovery.DiscoveryMetadataBackend object at 0x146fc21d0>}
(Rank: 0)  | Bucket     | Image Count (per-GPU)
------------------------------
(Rank: 0)  | 0.83       | 1           
(Rank: 0)  | 0.65       | 46          
(Rank: 0)  | 0.88       | 7           
(Rank: 0)  | 0.68       | 6           
(Rank: 0)  | 1.0        | 2           
(Rank: 0)  | 0.74       | 6           
(Rank: 0)  | 0.78       | 11          
(Rank: 0)  | 1.83       | 1           
(Rank: 0)  | 1.29       | 1           
2024-10-25 15:10:57,303 [INFO] (id=yearbook-1024) Collecting captions.
Loading captions:   0%|
2024-10-25 15:10:57,306 [INFO] (id=yearbook-1024) Initialise text embed pre-computation using the textfile caption strategy. We have 81 captions to process.
2024-10-25 15:10:57,308 [INFO] (id=yearbook-1024) Completed processing 81 captions.
2024-10-25 15:10:57,308 [INFO] (id=yearbook-1024) Creating VAE latent cache.
2024-10-25 15:10:57,308 [INFO] (id=yearbook-1024) Discovering cache objects..
2024-10-25 15:10:57,311 [INFO] Configured backend: {'id': 'yearbook-1024', 'config': {'repeats': 5, 'crop': False, 'crop_aspect': 'square', 'crop_style': 'random', 'disable_validation': False, 'resolution': 1.048576, 'resolution_type': 'area', 'caption_strategy': 'textfile', 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'maximum_image_size': None, 'target_downsample_size': None, 'config_version': 2, 'hash_filenames': True}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x146fb5290>, 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'metadata_backend': <helpers.metadata.backends.discovery.DiscoveryMetadataBackend object at 0x146fc21d0>, 'train_dataset': <helpers.multiaspect.dataset.MultiAspectDataset object at 0x12e377590>, 'sampler': <helpers.multiaspect.sampler.MultiAspectSampler object at 0x147265b50>, 'train_dataloader': <torch.utils.data.dataloader.DataLoader object at 0x147265c50>, 'text_embed_cache': <helpers.caching.text_embeds.TextEmbeddingCache object at 0x146fb4b10>, 'vaecache': <helpers.caching.vae.VAECache object at 0x12a6327d0>}
2024-10-25 15:10:57,311 [INFO] Configuring data backend: yearbook-512-crop
2024-10-25 15:10:57,311 [INFO] (id=yearbook-512-crop) Loading bucket manager.
2024-10-25 15:10:57,312 [INFO] (id=yearbook-512-crop) Refreshing aspect buckets on main process.
2024-10-25 15:10:57,312 [INFO] Discovering new files...
2024-10-25 15:10:57,316 [INFO] Compressed 81 existing files from 1.
2024-10-25 15:10:57,316 [INFO] No new files discovered. Doing nothing.
2024-10-25 15:10:57,316 [INFO] Statistics: {'total_processed': 0, 'skipped': {'already_exists': 81, 'metadata_missing': 0, 'not_found': 0, 'too_small': 0, 'other': 0}}
2024-10-25 15:10:57,316 [WARNING] Key crop_aspect not found in the current backend config, using the existing value 'square'.
2024-10-25 15:10:57,316 [WARNING] Key disable_validation not found in the current backend config, using the existing value 'False'.
2024-10-25 15:10:57,316 [WARNING] Key config_version not found in the current backend config, using the existing value '2'.
2024-10-25 15:10:57,316 [WARNING] Key hash_filenames not found in the current backend config, using the existing value 'True'.
2024-10-25 15:10:57,316 [INFO] Configured backend: {'id': 'yearbook-512-crop', 'config': {'repeats': 5, 'crop': True, 'crop_aspect': 'square', 'crop_style': 'random', 'disable_validation': False, 'resolution': 0.262144, 'resolution_type': 'area', 'caption_strategy': 'textfile', 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'maximum_image_size': None, 'target_downsample_size': None, 'config_version': 2, 'hash_filenames': True}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x12e374b90>, 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'metadata_backend': <helpers.metadata.backends.discovery.DiscoveryMetadataBackend object at 0x147265410>}
(Rank: 0)  | Bucket     | Image Count (per-GPU)
------------------------------
(Rank: 0)  | 1.0        | 81          
2024-10-25 15:10:57,316 [INFO] (id=yearbook-512-crop) Collecting captions.
Loading captions:   0%|
2024-10-25 15:10:57,319 [INFO] (id=yearbook-512-crop) Initialise text embed pre-computation using the textfile caption strategy. We have 81 captions to process.
2024-10-25 15:10:57,320 [INFO] (id=yearbook-512-crop) Completed processing 81 captions.
2024-10-25 15:10:57,320 [INFO] (id=yearbook-512-crop) Creating VAE latent cache.
2024-10-25 15:10:57,321 [INFO] (id=yearbook-512-crop) Discovering cache objects..
2024-10-25 15:10:57,323 [INFO] Configured backend: {'id': 'yearbook-512-crop', 'config': {'repeats': 5, 'crop': True, 'crop_aspect': 'square', 'crop_style': 'random', 'disable_validation': False, 'resolution': 0.262144, 'resolution_type': 'area', 'caption_strategy': 'textfile', 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'maximum_image_size': None, 'target_downsample_size': None, 'config_version': 2, 'hash_filenames': True}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x12e374b90>, 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'metadata_backend': <helpers.metadata.backends.discovery.DiscoveryMetadataBackend object at 0x147265410>, 'train_dataset': <helpers.multiaspect.dataset.MultiAspectDataset object at 0x1470bd390>, 'sampler': <helpers.multiaspect.sampler.MultiAspectSampler object at 0x1470bd790>, 'train_dataloader': <torch.utils.data.dataloader.DataLoader object at 0x1470bd690>, 'text_embed_cache': <helpers.caching.text_embeds.TextEmbeddingCache object at 0x146fb4b10>, 'vaecache': <helpers.caching.vae.VAECache object at 0x147265490>}
2024-10-25 15:10:57,323 [INFO] Configuring data backend: yearbook-1024-crop
2024-10-25 15:10:57,323 [INFO] (id=yearbook-1024-crop) Loading bucket manager.
2024-10-25 15:10:57,324 [INFO] (id=yearbook-1024-crop) Refreshing aspect buckets on main process.
2024-10-25 15:10:57,324 [INFO] Discovering new files...
2024-10-25 15:10:57,327 [INFO] Compressed 81 existing files from 1.
2024-10-25 15:10:57,327 [INFO] No new files discovered. Doing nothing.
2024-10-25 15:10:57,327 [INFO] Statistics: {'total_processed': 0, 'skipped': {'already_exists': 81, 'metadata_missing': 0, 'not_found': 0, 'too_small': 0, 'other': 0}}
2024-10-25 15:10:57,327 [WARNING] Key crop_aspect not found in the current backend config, using the existing value 'square'.
2024-10-25 15:10:57,327 [WARNING] Key disable_validation not found in the current backend config, using the existing value 'False'.
2024-10-25 15:10:57,327 [WARNING] Key config_version not found in the current backend config, using the existing value '2'.
2024-10-25 15:10:57,328 [WARNING] Key hash_filenames not found in the current backend config, using the existing value 'True'.
2024-10-25 15:10:57,328 [INFO] Configured backend: {'id': 'yearbook-1024-crop', 'config': {'repeats': 5, 'crop': True, 'crop_aspect': 'square', 'crop_style': 'random', 'disable_validation': False, 'resolution': 1.048576, 'resolution_type': 'area', 'caption_strategy': 'textfile', 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'maximum_image_size': None, 'target_downsample_size': None, 'config_version': 2, 'hash_filenames': True}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x12d6326d0>, 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'metadata_backend': <helpers.metadata.backends.discovery.DiscoveryMetadataBackend object at 0x146fc3610>}
(Rank: 0)  | Bucket     | Image Count (per-GPU)
------------------------------
(Rank: 0)  | 1.0        | 81          
2024-10-25 15:10:57,328 [INFO] (id=yearbook-1024-crop) Collecting captions.
Loading captions:   0%|
2024-10-25 15:10:57,330 [INFO] (id=yearbook-1024-crop) Initialise text embed pre-computation using the textfile caption strategy. We have 81 captions to process.
2024-10-25 15:10:57,331 [INFO] (id=yearbook-1024-crop) Completed processing 81 captions.
2024-10-25 15:10:57,331 [INFO] (id=yearbook-1024-crop) Creating VAE latent cache.
2024-10-25 15:10:57,332 [INFO] (id=yearbook-1024-crop) Discovering cache objects..
2024-10-25 15:10:57,333 [INFO] Configured backend: {'id': 'yearbook-1024-crop', 'config': {'repeats': 5, 'crop': True, 'crop_aspect': 'square', 'crop_style': 'random', 'disable_validation': False, 'resolution': 1.048576, 'resolution_type': 'area', 'caption_strategy': 'textfile', 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'maximum_image_size': None, 'target_downsample_size': None, 'config_version': 2, 'hash_filenames': True}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x12d6326d0>, 'instance_data_dir': '/Users/rlaughter/Downloads/LoRA Datasets/AA Yearbook', 'metadata_backend': <helpers.metadata.backends.discovery.DiscoveryMetadataBackend object at 0x146fc3610>, 'train_dataset': <helpers.multiaspect.dataset.MultiAspectDataset object at 0x1470b2bd0>, 'sampler': <helpers.multiaspect.sampler.MultiAspectSampler object at 0x1470b0ad0>, 'train_dataloader': <torch.utils.data.dataloader.DataLoader object at 0x1470b0190>, 'text_embed_cache': <helpers.caching.text_embeds.TextEmbeddingCache object at 0x146fb4b10>, 'vaecache': <helpers.caching.vae.VAECache object at 0x12a656310>}
2024-10-25 15:10:58,491 [INFO] Precomputing the negative prompt embed for validations.
2024-10-25 15:10:59,214 [INFO] Calculated our maximum training steps at 1000 because we have 1 epochs and 1944 steps per epoch.
2024-10-25 15:10:59,214 [INFO] Collected the following data backends: ['text-embed-cache', 'yearbook-512', 'yearbook-1024', 'yearbook-512-crop', 'yearbook-1024-crop']
2024-10-25 15:10:59,953 [INFO] Precomputing the negative prompt embed for validations.
2024-10-25 15:11:00,691 [INFO] Unloading text encoders, as they are not being trained.
2024-10-25 15:11:01,869 [INFO] After nuking text encoders from orbit, we freed 10.39 GB of VRAM. The real memories were the friends we trained a model on along the way.
2024-10-25 15:11:02,028 [INFO] After nuking the VAE from orbit, we freed 163.84 MB of VRAM.
2024-10-25 15:11:02,028 [INFO] Loading Stable Diffusion 3 diffusion transformer..
Fetching 2 files: 100%|█████████████████████████| 2/2 [00:00<00:00, 8516.35it/s]
2024-10-25 15:11:03,162 [INFO] Moving transformer to dtype=torch.bfloat16, device=cpu
2024-10-25 15:11:03,165 [INFO] Loading Quanto. This may take a few minutes.
2024-10-25 15:11:03,167 [INFO] Quantising SD3Transformer2DModel. Using int8-quanto.
2024-10-25 15:11:03,167 [INFO] Freezing model weights only
2024-10-25 15:11:17,558 [INFO] Using LoRA training mode (rank=64)
2024-10-25 15:11:17,825 [INFO] Moving the diffusion transformer to GPU in int8-quanto precision.
2024-10-25 15:11:19,651 [INFO] Learning rate: 1e-05
2024-10-25 15:11:19,652 [INFO] cls: <class 'torchao.prototype.low_bit_optim.adam.AdamW8bit'>, settings: {'betas': (0.9, 0.999), 'weight_decay': 0.01, 'eps': 1e-06}
2024-10-25 15:11:19,655 [INFO] Optimizer arguments={'lr': 1e-05, 'betas': (0.9, 0.999), 'weight_decay': 0.01, 'eps': 1e-06}
2024-10-25 15:11:19,656 [INFO] Loading polynomial learning rate scheduler with 100 warmup steps
2024-10-25 15:11:19,656 [INFO] Using Polynomial learning rate scheduler with last epoch -2.
2024-10-25 15:11:19,660 [INFO] Preparing models..
2024-10-25 15:11:19,660 [INFO] Loading our accelerator...
2024-10-25 15:11:19,706 [INFO] Checkpoint 'latest' does not exist. Starting a new training run.
[2024-10-25 15:11:19,749] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to mps (auto detect)
W1025 15:11:19.930000 49994 .venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
2024-10-25 15:11:19,959 [INFO] 
***** Running training *****
  - Num batches = 1944
  - Num Epochs = 1
  - Current Epoch = 1
  - Total train batch size (w. parallel, distributed & accumulation) = 1
  - Instantaneous batch size per device = 1
  - Gradient Accumulation steps = 1
  - Total optimization steps = 1000
  - Total optimization steps remaining = 1000
Epoch 1/1, Steps:   0%|                                                                             | 0/1000 [00:00<?, ?it/s]
backend='inductor' raised:
LoweringException: TypeError: 'NoneType' object is not callable
  target: aten.amax.default
  args[0]: TensorBox(StorageBox(
    Pointwise(
      'mps',
      torch.float32,
      def inner_fn(index):
          i0, i1 = index
          tmp0 = ops.constant(0.10000000149011612, torch.float32)
          tmp1 = ops.load(arg4_1, 2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432))
          tmp2 = ops.to_dtype(tmp1, torch.float32, src_dtype=torch.bfloat16)
          tmp3 = ops.load(arg1_1, 2432 * ModularIndexing(2048 * ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76) + ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 1, 2048), 2432, 64) + ModularIndexing(2048 * ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76) + ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 1, 2048), 1, 2432))
          tmp4 = ops.to_dtype(tmp3, torch.int32, src_dtype=torch.uint8)
          tmp5 = ops.load(arg3_1, tmp4)
          tmp6 = ops.load(arg2_1, ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76))
          tmp7 = tmp5 * tmp6
          tmp8 = tmp2 - tmp7
          tmp9 = tmp0 * tmp8
          tmp10 = ops.constant(False, torch.bool)
          tmp11 = ops.load(arg4_1, 2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432))
          tmp12 = ops.to_dtype(tmp11, torch.float32, src_dtype=torch.bfloat16)
          tmp13 = ops.load(arg1_1, 2432 * ModularIndexing(2048 * ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76) + ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 1, 2048), 2432, 64) + ModularIndexing(2048 * ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76) + ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 1, 2048), 1, 2432))
          tmp14 = ops.to_dtype(tmp13, torch.int32, src_dtype=torch.uint8)
          tmp15 = ops.load(arg3_1, tmp14)
          tmp16 = ops.load(arg2_1, ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76))
          tmp17 = tmp15 * tmp16
          tmp18 = ops.where(tmp10, tmp12, tmp17)
          tmp19 = tmp9 + tmp18
          tmp20 = ops.abs(tmp19)
          return tmp20
      ,
      ranges=[76, 2048],
      origin_node=abs_3,
      origins=OrderedSet([abs_3])
    )
  ))
  args[1]: [-1]

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

Traceback (most recent call last):
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1446, in _call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/__init__.py", line 2235, in __call__
    return compile_fx(model_, inputs_, config_patches=self.config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1521, in compile_fx
    return aot_autograd(
           ^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/backends/common.py", line 72, in __call__
    cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1071, in aot_module_simplified
    compiled_fn = dispatch_and_compile()
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1056, in dispatch_and_compile
    compiled_fn, _ = create_aot_dispatcher_function(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 522, in create_aot_dispatcher_function
    return _create_aot_dispatcher_function(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 759, in _create_aot_dispatcher_function
    compiled_fn, fw_metadata = compiler_fn(
                               ^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 179, in aot_dispatch_base
    compiled_fw = compiler(fw_module, updated_flat_args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1350, in fw_compiler_base
    return _fw_compiler_base(model, example_inputs, is_inference)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1421, in _fw_compiler_base
    return inner_compile(
           ^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 475, in compile_fx_inner
    return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 85, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 661, in _compile_fx_inner
    compiled_graph = FxGraphCache.load(
                     ^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/codecache.py", line 1334, in load
    compiled_graph = compile_fx_fn(
                     ^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 570, in codegen_and_compile
    compiled_graph = fx_codegen_and_compile(gm, example_inputs, **fx_kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 859, in fx_codegen_and_compile
    graph.run(*example_inputs)
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/graph.py", line 780, in run
    return super().run(*args)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/fx/interpreter.py", line 146, in run
    self.env[node] = self.run_node(node)
                     ^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/graph.py", line 1319, in run_node
    result = super().run_node(n)
             ^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/fx/interpreter.py", line 203, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/graph.py", line 1024, in call_function
    raise LoweringException(e, target, args, kwargs).with_traceback(
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/graph.py", line 1021, in call_function
    out = lowerings[target](*args, **kwargs)  # type: ignore[index]
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/lowering.py", line 361, in wrapped
    out = decomp_fn(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/lowering.py", line 5108, in inner
    result = Reduction.create(reduction_type=reduction_type, input_node=x, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/ir.py", line 1179, in create
    hint, split = cls.num_splits(
                  ^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/ir.py", line 851, in num_splits
    not V.graph.has_feature(device, BackendFeature.REDUCE_TO_SINGLE_ELEMENT)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/graph.py", line 465, in has_feature
    return feature in self.get_backend_features(get_device_type(device))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_inductor/codegen/common.py", line 170, in get_backend_features
    return scheduling(None).get_backend_features(device)
           ^^^^^^^^^^^^^^^^
torch._inductor.exc.LoweringException: TypeError: 'NoneType' object is not callable
  target: aten.amax.default
  args[0]: TensorBox(StorageBox(
    Pointwise(
      'mps',
      torch.float32,
      def inner_fn(index):
          i0, i1 = index
          tmp0 = ops.constant(0.10000000149011612, torch.float32)
          tmp1 = ops.load(arg4_1, 2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432))
          tmp2 = ops.to_dtype(tmp1, torch.float32, src_dtype=torch.bfloat16)
          tmp3 = ops.load(arg1_1, 2432 * ModularIndexing(2048 * ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76) + ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 1, 2048), 2432, 64) + ModularIndexing(2048 * ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76) + ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 1, 2048), 1, 2432))
          tmp4 = ops.to_dtype(tmp3, torch.int32, src_dtype=torch.uint8)
          tmp5 = ops.load(arg3_1, tmp4)
          tmp6 = ops.load(arg2_1, ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76))
          tmp7 = tmp5 * tmp6
          tmp8 = tmp2 - tmp7
          tmp9 = tmp0 * tmp8
          tmp10 = ops.constant(False, torch.bool)
          tmp11 = ops.load(arg4_1, 2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432))
          tmp12 = ops.to_dtype(tmp11, torch.float32, src_dtype=torch.bfloat16)
          tmp13 = ops.load(arg1_1, 2432 * ModularIndexing(2048 * ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76) + ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 1, 2048), 2432, 64) + ModularIndexing(2048 * ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76) + ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 1, 2048), 1, 2432))
          tmp14 = ops.to_dtype(tmp13, torch.int32, src_dtype=torch.uint8)
          tmp15 = ops.load(arg3_1, tmp14)
          tmp16 = ops.load(arg2_1, ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76))
          tmp17 = tmp15 * tmp16
          tmp18 = ops.where(tmp10, tmp12, tmp17)
          tmp19 = tmp9 + tmp18
          tmp20 = ops.abs(tmp19)
          return tmp20
      ,
      ranges=[76, 2048],
      origin_node=abs_3,
      origins=OrderedSet([abs_3])
    )
  ))
  args[1]: [-1]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/rlaughter/Documents/SimpleTuner/train.py", line 49, in <module>
    trainer.train()
  File "/Users/rlaughter/Documents/SimpleTuner/helpers/training/trainer.py", line 2546, in train
    self.optimizer.step()
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/accelerate/optimizer.py", line 172, in step
    self.optimizer.step(closure)
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
    return func.__get__(opt, opt.__class__)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/optim/optimizer.py", line 487, in wrapper
    out = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torchao/prototype/low_bit_optim/adam.py", line 92, in step
    torch.compile(single_param_adam, fullgraph=True, dynamic=False)(
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 465, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1269, in __call__
    return self._torchdynamo_orig_callable(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 526, in __call__
    return _compile(
           ^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 924, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 666, in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_utils_internal.py", line 87, in wrapper_function
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 699, in _compile_inner
    out_code = transform_code_object(code, transform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1322, in transform_code_object
    transformations(instructions, code_options)
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 219, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 634, in transform
    tracer.run()
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2796, in run
    super().run()
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 983, in run
    while self.step():
          ^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 895, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2987, in RETURN_VALUE
    self._return(inst)
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2972, in _return
    self.output.compile_subgraph(
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1142, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1369, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1416, in call_user_compiler
    return self._call_user_compiler(gm)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rlaughter/Documents/SimpleTuner/.venv/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1465, in _call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
LoweringException: TypeError: 'NoneType' object is not callable
  target: aten.amax.default
  args[0]: TensorBox(StorageBox(
    Pointwise(
      'mps',
      torch.float32,
      def inner_fn(index):
          i0, i1 = index
          tmp0 = ops.constant(0.10000000149011612, torch.float32)
          tmp1 = ops.load(arg4_1, 2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432))
          tmp2 = ops.to_dtype(tmp1, torch.float32, src_dtype=torch.bfloat16)
          tmp3 = ops.load(arg1_1, 2432 * ModularIndexing(2048 * ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76) + ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 1, 2048), 2432, 64) + ModularIndexing(2048 * ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76) + ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 1, 2048), 1, 2432))
          tmp4 = ops.to_dtype(tmp3, torch.int32, src_dtype=torch.uint8)
          tmp5 = ops.load(arg3_1, tmp4)
          tmp6 = ops.load(arg2_1, ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76))
          tmp7 = tmp5 * tmp6
          tmp8 = tmp2 - tmp7
          tmp9 = tmp0 * tmp8
          tmp10 = ops.constant(False, torch.bool)
          tmp11 = ops.load(arg4_1, 2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432))
          tmp12 = ops.to_dtype(tmp11, torch.float32, src_dtype=torch.bfloat16)
          tmp13 = ops.load(arg1_1, 2432 * ModularIndexing(2048 * ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76) + ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 1, 2048), 2432, 64) + ModularIndexing(2048 * ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76) + ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 1, 2048), 1, 2432))
          tmp14 = ops.to_dtype(tmp13, torch.int32, src_dtype=torch.uint8)
          tmp15 = ops.load(arg3_1, tmp14)
          tmp16 = ops.load(arg2_1, ModularIndexing(2432 * ModularIndexing(i1 + 2048 * i0, 2432, 64) + ModularIndexing(i1 + 2048 * i0, 1, 2432), 2048, 76))
          tmp17 = tmp15 * tmp16
          tmp18 = ops.where(tmp10, tmp12, tmp17)
          tmp19 = tmp9 + tmp18
          tmp20 = ops.abs(tmp19)
          return tmp20
      ,
      ranges=[76, 2048],
      origin_node=abs_3,
      origins=OrderedSet([abs_3])
    )
  ))
  args[1]: [-1]

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

Epoch 1/1, Steps:   0%|                                                                             | 0/1000 [00:08<?, ?it/s]
bghira commented 2 weeks ago

can't use the TorchAO optimizers on Mac. the low-bit optimizer calls torch.compile with the inductor backend (that's the torchao/prototype/low_bit_optim/adam.py frame in your traceback), and inductor can't lower those kernels for MPS.
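
for reference, a minimal repro sketch of what i believe is failing, outside SimpleTuner entirely. untested, but it uses the same op (aten.amax.default) and shapes ([76, 2048], dim=-1) from the lowering error above:

```python
# Untested sketch: reproduce the inductor-on-MPS failure in isolation.
# torchao's AdamW8bit computes per-block absmax quantisation scales, which
# lowers to aten.amax.default; on this torch build inductor has no MPS
# scheduling backend, so compilation raises BackendCompilerFailed.
import torch

def absmax_scale(x):
    # the same reduction the lowering error points at: abs then amax(dim=-1)
    return torch.amax(torch.abs(x), dim=-1)

fn = torch.compile(absmax_scale, backend="inductor", fullgraph=True)
x = torch.randn(76, 2048, device="mps", dtype=torch.float32)
fn(x)  # expected: torch._dynamo.exc.BackendCompilerFailed
```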

bghira commented 2 weeks ago

so, the M1 doesn't support bf16 in hardware, and i'm not sure whether the MPS backend emulates bf16 there either.

M2 and newer have this support :[
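
a quick way to probe what your hardware/backend accepts (just a sketch, not part of SimpleTuner):

```python
# Probe whether this machine's MPS backend will take bfloat16 at all.
import torch

if torch.backends.mps.is_available():
    try:
        x = torch.ones(4, dtype=torch.bfloat16, device="mps")
        print("bf16 on MPS works:", (x * 2).dtype)
    except (RuntimeError, TypeError) as exc:
        # older macOS / M1 builds reject bfloat16 on MPS outright
        print("no bf16 on MPS here:", exc)
else:
    print("MPS is not available on this build")
```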

otherwise it should work if you use optimi-lion and a batch size of 1. it will even do int8-quanto base model precision. but we can't make use of FP8 since Macs don't have it at all.
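
in flag form, that's just swapping the optimizer in the config your log already shows, e.g. (everything else unchanged):

```
--optimizer=optimi-lion
--train_batch_size=1
--base_model_precision=int8-quanto
```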

roblaughter commented 2 weeks ago

> otherwise it should work if you use optimi-lion and a batch size of 1.

I haven't kept up with the latest Torch advancements and new optimizers. That got it working. Thanks!