When I call the Replicate API with my own images, the prediction fails with a CUDA out-of-memory error. This appears to happen because the app does not resize the input image to the size the model expects, so large images fail. Here are the logs:
Running predict()...
Using seed: 17664
0%| | 0/50 [00:00<?, ?it/s]
0%| | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/src/src/cog/python/cog/server/runner.py", line 288, in _run_prediction
output = self.predictor.predict(**prediction_input)
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "/src/predict.py", line 81, in predict
output = self.pipe(
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py", line 392, in __call__
noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 296, in forward
sample, res_samples = downsample_block(
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/diffusers/models/unet_blocks.py", line 563, in forward
hidden_states = attn(hidden_states, context=encoder_hidden_states)
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/diffusers/models/attention.py", line 162, in forward
hidden_states = block(hidden_states, context=context)
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/diffusers/models/attention.py", line 211, in forward
hidden_states = self.attn1(self.norm1(hidden_states)) + hidden_states
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/diffusers/models/attention.py", line 283, in forward
hidden_states = self._attention(query, key, value)
File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/diffusers/models/attention.py", line 291, in _attention
attention_scores = torch.matmul(query, key.transpose(-1, -2)) * self.scale
RuntimeError: CUDA out of memory. Tried to allocate 71.54 GiB (GPU 0; 39.59 GiB total capacity; 3.09 GiB already allocated; 3.71 GiB free; 34.17 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
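A possible workaround until the app resizes inputs itself is to downscale the image (and mask) on the client before sending them. The sketch below is an assumption on my part, not the model's actual preprocessing: the 512-pixel limit, the multiple-of-8 rounding, and the helper names `fit_size`, `attention_scores_gib`, and `resize_for_model` are all hypothetical. The memory estimate also assumes 8 attention heads, a batch of 2, and fp16, which may not match this model; it is only meant to show that the attention score matrix from the traceback grows with the square of the pixel count.

```python
def fit_size(width, height, max_side=512, multiple=8):
    """Return (w, h) scaled down to fit max_side, rounded down to a multiple.

    max_side=512 and multiple=8 are assumed values, not confirmed model limits.
    Images already small enough are not upscaled.
    """
    longest = max(width, height)
    if longest > max_side:
        # Integer arithmetic avoids floating-point rounding surprises.
        width = width * max_side // longest
        height = height * max_side // longest
    width = max(multiple, width // multiple * multiple)
    height = max(multiple, height // multiple * multiple)
    return width, height


def attention_scores_gib(width, height, heads=8, batch=2, bytes_per_el=2):
    """Rough size of one self-attention score matrix at the first UNet level.

    Assumes latents are 1/8 of the image size, so tokens = (w/8) * (h/8),
    and the score matrix has batch * heads * tokens * tokens elements.
    """
    tokens = (width // 8) * (height // 8)
    return batch * heads * tokens * tokens * bytes_per_el / 2**30


def resize_for_model(image, mask, max_side=512):
    """Resize a PIL image and its mask to the same model-friendly size."""
    from PIL import Image  # requires Pillow; imported lazily on purpose

    size = fit_size(image.width, image.height, max_side=max_side)
    # LANCZOS for the photo, NEAREST for the mask so it stays binary.
    return image.resize(size, Image.LANCZOS), mask.resize(size, Image.NEAREST)


if __name__ == "__main__":
    print(fit_size(4000, 3000))              # (512, 384)
    print(attention_scores_gib(512, 512))    # 0.5
    print(attention_scores_gib(1024, 1024))  # 8.0
```

Under those assumptions, doubling each side multiplies the score matrix by 16, which would explain why a large upload asks for tens of GiB in a single allocation while a 512x512 input runs fine.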