advimman / lama

🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
https://advimman.github.io/lama-project/
Apache License 2.0
8.12k stars, 861 forks

ONNX Model done #315

Open OPHoperHPO opened 6 months ago

OPHoperHPO commented 6 months ago

Hello everyone,

We at @Carve-Photos have successfully ported LaMa (big-lama) to ONNX with results closely resembling the original. Check it out here: https://huggingface.co/Carve/LaMa-ONNX

Update: here is a Hugging Face Space that uses our model: https://huggingface.co/spaces/Carve/LaMa-Demo-ONNX

Particle1904 commented 6 months ago

Thank you so much for this! I tried to convert it but failed miserably. It's always a bummer when you find an extremely useful model that can only be used through Python/C/C++... it makes it so hard to use in custom-built tools.

senya-ashukha commented 6 months ago

Amazing work! Do you want to send a pull request mentioning that at the top of the README page?

OPHoperHPO commented 6 months ago

Amazing work! Do you want to send a pull request mentioning that at the top of the README page?

We are currently preparing an update to speed up the ONNX model. When we release it, we will submit a pull request with a guide on how to export the model.

OPHoperHPO commented 6 months ago

@senya-ashukha We have released an updated ONNX model and submitted PR #316.

K-prog commented 3 months ago

Hey @OPHoperHPO @senya-ashukha, first of all, thanks for this conversion. I was trying the ONNX model, and it seems to differ from the original model and doesn't perform as well. Since PyTorch can now export fft_rfftn layers to ONNX with the new dynamo_export function, do you think there is a way to export the model directly?

I tried doing that, but it errors out; it seems dynamo_export is still in beta ;-; I also opened an issue in PyTorch.

OPHoperHPO commented 3 months ago

Hello @K-prog,

First of all, thanks for this conversion. I was trying the ONNX model, and it seems to differ from the original model and doesn't perform as well.

Could you share your code and the images where the ONNX model performs worse than the original? Many code implementations encounter issues with image preprocessing/postprocessing functions because some preprocessing/postprocessing operations differ from those in the original model (see the code of our HuggingFace Demo Space). I suggest checking this aspect in your code.
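As an illustration of the kind of preprocessing that is easy to get wrong, here is a minimal NumPy sketch. It is not the code from the demo Space. The function name `preprocess`, the [0, 1] image scaling, the binary-mask convention, and the 512x512 size are all assumptions made for this example, so check them against the demo code before relying on them.

```python
import numpy as np

def preprocess(image_hwc_uint8, mask_hw_uint8):
    """Convert an RGB image and a mask into NCHW float32 arrays.

    Assumptions (not confirmed by this thread): the model expects the image
    scaled to [0, 1] and a binary mask where 1.0 marks pixels to inpaint.
    """
    image = image_hwc_uint8.astype(np.float32) / 255.0   # [0, 255] -> [0, 1]
    image = np.transpose(image, (2, 0, 1))[None, ...]    # HWC -> 1x3xHxW
    mask = (mask_hw_uint8 > 0).astype(np.float32)        # any non-zero value -> 1.0
    mask = mask[None, None, ...]                         # HW -> 1x1xHxW
    return image, mask

# Synthetic data at an assumed 512x512 input size
image = np.full((512, 512, 3), 255, dtype=np.uint8)
mask = np.zeros((512, 512), dtype=np.uint8)
mask[100:200, 100:200] = 255                             # 100x100 hole to inpaint

img_in, mask_in = preprocess(image, mask)
print(img_in.shape, mask_in.shape)
```

Running the same function in front of both the Torch and the ONNX inference paths removes preprocessing as a source of differences between them.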

Since PyTorch can now export fft_rfftn layers to ONNX with the new dynamo_export function, do you think there is a way to export the model directly?

Regarding export via torch.onnx.dynamo_export: this doesn't work without modifying the internal torch and onnxscript code. We have already made an initial attempt to export the model using the new exporter and encountered the same problem as you. We implemented a block of code that converts the aten::fft_rfftn operator to ONNX, tried the nightly Torch build, and resolved many other issues. This is how the first model (lama.onnx) was created.

If you compare the lama_fp32.onnx and lama.onnx models using Netron, you will notice significant differences in architecture. The model exported via torch.onnx.dynamo_export is not suitable for use: tools that convert ONNX to other formats do not support it, it is slow, and it cannot run on a GPU. These are limitations of the new Torch exporter, so there is no benefit in exporting through it.

K-prog commented 3 months ago

Thanks for the clarification, @OPHoperHPO. For the ONNX inference, I used the same preprocessing/postprocessing functions as provided in your conversion notebook. For the PyTorch inference, I followed Sanster's lama cleaner code, as it's clearer and better structured. I had already resized my images to (512, 512) to remove any bias from PyTorch's dynamic inference.

In my experiments with the ONNX implementation on multiple images, it seems to preserve structures less well. I'm sharing an example with both PyTorch and ONNX results here.

I'll also be drafting you an email with more context and information.

OPHoperHPO commented 3 months ago

@K-prog

Interestingly, I checked the original code (without exporting to ONNX) and couldn't get a result close to yours using Torch. The quality I achieved was similar to ONNX; here is a Colab notebook demonstrating this. The issue is not with ONNX, since the original code gives the same result. The problem lies with the operations in the ExportLama layers, the dependency versions, or the preprocessing operations. Could you provide a Colab notebook similar to this one, where the code for working with the original model is isolated from external factors and produces your Torch result with such a mask?
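When comparing two backends like this, a numeric check can complement visual inspection. The following is a minimal sketch, not code from either notebook: `compare_outputs` is a hypothetical helper, and the random arrays are stand-ins for real model outputs.

```python
import numpy as np

def compare_outputs(reference, candidate):
    """Return (mean absolute difference, ratio of mean intensities)."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    mean_abs_diff = float(np.mean(np.abs(ref - cand)))
    # A ratio near 255 or 1/255 hints at a value-range mismatch
    # rather than a genuine difference in model quality.
    scale_ratio = float(cand.mean() / max(ref.mean(), 1e-12))
    return mean_abs_diff, scale_ratio

rng = np.random.default_rng(0)
torch_like = rng.random((3, 64, 64))   # stand-in for an output in [0, 1]
onnx_like = torch_like * 255.0         # same content in [0, 255]

diff, ratio = compare_outputs(torch_like, onnx_like)
print(f"mean abs diff: {diff:.3f}, scale ratio: {ratio:.1f}")
```

A large difference combined with a scale ratio close to 255 suggests the two pipelines disagree on value range, which is worth ruling out before blaming the exported model.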

K-prog commented 3 months ago

@OPHoperHPO, I have created an isolated notebook and removed all the external factors (courtesy here). I also tried replacing just the inference part with ONNX while keeping the same preprocessing logic, but it gives weird outputs.

Also, I have sent you an email at farvard34@gmail.com and am waiting for your acknowledgement!

OPHoperHPO commented 3 months ago

@K-prog

but it gives weird outputs.

  inpainted_image /= 255.0

I checked your notebook and noticed an error in the ONNX postprocessing. You just need to divide the output by 255, as the ONNX model already multiplies it by 255. This will give you the same results as Torch LaMa. So it's not an issue with the ONNX model; it's a postprocessing issue that causes the difference in results. See this corrected version.
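To make the range handling concrete, here is a minimal NumPy sketch of the two postprocessing paths. It is not the corrected notebook: the function names are hypothetical, and the assumption that the Torch output lies in [0, 1] while the ONNX output lies in [0, 255] is taken from the description above.

```python
import numpy as np

def onnx_to_unit_range(output_chw):
    """Bring a [0, 255] ONNX output back to [0, 1], like `inpainted_image /= 255.0`."""
    return np.asarray(output_chw, dtype=np.float32) / 255.0

def unit_range_to_uint8(output_chw):
    """Convert a CHW array in [0, 1] to an HWC uint8 image."""
    scaled = np.clip(np.asarray(output_chw, dtype=np.float32) * 255.0, 0.0, 255.0)
    return np.transpose(np.rint(scaled), (1, 2, 0)).astype(np.uint8)

# Stand-ins for real outputs: the same content in two value ranges
torch_out = np.full((3, 2, 2), 0.2, dtype=np.float32)   # assumed [0, 1]
onnx_out = np.full((3, 2, 2), 51.0, dtype=np.float32)   # assumed [0, 255]

from_torch = unit_range_to_uint8(torch_out)
from_onnx = unit_range_to_uint8(onnx_to_unit_range(onnx_out))
print(np.array_equal(from_torch, from_onnx))
```

Once both outputs are in the same range, any shared postprocessing written for the Torch model can be reused unchanged.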

K-prog commented 3 months ago

@OPHoperHPO

Haah, thanks for catching my little blunder ;-; The ONNX model is indeed giving similar outputs 🙌

Big big thanks for your help!

ljdang commented 1 month ago

@OPHoperHPO It works well with fp32, but fp16 sometimes produces completely black results, possibly due to numerical overflow. Is there a way to solve this?
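For context on the overflow hypothesis, this small NumPy sketch shows how a value that is fine in fp32 can become inf, and then NaN, in fp16. It only illustrates the number format and is not a diagnosis of this model; whether any LaMa activation actually exceeds the fp16 range is an assumption that has not been checked here.

```python
import numpy as np

# Largest finite value representable in IEEE half precision
FP16_MAX = float(np.finfo(np.float16).max)

# A hypothetical activation value that is fine in fp32 but too large for fp16
activation = np.array([70000.0], dtype=np.float32)

with np.errstate(over="ignore", invalid="ignore"):
    as_fp16 = activation.astype(np.float16)   # overflows to inf
    poisoned = as_fp16 - as_fp16              # inf - inf gives nan

print(FP16_MAX, as_fp16, poisoned)
```

Once a NaN or inf appears in an intermediate tensor it tends to propagate through later layers, which is one plausible way to end up with an unusable output image.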

K-prog commented 1 month ago

@ljdang fp16 doesn't seem to have any visible losses for me. Is your conversion correct?

ljdang commented 1 month ago

@K-prog I used MNN to convert the model to FP16 format, as described in this issue. Only some images have problems. https://github.com/alibaba/MNN/issues/2977