google-ai-edge / ai-edge-torch

Supporting PyTorch models with the Google AI Edge TFLite runtime.
Apache License 2.0

Conversion fails on custom models (DCGAN, pix2pix) #295

Open jchwenger opened 1 month ago

jchwenger commented 1 month ago

Description of the bug:

When adapting the official PyTorch-to-TFLite quickstart (here) in this Colab, I'm encountering two failures (for a DCGAN and a pix2pix model), whereas a plain dense net succeeds (in addition to the official ResNet example).
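For reference, the pattern being adapted is the quickstart's convert-and-export flow, roughly like this (a sketch of the ResNet example; the failing cases swap in the DCGAN / pix2pix forward passes and input shapes):

```python
import torch
import torchvision

import ai_edge_torch

# Load the pretrained model in eval mode and provide sample inputs
# matching the expected input signature.
resnet18 = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1
).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Convert to a TFLite-backed edge model and serialize it.
edge_model = ai_edge_torch.convert(resnet18, sample_inputs)
edge_model.export("resnet18.tflite")
```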

It's unclear to me where the problem lies. I'm opening this new issue to stop hijacking the other one (see discussion below this point).

Thanks in advance @pkgoogle for your time!

Actual vs expected behavior:

ResNet and Dense yield:

"Inference result with Pytorch and TfLite was within tolerance"

whereas with the DCGAN and pix2pix models it fails, with

"Something wrong with Pytorch --> TfLite"

It would be great if those models could be converted...
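For reference, those messages come from the quickstart's comparison step, roughly like this (a sketch; `model`, `edge_model`, and `sample_inputs` as in the conversion snippet above):

```python
import numpy as np

# Run both models on the same sample inputs and compare the outputs
# within a fixed tolerance.
torch_output = model(*sample_inputs)
edge_output = edge_model(*sample_inputs)

if np.allclose(
    torch_output.detach().numpy(), edge_output, atol=1e-5, rtol=1e-5
):
    print("Inference result with Pytorch and TfLite was within tolerance")
else:
    print("Something wrong with Pytorch --> TfLite")
```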

Any other information you'd like to share?

No response

pkgoogle commented 1 month ago

Hi @jchwenger, I was able to replicate your results with your Colab. I do want to note that we do not guarantee outputs within atol=1e-5 and rtol=1e-5 for an entire model in every case. So it will be interesting to see what accuracy these two edge models achieve on some common datasets/benchmarks. That being said, the range difference in the output values does appear to be degenerate somehow.

jchwenger commented 1 month ago

Hi @pkgoogle, thanks for this! When I printed the min and max of the outputs, they looked quite different from the original model's, but in fairness I haven't yet tested the converted models to see how well they actually perform. I'll try that and get back to you.
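For the record, the check was along these lines (a sketch; `torch_output` and `edge_output` as in the comparison snippet above):

```python
import numpy as np

# Compare the min/max of both outputs and the largest
# element-wise difference.
t = torch_output.detach().numpy()
e = np.asarray(edge_output)

print(f"PyTorch range: [{t.min():.6f}, {t.max():.6f}]")
print(f"TFLite  range: [{e.min():.6f}, {e.max():.6f}]")
print(f"max abs diff:  {np.abs(t - e).max():.6f}")
```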

iliaparvanov commented 3 weeks ago

I can confirm the same problem with CycleGAN (specifically this implementation: https://github.com/aitorzip/PyTorch-CycleGAN). That said, when I compare the outputs of the original and the exported model side by side, I cannot see any visual difference unless I use a tool like Beyond Compare. So, for my use case, the results are good enough.
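For anyone wanting to quantify the difference rather than eyeball it, something along these lines works (a sketch; the file names are placeholders for one output image from each model on the same input):

```python
import numpy as np
from PIL import Image

# Load both outputs as int16 so the subtraction below cannot
# wrap around as it would with uint8 pixels.
orig = np.asarray(Image.open("original_output.png"), dtype=np.int16)
conv = np.asarray(Image.open("converted_output.png"), dtype=np.int16)

diff = np.abs(orig - conv)
print(f"max pixel diff: {diff.max()}  mean: {diff.mean():.3f}")
```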

Unfortunately, I cannot post examples here because the data I am using is proprietary. If I have some free time, I will train on a common dataset and return with concrete results.