didriknielsen / pixelcnn_flow

Code for paper "Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow"
MIT License

Extending to Image Transformer #1

Open alexmathfb opened 3 years ago

alexmathfb commented 3 years ago

Thanks for this very important work.

I'm trying to train the Image Transformer as a single-layer flow and have a few questions I hope you can help me with.

In section 4 you describe how to turn PixelCNN++ and related models (including the Image Transformer) into single-layer autoregressive flows.

Question 1. Is it correct that the modifications needed for PixelCNN++ and the Image Transformer are the same because both use DMOL?

Question 2. The comment below states that the PixelCNN++ implementation is a raw copy. If I extend to the Image Transformer, can I also just make a raw copy? https://github.com/didriknielsen/pixelcnn_flow/blob/9030f6a66d5ff83d7d299541ed55b20b20bb9a15/pixelflow/networks/autoregressive/pixelcnn_pp.py#L7

Question 3. It seems to me that the AutoregressiveSubsetFlow2d class does not assume PixelCNN++ and thus may work for the Image Transformer. In principle, if I change the following code to use the Image Transformer, should it work?

https://github.com/didriknielsen/pixelcnn_flow/blob/9030f6a66d5ff83d7d299541ed55b20b20bb9a15/experiments/train/exact_pixelcnn_pp.py#L63

didriknielsen commented 3 years ago

Hi and thanks for your interest!

Q1: Yes, for an Image Transformer with DMOL, the setup is the same. Only the neural architecture that parameterizes the flow will be different.
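For context (this is not the repo's code): in the single-layer-flow view, the elementwise transform is the mixture-of-logistics CDF for each (sub)pixel, and the autoregressive network only supplies its parameters, so the transform itself is identical whether those parameters come from PixelCNN++ or an Image Transformer. A minimal sketch of that CDF, with illustrative names and shapes:

```python
import torch

def logistic_mixture_cdf(x, logit_weights, means, log_scales):
    """Elementwise CDF of a logistic mixture: F(x) = sum_k pi_k * sigmoid((x - mu_k) / s_k).

    Shapes (illustrative): x is (...,), the parameter tensors are (..., K).
    """
    pi = torch.softmax(logit_weights, dim=-1)               # mixture weights
    z = (x.unsqueeze(-1) - means) * torch.exp(-log_scales)  # standardize per component
    return (pi * torch.sigmoid(z)).sum(dim=-1)              # CDF values in [0, 1]
```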

Q2: Yes, that should be fine if you have an implementation of the neural architecture used in the Image Transformer.

Q3: Yes, by passing in the Image Transformer network as net, everything should still work fine.
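In other words, the change should be limited to the line in the linked training script that builds the model. A rough sketch, where ImageTransformer is a hypothetical stand-in for your own module and any remaining constructor arguments should be copied from exact_pixelcnn_pp.py:

```python
# Sketch only: ImageTransformer is hypothetical; it must emit the same per-pixel
# DMOL parameters that the PixelCNN++ network produces. The other arguments of
# AutoregressiveSubsetFlow2d should be taken from experiments/train/exact_pixelcnn_pp.py.
net = ImageTransformer()
model = AutoregressiveSubsetFlow2d(net=net)
```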

alexmathfb commented 3 years ago

You are indeed correct; I managed to get very similar bpd early in training. I will comment tomorrow when training finishes.

(The image below shows the bpd loss of the Image Transformer trained autoregressively vs. as a single-layer normalizing flow.)

[image: bpd loss curves]

alexmathfb commented 3 years ago

The training loss curves seem indistinguishable.

[image: training loss curves]