didriknielsen / survae_flows

Code for paper "SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows"
MIT License
283 stars · 34 forks

Low GPU usage when running experiments on cifar10 #3

Closed gitlabspy closed 3 years ago

gitlabspy commented 3 years ago

Hi Didrik, great work!! I was trying to run the experiments from the code you've provided. I ran it on cifar10 and found it has low GPU utilization while occupying a lot of GPU memory. I wonder what causes this phenomenon? `39'C, 35 % | 23651 / 32510 MB` — 35% is almost the highest utilization I see.

Another thing: I found that images are barely reconstructable. (From what I understand, a flow is bijective — though obviously it can be otherwise, according to this paper — so it should always be able to recover the image from the latent, and reconstruction stays good even with the multi-scale architecture on.) I have only tried 32x32 images; will reconstruction be better on higher-resolution images?

I was running the following command, and it has run for about 300 epochs:

```
python train.py --epochs 500 --batch_size 32 --optimizer adamax --lr 1e-3 --gamma 0.995 --eval_every 1 --check_every 10 --warmup 5000 --num_steps 12 --num_scales 2 --dequant flow --pooling max --dataset cifar10 --augmentation eta --name maxpool
```

didriknielsen commented 3 years ago

Hi, Thanks!

I've also found GPU utilization to be low for flow models. I haven't investigated this further, but if someone has any insight, I'd love to hear it!

Regarding reconstruction: If you are using only bijections, the input should be exactly reconstructable (up to numerical error). However, if you make use of surjective transformations (such as Slice, MaxPool2d, etc.), some information is lost in the x->z direction and is generated anew in the z->x direction. The reconstructions may therefore vary.
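To illustrate the point about surjections, here is a minimal NumPy sketch (not the repo's actual code) of a Slice-style surjection. The forward direction keeps only part of the input; the inverse re-generates the discarded part, here simply by sampling a standard normal as a stand-in for the learned conditional distribution used in the paper. The retained dimensions reconstruct exactly, while the generated ones do not:

```python
import numpy as np

rng = np.random.default_rng(0)

def slice_forward(x):
    # x -> z: keep the first half of the features, discard the rest (lossy).
    d = x.shape[-1] // 2
    return x[..., :d]

def slice_inverse(z, rng):
    # z -> x: the discarded half must be generated; here we sample from a
    # standard normal prior (a simplification of the learned generator).
    noise = rng.standard_normal(z.shape)
    return np.concatenate([z, noise], axis=-1)

x = rng.standard_normal((4, 8))
z = slice_forward(x)            # shape (4, 4): half the information is gone
x_rec = slice_inverse(z, rng)   # shape (4, 8): discarded half is re-sampled

# The retained half reconstructs exactly; the generated half differs.
print(np.allclose(x[..., :4], x_rec[..., :4]))  # True
print(np.allclose(x[..., 4:], x_rec[..., 4:]))  # False
```

A pure bijection (e.g. a coupling layer) has no such discarded part, which is why reconstructions there are exact up to floating-point error.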

gitlabspy commented 3 years ago

@didriknielsen It's kinda rude of me to ask you for this, but I wonder if you could upload the pretrained model for imagenet64, please?