NVlabs / NVAE

The Official PyTorch Implementation of "NVAE: A Deep Hierarchical Variational Autoencoder" (NeurIPS 2020 spotlight paper)
https://arxiv.org/abs/2007.03898

Little question about lmdb_datasets.py implement #6

Closed Lukelluke closed 3 years ago

Lukelluke commented 3 years ago

Nice to meet you again, Dr. @arash-vahdat!

  1. A small question about line 43:

What does target = [0] mean? I can't find where this item is used; as far as I can tell, only the returned img data is fed into the NVAE model.

  2. How can I apply single-channel (channel == 1) data to the NVAE model? I can't find which parameter sets the number of input channels (Cin). Or is no extra change needed, and the model adapts by itself? I'm still confused and would appreciate your help.

Thank you again for your help!

All the best,

Luke Huang

kaushik333 commented 3 years ago

Hi @Lukelluke

  1. I think they are just dummy labels assigned to the dataset, so that the dataloader follows the more generic convention of returning an (image, label) pair. If you look at Line 146, only the data is used, not the label. I don't see the label used in evaluate.py or the test() function either.

  2. To use 1-channel data, these are the changes I made.

    a. Add a separate elif case for your dataset class in https://github.com/NVlabs/NVAE/blob/master/datasets.py and add a "greyscale=True" parameter to the LMDBDataset class. Change this to

     if not self.greyscale:
         img = img.convert('RGB')
     else:
         img = img.convert('L')

    b. Change this to

      Cin = 1 if self.dataset in {'mnist', 'yourDatasetName'} else 3

    c. Change this to

     C_out = 1 if self.dataset in {'mnist','yourDatasetName'} else 10 * self.num_mix_output

    d. Since you're using grayscale images, change this to

     if self.dataset in {'mnist', 'yourDatasetName'}:

    Alternatively, you can skip the Bernoulli distribution and use the mixture distribution (DiscMixLogistic) instead.
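Steps b to d can be summarised in a small standalone sketch. It is not code from the NVAE repository: `GREYSCALE_DATASETS`, `channel_config` and `decoder_kind` are hypothetical names, and the mapping of step d to a Bernoulli decoder is an assumption based on the remark above about not using Bernoulli.

```python
# Hypothetical helpers, not part of NVAE: they only mirror the
# conditionals quoted in steps b, c and d above.
GREYSCALE_DATASETS = {'mnist', 'yourDatasetName'}  # placeholder name from the thread


def channel_config(dataset, num_mix_output):
    """Return (Cin, C_out) for a dataset name, as in steps b and c."""
    is_grey = dataset in GREYSCALE_DATASETS
    c_in = 1 if is_grey else 3                      # step b: input channels
    c_out = 1 if is_grey else 10 * num_mix_output   # step c: decoder output channels
    return c_in, c_out


def decoder_kind(dataset):
    """Assumed meaning of step d: greyscale datasets take the Bernoulli branch."""
    return 'bernoulli' if dataset in GREYSCALE_DATASETS else 'mixture'
```

For example, `channel_config('yourDatasetName', 10)` gives `(1, 1)`, while any dataset name outside the set gives `(3, 100)`.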

@arash-vahdat please feel free to add anything else if you feel is important.
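For point 1, here is a minimal pure-Python sketch of the dummy-label pattern. `ToyImageDataset` and `collect_inputs` are hypothetical stand-ins, not the actual LMDBDataset or training loop; they only illustrate returning an (image, label) pair whose label is never read.

```python
class ToyImageDataset:
    """Hypothetical stand-in for an LMDB-backed image dataset."""

    def __init__(self, images):
        self.images = images

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        img = self.images[index]
        target = [0]  # dummy label, present only to keep the (image, label) shape
        return img, target


def collect_inputs(dataset):
    """Mimic a loop that feeds only the image to the model and drops the label."""
    inputs = []
    for i in range(len(dataset)):
        img, _ = dataset[i]  # the label is unpacked and ignored
        inputs.append(img)
    return inputs
```

Keeping the pair shape means the same dataset class can be swapped with labelled datasets without changing the loop that consumes it.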

arash-vahdat commented 3 years ago

@kaushik333 Your answer is very complete and correct. Thanks!

Lukelluke commented 3 years ago

Thank you so much, @kaushik333 and Dr. @arash-vahdat!

Thank you for your quick help!

I will follow your instructions and try it right away.

P.S. My dataset is actually mono .wav audio, so channel == 1. Your timely help gave me a lot of inspiration; for datasets.py, I made the changes in a different way to fit my data.

As for decoder_output, I need some more time to figure out how Bernoulli and DiscMixLogistic work.
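As a rough orientation for the Bernoulli case only, the sketch below computes the standard Bernoulli log-likelihood of binary pixels from decoder logits. It is not NVAE's own Bernoulli or DiscMixLogistic class; `softplus`, `bernoulli_log_prob` and `image_log_prob` are hypothetical names. It does show why one output channel suffices: the decoder needs a single logit per pixel.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def bernoulli_log_prob(x, logit):
    """log p(x | logit) for a pixel x in {0, 1}.

    Uses log sigmoid(l) = -softplus(-l) and log(1 - sigmoid(l)) = -softplus(l).
    """
    return -(x * softplus(-logit) + (1 - x) * softplus(logit))


def image_log_prob(pixels, logits):
    """Sum per-pixel log-likelihoods over a flattened single-channel image."""
    return sum(bernoulli_log_prob(x, l) for x, l in zip(pixels, logits))
```

A logit of 0 corresponds to probability 0.5, so each pixel then contributes log(0.5) regardless of its value.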

All in all, thank you very much for your generous help! I hope I can become as good at coding as you :)

If it succeeds, I will share the good news as soon as possible and release the related implementation.

All the best,

Luke Huang

Lukelluke commented 3 years ago

Hi @kaushik333,

I followed your instructions. Thank you again, sincerely! This also helped me understand NVAE better.

In the meantime, one big question is still hanging over my head:

P.S. This question comes from the audio field, where we usually arrange an audio spectrogram as [batch, channel=1 (mono), frames, spectrum dimension]. The spectrum dimension is usually 80, but the number of frames (which reflects the length of one .wav file) almost never equals that dimension.

I hope to get some inspiration from the image field, just as with the 'channel problem' you explained above.

Please feel free to share any advice, however minor.

Thank you again, most sincerely!

All the best,

Luke Huang