NVlabs / stylegan3

Official PyTorch implementation of StyleGAN3

Is it a valid format of dataset.json? #149

Open Dok11 opened 2 years ago

Dok11 commented 2 years ago

I have images with two continuous features. The first, f, ranges from 1.0 to 25.0, and the second, r, from 0.0 to 180.0. I created a dataset.json file in this format:

{
  "labels": [
    [
      "f[1.0]_r[0].png",
      [
        1.0,
        0.0
      ]
    ],
    [
      "f[1.0]_r[100].png",
      [
        1.0,
        100.0
      ]
    ]
  ]
}

Is this the correct way? Or for this dataset should I change the source code, or just the dataset.json format?

UPD: Seems like it was correct. The training results with this dataset.json look much better than when I run the training process without any labels on the same dataset for the same training duration.
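
For reference, a minimal sketch of how such a dataset.json could be generated from filenames like the ones above (this is just an illustration, not code from the repo; the folder path and the flat directory layout are assumptions):

# Sketch: build dataset.json from filenames like "f[1.0]_r[100].png",
# turning f and r into two continuous labels per image.
import json
import os
import re

dataset_dir = './data/checker'  # assumption: flat folder of PNG files
pattern = re.compile(r'f\[(?P<f>[\d.]+)\]_r\[(?P<r>[\d.]+)\]\.png$')

labels = []
for name in sorted(os.listdir(dataset_dir)):
    match = pattern.match(name)
    if match is None:
        continue  # skip files that do not follow the naming scheme
    labels.append([name, [float(match.group('f')), float(match.group('r'))]])

with open(os.path.join(dataset_dir, 'dataset.json'), 'w') as fp:
    json.dump({'labels': labels}, fp, indent=2)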

PDillis commented 2 years ago

Will you share both what your dataset.json inside the .zip file looks like (e.g., the same as above, with two samples), as well as how your models compare? Conditional models are not often trained or shared, so it's always interesting to see the results from everyone else.

Dok11 commented 2 years ago

@PDillis I began my acquaintance with the technology using a simple synthetic dataset (rendered in Blender). It has images like this: image

Here we have two main features:

  1. Frequency. When it equals one, we have one black and one white line; for 30, there are 30 black and 30 white lines.
  2. Rotation from 0 to 180 (degrees).

Then I ran StyleGAN3 training without dataset.json with the command python .\train.py --outdir=./training-runs --cfg=stylegan3-r --data=./data/checker/ --batch=64 --gpus=1 --gamma=0.5 --metrics=none --cbase=4096 --cmax=64 --tick=1 --snap=10.

After 1 hour 40 minutes I had chaos like this (it's a crop from the whole image): fakes000487 A good result, but pretty chaotic.

Then I made a dataset.json with the structure from the first message and ran python .\train.py --outdir=./training-runs --cfg=stylegan3-r --data=./data/checker/ --batch=64 --gpus=1 --gamma=0.5 --metrics=none --cbase=4096 --cmax=64 --tick=1 --snap=10 --cond=True.

After 1 hour 40 minutes I had a result like this: fakes000491

That looks like a much better and much more consistent result. For a fairer comparison I should have turned off all augmentations, but I forgot about those defaults. Anyway, the given features obviously push the result in the right direction.

Will you share both what your dataset.json inside the .zip file looks like

My dataset contains 52,671 PNG images, so their total size is 300+ MB. But if you just need an example, I've attached an archive with a subset of the data: checker.zip

as well as how your models compare?

I don't have strong statistics skills, so I just look at the results from different runs of the model; that's enough for me :) But I see there is a "Spectral analysis" tool here, maybe it could help compare results with objective values. I don't know yet.

Conditional models are not often trained or shared, so it's always interesting to see the results from everyone else.

Training such large models requires a lot of computation. So my hope is that if we provide some valuable labels for our dataset, the model can learn the relationships between image features faster, and it will probably take less time before we get an acceptable result. Now I'm working with 360° panoramas, and with the "labelstud.io" tool I label images by a side-by-side method, which gives continuous features/labels for the whole dataset. I mark high-level features such as "day/night", "how many buildings", "how much nature", and others that influence the whole image or a large part of it.

Previously I did this without labels on LightweightGAN by lucidrains, and the result was not too good: http://postoev.ru/this-place-does-not-exist/?q=5&i=78. Now I want to try working with this dataset on StyleGAN3, and I believe that if I provide high-level labels for the images, I can expect a better result. But before I work with the hard dataset, I should check the pipeline on this zebra dataset, like above =)

Dok11 commented 2 years ago

I ran the same test with panoramas; they have 7 labels:

{
  "labels": [
    [
      "1000348558.jpg",
      [
        0.5520296692848206,
        0.4297911822795868,
        0.6073692440986633,
        0.03547428548336029,
        0.25489330291748047,
        0.3101881146430969,
        0.611669659614563
      ]
    ],
    [
      "1000387902.jpg",
      [
        0.6426243782043457,
        0.6227445006370544,
        0.611226499080658,
        0.17565134167671204,
        0.3785965144634247,
        0.06725947558879852,
        0.29489609599113464
      ]
    ]
  ]
}

These are the values for the seven features I marked (day/night, buildings, nature, and so on, as described above).

Then I tried training the model without dataset.json, and this is the result after 200kimg: image

And for comparison, 200kimg when using dataset.json as described above: image

The quality seems equal, but the second result is more consistent and probably gives more control for generating a target image.

Of course, for this purpose I should use augmentations like offsetX, because panoramas are seamless across the left and right edges. That's for future experiments.
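
A horizontal "offsetX" augmentation for seamless panoramas could look roughly like the sketch below; this is only an illustration of the idea (torch.roll along the width axis), not an existing option in the repo's augmentation pipeline:

# Sketch: roll each image along the width axis by a random offset.
# Valid for equirectangular panoramas because the left and right edges meet.
import torch

def offset_x(images: torch.Tensor) -> torch.Tensor:
    # images: [batch, channels, height, width]
    batch, _, _, width = images.shape
    shifts = torch.randint(0, width, (batch,))
    return torch.stack([torch.roll(img, int(s), dims=-1)
                        for img, s in zip(images, shifts)])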

PDillis commented 2 years ago

Thanks for the replies! The model seems to have lost diversity in favor of stability. Have you continued to train it further? Your labels are also interesting, as I've normally seen labels that are either 1 or 0, not floats. If your method works, then this changes how these models might be trained! That is, instead of saying that an image is of class 20, it has different properties, so you can make really interesting class interpolations by keeping the latent fixed.
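
A rough sketch of what such an interpolation could look like, assuming the continuous-label setup from this thread (G is an already loaded conditional generator; the label values are placeholders):

# Sketch: keep the latent z fixed and interpolate the conditioning vector.
import torch

device = torch.device('cuda')
z = torch.randn([1, G.z_dim], device=device)             # fixed latent
label_a = torch.tensor([[1.0, 0.0]], device=device)      # e.g. f=1,  r=0
label_b = torch.tensor([[25.0, 180.0]], device=device)   # e.g. f=25, r=180

frames = []
for t in torch.linspace(0, 1, steps=30):
    label = (1 - t) * label_a + t * label_b
    frames.append(G(z, label, truncation_psi=1.0, noise_mode='const'))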

Dok11 commented 2 years ago

The model seems to have lost diversity in favor of stability.

I think it's just an issue with the generator that makes fakes.jpeg. I plan to improve this part and see the full diversity of results: each feature over its full range (0-1).

Have you continued to train it further?

No, it was just a test at 128×128. First I should implement the offsetX augmentation, which is required for my task; second, improve the generation of fakes.jpeg; and then I will train at 128 for further tests and at 512 for real usage. Currently I'm working on the labels; their marking has a pretty involved multistep pipeline :)

If your method works, then this changes how these models might be trained! That is, instead of saying that an image is of class 20, it has different properties, so you can make really interesting class interpolations by keeping the latent fixed.

That's the main idea of my tests. And it's important for my tasks and a pretty small dataset (6,500 panoramas). Now I remember that I planned to make a "zebra" on demand, like "give me an image with frequency 5 and rotation 45 deg". I think these labels should be just some area of the input latent space that, in normal usage, is just a random vector.
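
For what it's worth, in the official code the labels do not overwrite part of z; as far as I understand training/networks_stylegan3.py, the mapping network embeds the conditioning vector c and concatenates it with the normalized z before producing w. A simplified paraphrase (approximate, not the exact implementation):

# Approximate paraphrase of how z and c are combined in the mapping network.
import torch

def combine_latent_and_label(z, c, embed):
    # embed: a learned linear layer projecting c_dim to some feature size
    x = z * (z.square().mean(dim=1, keepdim=True) + 1e-8).rsqrt()   # normalize z
    y = embed(c)
    y = y * (y.square().mean(dim=1, keepdim=True) + 1e-8).rsqrt()   # normalize embed(c)
    return torch.cat([x, y], dim=1)  # fed into the MLP that outputs w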

Dok11 commented 2 years ago

Now I remember that I planned to make a "zebra" on demand, like "give me an image with frequency 5 and rotation 45 deg".

That's interesting. The training loop provides the ability to train with continuous labels, but gen_images.py does not. You can pass --class, but it has to be a one-hot class like [0, 0, 1, 0, 0, 0]. For example, when you run the command with --class=2, G receives it as the label [0, 1].

So I had to change this script as follows:

# custom continuous labels, not one-hot encoded
my_label = torch.tensor([[0.98866, 0.5658]], device=device)
img = G(z, my_label, truncation_psi=truncation_psi, noise_mode=noise_mode)
#          ^ use custom labels instead of a class index
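
For completeness, a self-contained sketch of the modified generation flow (the pickle path is a placeholder; loading follows the pattern used in gen_images.py):

# Sketch: load a trained conditional generator and feed it continuous labels.
import torch
import dnnlib
import legacy

device = torch.device('cuda')
network_pkl = './training-runs/.../network-snapshot.pkl'  # placeholder path

with dnnlib.util.open_url(network_pkl) as f:
    G = legacy.load_network_pkl(f)['G_ema'].to(device)

z = torch.randn([1, G.z_dim], device=device)
my_label = torch.tensor([[0.98866, 0.5658]], device=device)  # continuous labels
img = G(z, my_label, truncation_psi=1.0, noise_mode='const')
# img is NCHW float roughly in [-1, 1]; convert as in gen_images.py to save it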

As I can see, the model breaks on high-frequency patterns like this one: f[29.3]_r[50]

I took the good working model from before the collapse and generated some image patterns with it. You can see them here: http://postoev.ru/this-zebra-does-not-exist Important note: I trained the model on ranges 0-30 for line frequency and 0-180 degrees. So it's pretty interesting to see something outside these ranges; just drag a slider to the left half.