dobkeratops / convnet_stuff


The currently added noise is too much? Is it deliberate? Final output - noise? #2

Open Twenkid opened 1 year ago

Twenkid commented 1 year ago

The amount of noise currently added seems excessive to me, especially as it's color noise. (I guess you want it that strong in order to see the result?)

I think the noise should be normalized relative to the intensities in the input, and/or the input should be normalized with CLAHE or similar histogram equalization; otherwise dark images are ruined.
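Something like this, as a rough numpy sketch (the helper name and `rel_amount` parameter are made up, not from the repo): scale the noise by the image's own intensity spread, so dark images aren't drowned out by fixed-amplitude noise.

```python
import numpy as np

def add_relative_noise(img, rel_amount=0.1, rng=None):
    # Hypothetical helper: scale noise amplitude by the image's own
    # standard deviation, so dark/flat images get proportionally less noise.
    rng = np.random.default_rng() if rng is None else rng
    scale = rel_amount * (img.std() + 1e-6)
    noisy = img + rng.normal(0.0, scale, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)
```

CLAHE on the input (e.g. via OpenCV) would be a complementary step before this, rather than a replacement for it.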

[screenshots: noisy input examples]

It also seems strange that the final output looks like that (noise)? Does it need a lot of epochs to start taking shape?

[screenshots: final outputs, still noise-like]

dobkeratops commented 1 year ago

Right, the noise is pretty extreme. It's an experiment to use this broader noise rather than the usual straight per-pixel noise, although it is in line with ideas like the rectangular holes you showed in the other thread.

It definitely needs to be configurable. It may also be better to randomise the amount of noise itself (could even add a branch to make the net guess "how much noise was added?"). I should make it randomise the shape of the noise as well. This fractal-style noise also stops it learning tree & cloud detail, because those naturally look like the noise (unintended consequence).
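The randomised-amount idea could look roughly like this (a numpy sketch; the function name is made up): draw a random noise amplitude per sample and keep it as an auxiliary regression target for the "how much noise was added?" branch.

```python
import numpy as np

def make_noisy_sample(img, max_amount=0.5, rng=None):
    # Sketch: randomise the noise amplitude per sample and return it as an
    # auxiliary target ("how much noise was added?").
    rng = np.random.default_rng() if rng is None else rng
    amount = rng.uniform(0.0, max_amount)
    noisy = np.clip(img + rng.normal(0.0, amount, size=img.shape), 0.0, 1.0)
    # training would use: denoiser(noisy) -> img, plus aux head -> amount
    return noisy, amount
```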

As for taking a lot of epochs to start shaping: indeed. That is disconcerting because you see the loss stalled for a long time (you'd lose faith that it's actually working), but if you squint you can start to see faint patterns emerging in the flat state. It does 'break through' eventually.

What I'd be happiest with is a procedure where layers are added gradually, so it learns the lower ones first more quickly, making them visible sooner.

I actually had this setup (in the codebase you'll see some use of a "self.uselevel", maybe commented out), and it was working OK up to a point, but I wasn't happy with needing some arbitrary loss value at which it starts increasing. I needed a smarter way to do that (like calculating the slope of the loss: "has progress stalled?"). It also wasn't quite reaching the final level. There's probably a heuristic, and it would help to jitter, e.g. instead of "uselevel=N", randomly alternate N, N+1. I need to clean up how it determines the loss in that scenario though.
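The slope-of-loss idea could be sketched like this (class and parameter names are my own invention, not the repo's actual API): watch a window of recent losses and bump the level only when the average change per step is near zero, with the N/N+1 jitter on top.

```python
import random

class LevelScheduler:
    # Hypothetical sketch of growing the net level by level: instead of a
    # fixed loss threshold, bump `level` when the loss slope has flattened.
    def __init__(self, max_level, window=50, slope_eps=1e-4):
        self.level, self.max_level = 0, max_level
        self.window, self.slope_eps = window, slope_eps
        self.history = []

    def update(self, loss):
        self.history.append(loss)
        if len(self.history) < self.window:
            return self.level
        recent = self.history[-self.window:]
        # crude slope estimate: average change per step over the window
        slope = (recent[-1] - recent[0]) / self.window
        if abs(slope) < self.slope_eps and self.level < self.max_level:
            self.level += 1          # "progress stalled" -> add a layer
            self.history.clear()     # restart the stall detector
        return self.level

    def sample_level(self):
        # jitter between N and N+1, so the next level starts getting
        # gradients before the hard switch
        return min(self.level + random.randint(0, 1), self.max_level)
```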

There is also a difference between training the lower layers and adding more, vs. training all the layers to ensure the lowest ones learn what's most useful to the final ones. (I read that deep nets did use to be trained layer by layer first, but that stopped once ReLU was figured out, because it gave higher scores.)

I might also just make it show all the levels in that preview: left = input, then level 0 prediction, level 1 prediction, ..., level N prediction, then target. (Or perhaps cascade them?)
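That preview layout is basically a horizontal concatenation; a minimal numpy sketch (the function name is an assumption, and all images are assumed to be same-sized HxWxC arrays):

```python
import numpy as np

def preview_strip(inp, level_preds, target):
    # Tile input, each per-level prediction, and target side by side
    # into one preview image.
    return np.concatenate([inp, *level_preds, target], axis=1)
```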

So I've switched to it currently outputting both a shortcut in the middle and the whole net, all the time. The hope was the shortcut would learn faster, but even that doesn't.

dobkeratops commented 1 year ago

(My thinking is that currently the most important next step is integrating VGG weights somehow: either train it to produce the same result as VGG, or just find a VGG-based autoencoder, and move on to more useful training. I'm sure there are plenty of ways to compress VGG as well. This is really practice for me.)
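The "train it to make the same result as VGG" part amounts to a feature-matching loss against a frozen teacher; a minimal sketch of just the loss term (numpy, names made up), ignoring how the feature maps are actually extracted:

```python
import numpy as np

def feature_match_loss(student_feats, teacher_feats):
    # L2 match between corresponding student and (frozen) teacher feature
    # maps, summed over the chosen layers.
    return sum(float(np.mean((s - t) ** 2))
               for s, t in zip(student_feats, teacher_feats))
```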

Twenkid commented 1 year ago

> There is also a difference between training the lower layers and adding more, vs. training all the layers to ensure the lowest ones learn what's most useful to the final ones. (I read that deep nets did use to be trained layer by layer first, but that stopped once ReLU was figured out, because it gave higher scores.)

Yes, regarding the difference: if you train only the lower layers while the higher ones are not present, that's a different model; it's like training a shallower network, and it couldn't achieve the reconstruction results of the whole network. In the other cases too, it has to switch when it discovers it has stalled or something; or maybe do experiments, checking how the result changes when another layer is added, compared to the current depth.

Freezing some of the layers is used for fine-tuning; I've seen that done with the higher levels though, as with VGG: keep the pretrained lower levels frozen and let only the higher ones change during training.

While the lower levels tend to discover generic patterns (the filters look like gradients, checkerboard patterns, edges etc.), from some point on, or for a particular use case such as retro games with their specific visuals (fixed color palette), training our own "VGG" may turn out to be more efficient and achieve better results than the generic one, which is trained on photographs.

I've also been thinking of employing transformers. I still need to get to grips with it technically; maybe it would be too much work, or maybe it wouldn't be fast enough. But I'm thinking of encoding images as codes ("tokens"), training like that, and then predicting tokens and rendering them like tiles/blocks, which could be modified or upsampled individually. (Applied just like that it would have its own side effects, possible jaggies/blockiness, requiring other smoothing etc.; or after a first rendition, applying something else on top.) It may not be appropriate for this particular use case, but it's interesting in general.
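The "images as tokens" idea, in its simplest VQ-style form, could be sketched like this (numpy; the function names and the flat nearest-patch codebook are my own simplification, real VQ models learn the codebook):

```python
import numpy as np

def tokenize_tiles(img, codebook, tile=8):
    # Cut the image into tiles and assign each the index of the nearest
    # codebook patch. `codebook` is assumed shape (K, tile, tile, C).
    h, w, _ = img.shape
    tokens = np.empty((h // tile, w // tile), dtype=np.int64)
    for i in range(h // tile):
        for j in range(w // tile):
            patch = img[i*tile:(i+1)*tile, j*tile:(j+1)*tile]
            d = ((codebook - patch) ** 2).sum(axis=(1, 2, 3))
            tokens[i, j] = int(d.argmin())
    return tokens

def render_tokens(tokens, codebook, tile=8):
    # Inverse: paste the codebook patch for each token back into an image
    # (this is where blockiness/jaggies would appear without smoothing).
    h, w = tokens.shape
    out = np.empty((h * tile, w * tile) + codebook.shape[3:],
                   dtype=codebook.dtype)
    for i in range(h):
        for j in range(w):
            out[i*tile:(i+1)*tile, j*tile:(j+1)*tile] = codebook[tokens[i, j]]
    return out
```

A transformer would then be trained to predict the token grid, and `render_tokens` (plus some smoothing) would turn predictions back into pixels.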

> It definitely needs to be configurable. It may also be better to randomise the amount of noise itself (could even add a branch to make the net guess "how much noise was added?"). I should make it randomise the shape of the noise as well. This fractal-style noise also stops it learning tree & cloud detail, because those naturally look like the noise (unintended consequence).

I like the idea of guessing the amount (maybe also the type/class) of noise. The tree & cloud case is a nice observation, an empirical proof of something one can otherwise only speculate about: the autoencoder doesn't record exactly, it reconstructs and captures similarity, so it remembers what's similar to the input; in this case it suppresses what's similar to the noise.

> What I'd be happiest with is a procedure where layers are added gradually, so it learns the lower ones first more quickly, making them visible sooner.

Training shallower nets may be useful for showing how far they can reach and comparing the quality. Also for collecting them as a meta-dataset, if trained on the same image set, and for "knowledge distillation": training a student network on the teacher network's output.
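The distillation part could be as simple as mixing the usual reconstruction loss with a term pulling the student towards the frozen teacher's output; a numpy sketch (the `alpha` weighting and function name are assumptions):

```python
import numpy as np

def distill_loss(student_out, teacher_out, target, alpha=0.5):
    # Mix plain reconstruction loss with a term matching the (frozen)
    # teacher's output; alpha balances the two.
    rec = np.mean((student_out - target) ** 2)
    dist = np.mean((student_out - teacher_out) ** 2)
    return float((1 - alpha) * rec + alpha * dist)
```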