chaiNNer-org / spandrel

Spandrel gives your project support for various PyTorch architectures meant for AI Super-Resolution, restoration, and inpainting. Based on the model support implemented in chaiNNer.
MIT License
105 stars 7 forks source link

SPAN implemented incorrectly #145

Closed terrainer closed 5 months ago

terrainer commented 5 months ago

SPAN is using incorrect parameters that were moved over from EDSR without consideration of the way BasicSR works.

The img_range value is set to 255. and should be set to 1. as is done by all archs using the same code. With its current value, SPAN is:

The reason the code uses 255. instead of 1. is that, as is evident from the use of a ripped and unedited options file for EDSR instead of their own training options, they've blindly copied the values from EDSR, which used this MeanShift code, that is vastly different from the code in SPAN, and also fails to account for the fact that BasicSR normalizes inputs to [0, 1], unlike EDSR.

terrainer commented 5 months ago

The result is that any model trained with img_range 255 is using vastly blown-up values, causing massive instability and inaccuracy in training.

joeyballentine commented 5 months ago

Would changing this make the official models incompatible? Would there be any way to fix those if so?

terrainer commented 5 months ago

Changing this would break the official models, however my thoughts are that this could be worked around by running a check on the models weights, and then setting the img_range to 1 or 255 based on the max values. I'm not familiar with what can and cannot be read, or what format the data is in when it comes to loading the models however.

RunDevelopment commented 5 months ago

Okay, the issue isn't that SPAN is "implemented incorrectly" but that the default hyperparameters are suboptimal.

Unfortunately, changing the default of img_range is not an option for us unless we detect the value of img_range from the state dict. (Breaking official models is a no-go.) As for detection: we can read everything. We have the tensors in memory, so we can read their values if we want. If you can come up with a way to reliably distinguish img_range=1. and img_range=255. then we can add that.

Otherwise, we'll have to change the arch itself a little. We can use the fact that official models only use img_range=255. to our advantage. Basically, we can have a tensor of size 1 that just contains the value of img_range. We then read the value of that tensor. If the tensor isn't present, then we know that it's an official model and assume img_range=255.. This strategy require that we coordinate with neosr.

joeyballentine commented 5 months ago

I believe @muslll said that if he were to change the default image range, he would add a parameter to help us out with detection. So we could suggest to him that he just add the range as the value stored in that tensor.

terrainer commented 5 months ago

Okay, the issue isn't that SPAN is "implemented incorrectly" but that the default hyperparameters are suboptimal.

Unfortunately, changing the default of img_range is not an option for us unless we detect the value of img_range from the state dict. (Breaking official models is a no-go.) As for detection: we can read everything. We have the tensors in memory, so we can read their values if we want. If you can come up with a way to reliably distinguish img_range=1. and img_range=255. then we can add that.

Otherwise, we'll have to change the arch itself a little. We can use the fact that official models only use img_range=255. to our advantage. Basically, we can have a tensor of size 1 that just contains the value of img_range. We then read the value of that tensor. If the tensor isn't present, then we know that it's an official model and assume img_range=255.. This strategy require that we coordinate with neosr.

My bad, should have said that the official basicsr code was implemented incorrectly. Technically, the chaiNNer code is correct. I can say for certain that with img_range: 255, the output from self.upsampler(out) is in a range of ~[-127.5, 127.5], and img_range: 1 has a range of [-0.5, 0.5]. Is it possible to just use a greater than check on it?

RunDevelopment commented 5 months ago

I can say for certain that with img_range: 255, the output from self.upsampler(out) is in a range of ~[-127.5, 127.5]

That doesn't sound right. I'm assuming you mean this line: https://github.com/chaiNNer-org/spandrel/blob/689385114f377bcb29116e80b76b8e570856d772/src/spandrel/architectures/SPAN/arch/span.py#L285

If so, then it 100% will not output in the range [-127.5, 127.5]. The output is returned as is as the final result of the model, so if output wasn't [0, 1], then no official SPAN models would work.

For the official models, I can assure you that self.upsampler(out) will return values in the range [0, 1].

terrainer commented 5 months ago

I can say for certain that with img_range: 255, the output from self.upsampler(out) is in a range of ~[-127.5, 127.5]

That doesn't sound right. I'm assuming you mean this line:

https://github.com/chaiNNer-org/spandrel/blob/689385114f377bcb29116e80b76b8e570856d772/src/spandrel/architectures/SPAN/arch/span.py#L285

If so, then it 100% will not output in the range [-127.5, 127.5]. The output is returned as is as the final result of the model, so if output wasn't [0, 1], then no official SPAN models would work.

For the official models, I can assure you that self.upsampler(out) will return values in the range [0, 1].

You're absolutely right, I remembered incorrectly. It's out_feature = self.conv_1(x)that outputs that range

RunDevelopment commented 5 months ago

It's out_feature = self.conv_1(x)that outputs that range

I did some tests, and confirmed that out_feature is not in the range [0, 1]. But that's all. The range seems to vary wildly between models and images. I've seen [-2, 5], [-8, 10], [-32, 30], and even [-20, 50]. It's not consistent at all, which I would honestly expect from a feature mapping

But that's not enough for detection. We would also need to confirm that the range of out_feature with img_range=1. is what you claimed. Do you have any such models I can test?

muslll commented 5 months ago

Hi, as proposed on the neosr issue, I think a simpler solution would be to make a new param called norm (bool), which disables normalization altogether. I don't know how detection would work regarding this, but I imagine it would be easier? Here are two models for testing, one with normal code set to img_range=1.0 and other with the proposed change norm=False: span_models.zip