Closed terrainer closed 5 months ago
The result is that any model trained with img_range 255 is using vastly blown-up values, causing massive instability and inaccuracy in training.
Would changing this make the official models incompatible? Would there be any way to fix those if so?
Changing this would break the official models, however my thoughts are that this could be worked around by running a check on the models weights, and then setting the img_range to 1 or 255 based on the max values. I'm not familiar with what can and cannot be read, or what format the data is in when it comes to loading the models however.
Okay, the issue isn't that SPAN is "implemented incorrectly" but that the default hyperparameters are suboptimal.
Unfortunately, changing the default of img_range
is not an option for us unless we detect the value of img_range
from the state dict. (Breaking official models is a no-go.) As for detection: we can read everything. We have the tensors in memory, so we can read their values if we want. If you can come up with a way to reliably distinguish img_range=1.
and img_range=255.
then we can add that.
Otherwise, we'll have to change the arch itself a little. We can use the fact that official models only use img_range=255.
to our advantage. Basically, we can have a tensor of size 1 that just contains the value of img_range
. We then read the value of that tensor. If the tensor isn't present, then we know that it's an official model and assume img_range=255.
. This strategy require that we coordinate with neosr.
I believe @muslll said that if he were to change the default image range, he would add a parameter to help us out with detection. So we could suggest to him that he just add the range as the value stored in that tensor.
Okay, the issue isn't that SPAN is "implemented incorrectly" but that the default hyperparameters are suboptimal.
Unfortunately, changing the default of
img_range
is not an option for us unless we detect the value ofimg_range
from the state dict. (Breaking official models is a no-go.) As for detection: we can read everything. We have the tensors in memory, so we can read their values if we want. If you can come up with a way to reliably distinguishimg_range=1.
andimg_range=255.
then we can add that.Otherwise, we'll have to change the arch itself a little. We can use the fact that official models only use
img_range=255.
to our advantage. Basically, we can have a tensor of size 1 that just contains the value ofimg_range
. We then read the value of that tensor. If the tensor isn't present, then we know that it's an official model and assumeimg_range=255.
. This strategy require that we coordinate with neosr.
My bad, should have said that the official basicsr code was implemented incorrectly. Technically, the chaiNNer code is correct. I can say for certain that with img_range: 255, the output from self.upsampler(out) is in a range of ~[-127.5, 127.5], and img_range: 1 has a range of [-0.5, 0.5]. Is it possible to just use a greater than check on it?
I can say for certain that with img_range: 255, the output from self.upsampler(out) is in a range of ~[-127.5, 127.5]
That doesn't sound right. I'm assuming you mean this line: https://github.com/chaiNNer-org/spandrel/blob/689385114f377bcb29116e80b76b8e570856d772/src/spandrel/architectures/SPAN/arch/span.py#L285
If so, then it 100% will not output in the range [-127.5, 127.5]. The output
is returned as is as the final result of the model, so if output
wasn't [0, 1], then no official SPAN models would work.
For the official models, I can assure you that self.upsampler(out)
will return values in the range [0, 1].
I can say for certain that with img_range: 255, the output from self.upsampler(out) is in a range of ~[-127.5, 127.5]
That doesn't sound right. I'm assuming you mean this line:
If so, then it 100% will not output in the range [-127.5, 127.5]. The
output
is returned as is as the final result of the model, so ifoutput
wasn't [0, 1], then no official SPAN models would work.For the official models, I can assure you that
self.upsampler(out)
will return values in the range [0, 1].
You're absolutely right, I remembered incorrectly. It's out_feature = self.conv_1(x)
that outputs that range
It's
out_feature = self.conv_1(x)
that outputs that range
I did some tests, and confirmed that out_feature
is not in the range [0, 1]. But that's all. The range seems to vary wildly between models and images. I've seen [-2, 5], [-8, 10], [-32, 30], and even [-20, 50]. It's not consistent at all, which I would honestly expect from a feature mapping
But that's not enough for detection. We would also need to confirm that the range of out_feature
with img_range=1.
is what you claimed. Do you have any such models I can test?
Hi, as proposed on the neosr issue, I think a simpler solution would be to make a new param called norm
(bool), which disables normalization altogether. I don't know how detection would work regarding this, but I imagine it would be easier?
Here are two models for testing, one with normal code set to img_range=1.0
and other with the proposed change norm=False
:
span_models.zip
SPAN is using incorrect parameters that were moved over from EDSR without consideration of the way BasicSR works.
The img_range value is set to
255.
and should be set to1.
as is done by all archs using the same code. With its current value, SPAN is:Taking your input image with range [0, 1]
Zero-centering it with (x - self.mean), so now the input range is near to [-0.5, 0.5]
Multiplying it by 255, leaving you with an input with a range of approximately [-127.5, 127.5].
The reason the code uses
255.
instead of1.
is that, as is evident from the use of a ripped and unedited options file for EDSR instead of their own training options, they've blindly copied the values from EDSR, which used this MeanShift code, that is vastly different from the code in SPAN, and also fails to account for the fact that BasicSR normalizes inputs to [0, 1], unlike EDSR.