chaiNNer-org / spandrel

Spandrel gives your project support for various PyTorch architectures meant for AI Super-Resolution, restoration, and inpainting. Based on the model support implemented in chaiNNer.

Restore training conditionals for PLKSR, RealPLKSR, SPAN #276

Closed · the-database closed this 16 hours ago

the-database commented 1 week ago

I'm importing spandrel's architecture definitions to train models in traiNNer-redux, and noticed that a few of the architectures in spandrel removed some conditionals involving self.training. This PR restores those conditionals. I checked all archs in spandrel against their official versions and these are the only ones I found which were missing self.training conditionals.

Note that the change to SPAN isn't part of the official code, but training with the official SPAN code results in this issue: https://github.com/hongyuanyu/SPAN/issues/4. The change appears to be a safe fix, since the logic is unchanged in inference mode, models trained with this change are still compatible with this arch, and all tests are passing. Let me know if this change needs to be backed out.
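For reference, this is the general shape of the conditionals being restored (a minimal illustrative sketch; module and parameter names below are made up, not the actual SPAN/PLKSR code):

```python
import torch
from torch import nn

class ToyAttnBlock(nn.Module):
    """Illustrative stand-in, not the real SPAN/PLKSR block."""

    def __init__(self, dim: int = 48):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)

    def forward(self, x: torch.Tensor):
        out = self.conv(x)
        att = torch.sigmoid(out)
        out = out * att
        if self.training:
            # extra output only consumed by the training loop
            # (e.g. for an auxiliary loss); inference output is unchanged
            return out, att
        return out
```

Because the extra return value is gated on self.training, weights trained with it load and run in inference exactly as before.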

umzi2 commented 1 week ago

How do you like the idea of cleaning up SPAN a little at the same time? Not used:
https://github.com/the-database/spandrel/blob/restoretraining/libs/spandrel/spandrel/architectures/SPAN/arch/span.py#L250
https://github.com/the-database/spandrel/blob/restoretraining/libs/spandrel/spandrel/architectures/SPAN/arch/span.py#L219
And if we remove act2, then we don't need this code either: https://github.com/the-database/spandrel/blob/restoretraining/libs/spandrel/spandrel/architectures/SPAN/arch/span.py#L28

https://github.com/the-database/spandrel/blob/restoretraining/libs/spandrel/spandrel/architectures/SPAN/arch/span.py#L233 There is no point in returning att since it is not used further, and not returning it does not affect compatibility.

umzi2 commented 1 week ago

https://github.com/the-database/spandrel/blob/restoretraining/libs/spandrel/spandrel/architectures/SPAN/arch/span.py#L209 Also, mid_channels and out_channels are not used in any of these blocks; only in_channels is ever passed.

RunDevelopment commented 5 days ago

I think we first have to answer the question whether spandrel arches should include training code.

Please consider:

joeyballentine commented 5 days ago

My personal opinion:

  1. We should not include any opinionated training code, i.e. the cluster stuff you mentioned, training configs, discriminators, weight initializations, etc.
  2. We should include code that is necessary for the arch to train properly, i.e. all of the changes presented here. This stuff is behind the self.training if statement and therefore won't get used during actual inference.

Auto's use-case is using spandrel for all relevant arch code in traiNNer-redux, which gets us guaranteed compatibility between trained models and spandrel due to the coupling. This isn't possible if the model arch needs to be modified for something training specific.

These things are essentially part of the architecture code that we got rid of because we weren't using this for training, but that doesn't mean nobody is going to want to. And we broke the archs because of that.

RunDevelopment commented 2 days ago

@joeyballentine Sorry for wording my question in a confusing way. My question wasn't what use cases there are for including training code, but whether supporting training models is part of the scope/feature set of spandrel.

If we want to support training, then we also have to do it properly. Right now:

  1. None of the training code is tested.
  2. It isn't documented which architectures support training.
  3. There is no stable API for training. Remember, the modules, classes, and class names of architectures (e.g. the SRVGGNetCompact class for Compact) are explicitly NOT part of the public API and may change at any time.

While it isn't feasible to test actual training in CI, we have to at least test/verify the stability of the public API. Otherwise, we're just going to break someone's code in an update.

joeyballentine commented 2 days ago

Tbh I really don't think it's that big of a deal and I think you're overthinking this.

I don't necessarily think we should advertise training as a feature. Spandrel is an inference library first and foremost. I just don't think we should purposefully break the ability for people to train only certain archs using spandrel as a backend, if they know what they're doing and know how to do that.

joeyballentine commented 2 days ago

And in terms of us "breaking training" in the future, the only way we could possibly do that is by getting rid of arch code like we did. But it's not like we're going to be doing that all the time with existing arches. We'd only be doing that with new ones, which we can then fix later if reported untrainable.

RunDevelopment commented 2 days ago

Joey, I'm not talking about advertising training to normal users. I'm talking about API stability.

Right now, not even importing arch classes is part of the stable API. E.g. this is the stable API for DAT and this is what traiNNer-redux does to import DAT. What traiNNer-redux does currently only works because Python doesn't have a way to limit the scope of import symbols, so those classes get leaked.
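To make the contrast concrete, roughly (the checkpoint name and constructor parameter below are placeholders for illustration, not guaranteed API):

```python
# Stable, documented way: load a checkpoint and let spandrel detect the arch.
from spandrel import ModelLoader

model = ModelLoader().load_from_file("4x_DAT.pth")  # placeholder file name

# Roughly what traiNNer-redux does instead: reach into the arch module and
# construct the class directly. The module path, the class name, and the
# constructor parameters are all internal implementation details today.
from spandrel.architectures.DAT import DAT

net = DAT(upscale=4)  # parameter name assumed for illustration
```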

And in terms of us "breaking training" in the future, the only way we could possibly do that is by getting rid of arch code like we did.

It's literally trivial to break training right now. Rename an arch class, rename an arch class argument, remove an unused **kwargs; the list goes on. I mention the last one because that's a change I've made before.

joeyballentine commented 2 days ago

I guess I still just don't see what the issue is. We can make breaking changes for anything at any time, why is this different? It's why we have versioning.

I just don't see why we can't just at least merge this in order to make this use case work. You haven't given me a good reason other than "we might break it in the future", which, as long as database doesn't mind, I don't see as a problem.

RunDevelopment commented 2 days ago

We can make breaking changes for anything at any time, why is this different? It's why we have versioning.

Joey, what is a breaking change? The answer is: "anything that breaks API guarantees." There are no breaking changes when you don't have an API. This is why e.g. ESLint has a huge list of rules for what changes constitute which version bump (patch, minor, major). The whole point of semantic versioning is that version bumps have meaning that relate to guarantees. Anything that isn't guaranteed doesn't even constitute a patch.

I just don't see why we can't just at least merge this in order to make this use case work.

Similarly, anything that isn't guaranteed also means that it can't be broken/incomplete. So what's the point of this PR if we don't support training?

You haven't given me a good reason other than "we might break it in the future" which as long as Auto doesn't mind, I don't see as a problem.

Plus, "we might break it in the future" isn't what I was talking about. My point is that "we might break it in the future and not even know about it." If you can't even know whether a feature works, then you are just blindly walking in the dark hoping that nothing bad happens. But here it's even worse, because you don't even know whether just accessing a feature works. This is just going to cause future headaches for everyone.


And at the risk of repeating myself: My point isn't that spandrel shouldn't support training, but that if we want spandrel to support training, then we have to do it right.

This PR can be the start of this, but it can't be the end of it. This is why I want to hammer down what "spandrel supports training" even means.

joeyballentine commented 2 days ago

Inconsistency. Some archs support training, some don't.

That's what this PR fixes.

Impossible to test.

So?

Relies 100% on internal implementation details.

So does the rest of the arch code. It's quite literally just code that is an implementation detail, and we have no clue how it works internally. We just wrap it in a cool way to make it work well.

This is why I want to hammer down what "spandrel supports training" even means.

IMO, it means "you can import an arch from spandrel and train it". Simple as that. It does not mean we need to have any kind of training code whatsoever. Anyone writing a program for training already knows how to do that.

You know how musl keeps breaking models in neosr, which makes models trained there incompatible with spandrel? We basically did the inverse by removing these bits of training code. This is code that needs to be there in order for the arch to be trained properly.

If we are going to say "spandrel is the source of truth for this arch for all trainers, and any changes you make are considered breaking changes to an arch", but the trainers can't even use the arch from spandrel, then that feels like a really silly thing for us to claim. How about we actually make the archs trainable, so that someone like the-database can actually use this source of truth for training and make 100% compatible models.

RunDevelopment commented 2 days ago

Relies 100% on internal implementation details.

So does the rest of the arch code. It's quite literally just code that is an implementation detail that we have no clue how it works. We just wrap it in a cool way to make it work good

Different implementation detail. I'm obviously referring to our implementation details. E.g. our class names.

Inconsistency. Some archs support training, some don't.

Thats what this PR fixes

It literally doesn't. We already talked about some arches being designed to be trained in clusters or needing additional deps to train.

IMO, it means "you can import and arch from spandrel and train it". Simple as that.

See my previous sentence as to why "any arch" doesn't work. This is why we have to document this properly.

umzi2 commented 2 days ago

To add my 2 cents: just because you use spandrel for training does not mean the result will be compatible with it. In architectures, untracked parameters are often pulled out into dynamic parameters, and using spandrel will not affect compatibility either way. An architecture can be made as compatible as possible by simply keeping the original code and disabling the use of some parameters. As long as the author of the training framework doesn't change the architecture himself, compatibility won't be affected.

joeyballentine commented 2 days ago

We already talked about some arches being designed to be trained in clusters or needing additional deps to train.

If by training in clusters you mean something like multi-GPU over the network, that literally has nothing to do with the architecture code. Otherwise, I have no idea what this means. As I explained earlier, that kind of stuff is irrelevant to the arch itself (you can train any network with multi-GPU/networking, you can use whatever config setup you want for your training code, etc.), so we don't need to care about it. I don't know why you're so hung up on that stuff. It's like saying that since we based some of our filter code off of paint.net, you're wondering if we need to include their entire GUI -- it's simply irrelevant to what we're using from the originals.

I'm obviously referring to our implementation details. E.g. our class names.

Sure. But I'm not sure how any kind of abstraction specifically meant for training would help in this regard anyway. Regardless, to initialize the network you want, there will always be some need to use an arbitrary name for something. I get that you don't want our implementation details to be used like this, but no matter what, there will be some level of "this is the name of the thing, maybe we'll change it, maybe we won't, but you gotta initialize it like x or y". That's just the nature of something like this. It's completely different from inference, where we can auto-detect the model. It's a different use case.

See my previous sentence as to why "any arch" doesn't work. This is why we have to document this properly.

Again, this is just wrong, assuming I understand what you mean by clustering and those extra deps. If I'm understanding incorrectly, please explain why those are necessary for the models to train correctly, i.e. why if someone does not have those things, the model will turn out broken.

Anyway, I'm pretty sure the-database is going to use spandrel regardless, so feel free to close this PR and he can just deal with adding the SPAN (etc.) code back to his trainer. I just think this whole thing is ridiculous, so I'm out. @the-database if you really want this in, please argue from your side about why you want this.

RunDevelopment commented 2 days ago

Okay, so I looked through some arches, and it seems like I misremembered. I 100% know that I removed the networking code from some arch (because I had to spend like an hour figuring out how it worked to remove it), so it must have been an arch that I ended up discarding, because it didn't work out.

So that's fine. Sorry for making a panic about this.

I'm obviously referring to our implementation details. E.g. our class names.

Sure. But I'm not sure how any kind of abstraction specifically meant for training would help in this regard anyway.

I'm not talking about abstractions or anything. It's way simpler. Right now, class names are not part of the public API. E.g.

from spandrel.architectures.DAT import DAT

isn't guaranteed to work right now, because there might not be an import symbol called "DAT" under the "DAT" module. And no, you can't assume that an architecture named "FOO" will have a "FOO" class, because there are architectures with multiple classes (e.g. KBNet with KBNet_l and KBNet_s, RealCUGAN with UpCunet2x, UpCunet3x, UpCunet4x, UpCunet2x_fast, and so on).

Under what names should those classes be exported? Simple stuff like this is what I was talking about the entire time.

Same for the hyperparameters. I renamed/removed some hyperparameters of some architectures when I added them (e.g. some arches had unused parameters, and I removed **ignore_kwargs from all arches). Is that a problem? Am I allowed to make changes like this in the future?
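As a small, hypothetical illustration of why that kind of change can matter to a trainer that forwards a config dict (class names invented):

```python
# Many original repos accepted and silently ignored extra config keys:
class OriginalArch:
    def __init__(self, num_feat: int = 64, **ignore_kwargs):
        self.num_feat = num_feat

# A cleaned-up version without the catch-all:
class CleanedArch:
    def __init__(self, num_feat: int = 64):
        self.num_feat = num_feat

config = {"num_feat": 64, "scale": 4}  # trainer config with an extra key
OriginalArch(**config)  # fine, "scale" is silently dropped
CleanedArch(**config)   # TypeError: unexpected keyword argument 'scale'
```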

This is what defining an API for architectures is all about. We make guarantees so that those things are no longer mere implementation details.

And we have to talk about these things, because I previously gave zero thought to these things, because they didn't matter.


So here are some concrete questions that have to be answered to define the API of the spandrel's arch classes:

  1. What naming conventions should arch classes follow?
  2. If the arch name and the name of the class are different (e.g. Compact and SRVGGNetCompact), under what name should we export the arch class?
  3. If there are multiple arch classes, what should their names be?
  4. If there are multiple arch classes, should we provide a Union[...] type of all classes? We already define and use such a union type for many such arches (example). Should that be public? If so, under what name? (See the sketch after this list.)
  5. Should arch class parameter names be allowed to be removed/renamed?
  6. Can params be reordered? (Basically, should we add a *, to all arch class constructors?)
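
Regarding question 4, this is roughly the kind of union alias meant; all names here are stand-ins, and whether/where spandrel would export it is exactly the open question:

```python
from typing import Union

from torch import nn

class KBNet_l(nn.Module):  # stand-in for the real class
    ...

class KBNet_s(nn.Module):  # stand-in for the real class
    ...

KBNet = Union[KBNet_l, KBNet_s]

def build_kbnet(variant: str) -> KBNet:
    return KBNet_l() if variant == "l" else KBNet_s()
```
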
joeyballentine commented 2 days ago

So that's fine. Sorry for making a panic about this.

All good

Under what names should those classes be exported? Simple stuff like this is what I was talking about the entire time.

Ah ok that makes sense. Stuff like ESRGAN being called RRDBNet? In that case, it probably does make sense to have some kind of way to import it as ESRGAN. I mean, we even directly import the arches in chaiNNer as well to do things like isinstance checks, so there's probably something simple we could do there to just export each arch as its normal colloquial name.

Is that a problem? Am I allowed to make changes like this in the future?

If they aren't used, I don't see why not. Generally speaking I think it's actually a better thing that we clean those up, since then it's clearer which params can actually be used/changed for training.

This is what defining an API for architectures is all about. We make guarantees so that those things are no longer mere implementation details.

Yeah that makes sense

And we have to talk about these things

Then yeah, we should probably have a longer conversation about it. My general thought though is just to do whatever is simplest for us to do while also allowing training. I don't really want to give us too much extra work either


And my thoughts on the rest of the questions:

What naming conventions should arch classes follow?

The colloquial ones

If the arch name and the name of the class are different (e.g. Comapct and SRVGGNetCompact) under what name should we export the arch class?

In this case, "Compact"

If there are multiple arch classes, what should their names be?

Not really sure tbh. But probably just the colloquial name + whatever extra thing. So KBNet_s would just stay that way, unless we want to write out what the S stands for? Maybe @the-database could weigh in on how he'd plan on having users define that in training configs.

If there are multiple arch classes, should we provide a Union[...] type of all classes?

Probably, but it also probably wouldn't matter if we didn't

Should arch class parameter names be allowed to be removed/renamed?

Now that you mention it, I almost wonder if we should have a somewhat consistent naming convention, like picking one of ("num_in_ch", "in_nc", "in_ch", etc.) to use everywhere. But idk for sure.

Can params be reordered? (Basically, should we add a *, to all arch class constructors?)

IMO they should all be named so order doesn't matter.
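
As a sketch of what the *, in question 6 would mean in practice (arch and parameter names invented for illustration):

```python
from torch import nn

class ExampleArch(nn.Module):
    def __init__(
        self,
        *,  # everything after this must be passed by keyword
        in_channels: int = 3,
        out_channels: int = 3,
        num_feat: int = 64,
    ):
        super().__init__()
        self.head = nn.Conv2d(in_channels, num_feat, 3, padding=1)
        self.tail = nn.Conv2d(num_feat, out_channels, 3, padding=1)

    def forward(self, x):
        return self.tail(self.head(x))

# ExampleArch(3, 3, 64)  -> TypeError: positional arguments are rejected
net = ExampleArch(in_channels=3, out_channels=3, num_feat=64)
```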

RunDevelopment commented 16 hours ago

Alright, since we resolved that, let's merge this PR and continue with what needs to be done in follow-up PRs. I already started with #279.