flarum / issue-archive

0 stars 0 forks source link

Scalability meta issue #121

Open luceos opened 3 years ago

luceos commented 3 years ago

This issue is meant to provide actionable tasks to improve scalability in specialised, highly volatile hosting environments.

askvortsov1 commented 3 years ago

the users.avatar_url string length is too short disallowing the use of CDN's

Wouldn't we want this to be the path relative to the CDN's root? That would give more flexibility in changing avatar filesystems.

On the other hand, storing the full URL would allow distributing assets across several filesystem backends.

migrations table has no auto increment primary key ID on some DB's this causes errors and fails migration completely, I also remember faintly some db tools dislike this and remove checkboxes (phpmyadmin?)

Yeah, we should add this. I suppose we'll want to do something like what Clark did in flarum/framework#2651. Luckily I don't think we'll ever have tens of thousands of migrations, so that should make life quite a bit easier.

The DatabaseSessionHandler requires knowledge of the actor. As user nor request isn't available from the container one cannot override the session.handler easily; right now fixed by binding the actor into the container.

I don't think we'll be able to use Laravel's default provided one, that assumes way too much Laravel stuff is present. That being said, I don't know why we wouldn't be able to make our own version. I'm also not sure why it would require the user globally: the only places where it seems to be used in core are the CollectGarbage and StartSession middlewares; in both cases, we can pass the current request in through a setRequest method.

A solution is to use the revision/hash as query param thus auto invalidating client-side cached files, but also providing a fallback in case the revision is incorrect.

This seems simple enough, and I don't see any downsides off the top of my head. In favor!

Overriding any of the compilers (because you want to override the revision generation), requires you to override the Assets class which you cannot override easily.

Not sure how we'd want to approach this one; we could pass in a container instance and resolve an abstract string binding (which could be set via service providers)? Not sure I like giving it a container instance though. Why not make the compilers singletons on the service provider level, instead of relying on Assets?

logoUrl and favIconUrl both need to start using a similar implementation like the avatar_url of User. Right now the only way to override these is by mutating the attributes in the ForumSerializer.

If we implement flarum/framework#1783, wouldn't we have the base path of the assets filesystem, regardless if it's stored with us or on S3? I do agree that consistency would be nice though.

posts.number calculated on php backend flarum/framework#3350

What's the alternative?

It would be helpful to be able to add migrations from the local extend.php. Right now my default go to is to create a custom command that replicates our own migration command, but with a different path.

Seems sensible, although I'd prefer to discourage this for non-advanced use.

cache clearing won't work across nodes

I'll need to do a bit more research on distributed cache theory before opining on this one.

luceos commented 3 years ago

I don't think we'll be able to use Laravel's default provided one, that assumes way too much Laravel stuff is present. That being said, I don't know why we wouldn't be able to make our own version. I'm also not sure why it would require the user globally: the only places where it seems to be used in core are the CollectGarbage and StartSession middlewares; in both cases, we can pass the current request in through a setRequest method.

The alternate solution would be to override the StartSession middleware to instantiate the other session handler there and inject the actor in the some run.

The problem is you cannot globally define a new session handler into the ioc because it expects in its own logic to be able to retrieve the actor on whim. As you suggested a custom solution might solve that, but I think that any distance you create between the framework of the components you use has to be deliberate. Because the hurdle of easily porting packages from Laravel will only become greater this way. Horizon is a great example of that.

Not sure how we'd want to approach this one; we could pass in a container instance and resolve an abstract string binding (which could be set via service providers)? Not sure I like giving it a container instance though. Why not make the compilers singletons on the service provider level, instead of relying on Assets?

Binding the compiler into ioc the same way the other assets related binding works would already help a ton. This allows use of extend and resolving for instance.

If we implement flarum/framework#1783, wouldn't we have the base path of the assets filesystem, regardless if it's stored with us or on S3? I do agree that consistency would be nice though.

I think consistency is an extensible system is very important. We already are all over the place in terms of some implementations.

What's the alternative?

Check the issue ;)

Seems sensible, although I'd prefer to discourage this for non-advanced use.

Maybe something for flarum-cli perhaps? flarum-cli migrate --local which would set up the command in app, update composer.json with the autoload.psr4.app key?

I'll need to do a bit more research on distributed cache theory before opining on this one.

The problem is that clearing a cache from the dashboard or cli executes the ClearingCache event only on that node. As a consequence the other nodes are not flushed. A possible solution would be to remember the last flushed timestamp in the local filesystem cache and periodically check for that while flushing the cache if the timestamp does not match against the database stored one.

Alternatively a way to trigger an endpoint on every node could be an idea too.

Both suggestions would not make sense for core though, but maybe as extension...

askvortsov1 commented 3 years ago

The problem is you cannot globally define a new session handler into the ioc because it expects in its own logic to be able to retrieve the actor on whim. As you suggested a custom solution might solve that, but I think that any distance you create between the framework of the components you use has to be deliberate. Because the hurdle of easily porting packages from Laravel will only become greater this way. Horizon is a great example of that.

I took another look at the DatabaseSessionHandler implementation, and I'm not sure why we wouldn't be able to immediately use it. There are several places where it attempts to read from a guard and a request bound into the container, but it checks that they are actually bound before proceeding:

https://github.com/laravel/framework/blob/8.x/src/Illuminate/Session/DatabaseSessionHandler.php#L224 https://github.com/laravel/framework/blob/8.x/src/Illuminate/Session/DatabaseSessionHandler.php#L199

So these seem to be completely optional. And we don't really need them because we store all this ourselves on the access token level.

I think consistency is an extensible system is very important. We already are all over the place in terms of some implementations.

My attempt to refactor this in https://github.com/flarum/core/pull/2729 is coming together extremely nicely: URLs to logo/favicon and avatar are being generated from the filesystem driver. Those now work the same way Assets does:

https://github.com/flarum/core/blob/783c563305f5d1f809e2bdc388f9e0cc785d8f4a/src/Frontend/FrontendServiceProvider.php#L29-L29 https://github.com/flarum/core/blob/783c563305f5d1f809e2bdc388f9e0cc785d8f4a/src/Frontend/Compiler/RevisionCompiler.php#L140-L140

Both suggestions would not make sense for core though, but maybe as extension...

We'd just need to make sure it's possible without having to completely replace the skeleton.

luceos commented 3 years ago

Point added:

askvortsov1 commented 3 years ago

One potential solution to caching:

The filesystem refactor I linked above unifies and simplifies our disk filesystem use for both internals and extensions. We might want to consider doing a similar thing with cache. Laravel supports many cache drivers out of the box, so we could relatively easily add laravel caching for:

Formatter already uses a Laravel-based cache, but there's also stuff in storage/formatter. However the fix @luceos has been working on should fix this, so that we don't need to worry about clearing the storage/formatter cache ourselves. We also don't have control over this cache, it's entirely in TextFormatter land.

A system like this would allow one solution for all caching concerns, make it simpler (and documented!) to add caching to new things, and support various cache drivers (file as default, database, memcached, redis, etc) out of the box.

franzliedke commented 3 years ago

As far as I can tell, sessions are still saved to disk instead of the database. That would be a big problem with scalability. Are you planning to keep this to extension territory, is this part of flarum/framework#2074 or should it be added to this list? 😀

askvortsov1 commented 3 years ago

As far as I can tell, sessions are still saved to disk instead of the database. That would be a big problem with scalability. Are you planning to keep this to extension territory, is this part of flarum/framework#2074 or should it be added to this list?

At least personally, I'd like to see direct use of Laravel service providers (and the accompanying config from https://github.com/flarum/core/blob/12f6b1b3750a0e04286a1535dab8074046413218/src/Foundation/InstalledSite.php#L159-L159) moved completely into Flarum service providers, like flarum/framework#2732 does. Sessions would eventually be a part of that. For now though, I believe it's possible to support a DB driver via custom container bindings and a migration to add the necessary table.

franzliedke commented 3 years ago

I don't know of any special strengths of the file-based driver that warrant keeping it as default, but yes, it can be exchanged at runtime. Your choice. 👍🏼

askvortsov1 commented 3 years ago

I don't know of any special strengths of the file-based driver that warrant keeping it as default, but yes, it can be exchanged at runtime. Your choice. 👍🏼

Needs more research! Just like everything, now that I think about it. Someone should really get around to inventing a time machine already :thinking:

luceos commented 3 years ago

One of the reasons this issue does not include the session driver is because it can indeed be easily replaced. This issue is mostly about the real hardships after having tried to replace or extend.

luceos commented 2 years ago

Coming back to database session handling. When using database session handling all requests will always switch to write connections. This removes the ability to set up read/write connections. For API GET requests (without the update last seen logic) in theory connections could remain read only the whole time, with database session handling this is no longer the case, because we use sessions in our API middleware.

To prevent these write connections, file based sessions are great; but they don't (horizontally) scale unless you mount the sessions directory or add session stickiness to your loadbalancer with its own risks involved.

What would make sense, I think, is to ship Flarum with both session and database session handling. And add a select in the (new) advanced pane which should not be visible by default. Then anyone interested can easily toggle between the two (and invalidate all user sessions 🙈). Alternatively it really isn't too much to have a dedicated extension become a drop-in replacement for file based session handling.

tankerkiller125 commented 2 years ago

I think the whole point of the advanced panel was to add in things for scalability, in which I think database sessions fits quite neatly.

askvortsov1 commented 2 years ago

I think the whole point of the advanced panel was to add in things for scalability, in which I think database sessions fits quite neatly.

Well not just for scalability, but for minor tweaks of various constant values / config. I'd like to avoid becoming Discourse, so we should still use use-case-designed settings UIs when possible, but there should be a place to put all the constants / minor things / assorted knobs that should be tweakable.