1Computer1 commented 5 years ago

Read about it here if the issue format isn't suitable: https://gist.github.com/1Computer1/0f7b650549dc4974718a269f9d51f98e

What's wrong with the design of data stores?

Data stores as they are currently are hard to understand and fragile to maintain.
We cite three reasons for this: they do not inherit in a sound way from their superclasses, their design is not extensible, and they have too much responsibilty; managing both the cache and the API.

Poor inheritance

Data stores are not the same as collections. We can say that there is an is-relationship for some of the stores, but to say that there is one for all of the stores is incorrect. For example, look at GuildMemberRoleStore and GuildEmojiRoleStore which do not actually store any data themselves but instead takes data from the guild's roles and delegates a _filtered property to the Collection methods. Not to mention, they don't even extend DataStore itself! Suppose one day we decided to change DataStore in some way that affect the interface of these stores, but did not update the actual code of these pseudo-stores; it'll be a bad time.

Furthermore about Collection, consider a data store, say MessageStore. Suppose we filtered using the inherited filter method from Collection. Should this return a new MessageStore or a new Collection? The default behavior, since MessageStore inherits from Collection which inherits from JavaScript's Map, is to give back MessageStore. But currently, we have overriden this behavior to give back a Collection, since after all, what happens if someone were to call MessageStore#fetch on the filtered store? The cache is quite possibly invalid and could give back bad results. But we have just broken inheritance and that's no good!

The design is not extensible

How about custom caches? Whether or not you want these, it is still important that something like it is both possible and easy to do. As of right now, this is impossible, or at the very least, very difficult and possibly hacky. This is because the cache is way too highly coupled with the rest of the code, making it difficult to inject a custom cache in. This will be one of the major point of the proposal, to separate the cache and the API.

Speaking of cache, its hard to do things without cache. Especially with partials being more commonly used, it would be very nice to be able to just switch off some caching to save memory and call the API directly, but still have a high-level API. We have some things like this, such as GuildMemberStore#ban, but having it be all available will be extremely useful.

Too much responsibilty

Which methods work on the cache and which call the API? Which one does both? Without referencing the documentation, let me know:

delete
remove
removeAll
sweep
get
add
set

Do you think its unfair? Well, so do the people who have to use or maintain the library for the first time! Here's something funny too, in GuildMemberRoleStore and GuildEmojiRoleStore, the set method from Collection is completely overwritten with a completely different method that calls the API. Fun!

This name conflict is something that has to be resovled through separation of cache and API as mentioned before. If we add more cache-less methods, it'll become very important that this is done.

To drive the point, what is the difference between a collection and a data store? Should CategoryChannel#children be a store? What about GuildChannel#members? These things are both like GuildMember#roles and GuildEmoji#roles in that they're do not actually cache their data and are instead wrappers around existing data. So what's stopping them from becoming a store? This points towards data stores having way too much responsibility that it manages to sneak into things they're not meant to.

The proposal to remove data stores

This proposal is quite simple: separate the cache and the API.

To do that, we introduce the idea of a manager class. A mananger class will hold all of the API methods for a certain structure as well as the cache for that structure, if applicable. They will replace the various data stores in the library. So for example, channel.messages will be a MessageManager, then channel.messages.cache will be a Collection of Message. This will allow an interface that still reads nicely:

channel.messages.fetch
channel.messages.delete (we can do this instead of a silly synonym like remove)
channel.messages.cache.get
channel.messages.cache.has
etc.

Internal code will interact with the cache directly. Methods originally on data stores like resolve, add, etc. can continue to exist on the manager class. To be specific, the criteria to as to what goes on what would be:

All Collection methods goes to cache, nothing more, nothing less.
Methods that interact with the API goes on the manager.
Methods that know more about the cache goes on the manager. For example, resolve knows the type of the cache's values, add knows how to format API data to our structures, etc.

For data stores which are not actually data stores, the proposed solution is to make them actually cache. To make it so that data isn't unnecessarily duplicated, we can have a backing field. For example, GuildMemberRoleManager would have the _roles property normally on GuildMember. Then, the cache would be a getter to give a readonly Collection that uses this backing field. Doing so will allow much more consistency in the codebase and will future-proof us for API changes. This can extend to other miscellaneous collections-but-could-be-data-stores like Channel#children or GuildChannel#members.

Special note: ChannelStore is an LRU cache. We can simply add a new LRUCollection class. This is an example of a "custom cache" that this proposal enables.

Benefits

There will not be poor inheritance in this proposal, because there won't be inheritance at all! This proposal explicitly is about composition over inheritance.

This proposal is extensible. In the future, we can imagine custom caching via dependency injection. Not only is the cache injectable, but so are the managers themselves! This means we can swap out cache, but also (separately) the API methods, allowing for fine-grain control of how things are done. The control of cache is very important to allow people to bring in new tech or just lower their memory usage. Plus, as mentioned, more methods that do not rely on cache means partials will play better.

And of course, the separation of cache and API makes it easier to reason and maintain the code. The managers have less responsibility to worry about and won't get in a mess with the cache. Not only that, we can add new methods on top without worrying as much about running out of reasonable synonyms for entirely different things.

Drawbacks

An extra property to chain with cache. This is a breaking change on a larger magnitude than data stores themselves, since before no one had to add cache on top to use has or get.

There will be confusion for when to use cache and when to not. We can introduce this as cache methods vs API methods, which should be understandable for most. If every collection of structures, so things like CategoryChannel#children became managers, it should be more obvious going forward: always use cache when you want access the collection.

There can still be some ambiguous names inside the manager classes, such as add or create or similar. However, it'll be clear that these methods interact with the API and not the cache, so there's still more separation than before.

Other solutions

Revert to pre-data stores

This is not feasible. While it does solve problems #1 and #2, the namespace is mixed together. Not only that, the things modifying the cache and cache itself are too far away. Plus, the user API is not as fluent, which was one of the reasons we switched to data stores in the first place.

For those unaware, pre-data stores looked like channel.fetchMessage instead of channel.messages.fetch and guild.ban instead of guild.members.ban.

Separate, but put the cache elsewhere

This will solve #3, but now the design is more tightly coupled. The cache does not rely on the manager at all, but the manager relies on the cache. This naturally leads to the cache being a member of the manager. If we put it elsewhere, there will be more circular references. Even worse, if we specifically make the manager a member of the cache, not only will we get the previous problem, we'll run into #1 again too.

Try to fix data stores?

There might be a way to fix data stores as it is. For "pseudo-caches" like GuildMemberRoleStore, we could turn them into proper data stores as proposed. While this solves the first half of #1, everything else is still a problem. The only way to fix everything else is to separate.

We talked a little bit in the Discord server already but putting it here would be better for discussion.

ShadowRanger commented 5 years ago

Based on the reactions to this issue, I can see that there is a lot of support for it. No one is yet to respond to it so I will be the first. I will express my opinion based on each of the headings you used.

Poor inheritance The structure of inheritance in a case like this should be consistent. As you pointed out, there are areas where the inheritance has been broken. I mean yeah, it works, but not int he most friendly way.

The design is not extensible There would definitely be situations where developers would want to implement their own caching system. Having a solid default cache system will probably be enough for most, but the ability to customize and extend it would be a great change.

Too much responsibility I can definitely agree that someone that does not know the exact workings of the DataStore methods or who hasn't read the documentation would most likely not know whether they interact with the cache, or with the API itself.

Separating the code that works with the cache and the code that works with the API would be a good change. This can go back to the consistency issue as well. Some DataStores overriding methods that they all share is probably not ideal. Separating them would ensure consistency, and probably also avoid some confusion.

The proposal to remove data stores Your proposed solution is solid, clear and consistent. It quite clearly separates the API methods from the cache methods, in a way that is easy to understand and work with in implementations of the classes. I personally can't think of any ways to improve on your solution right now, but I'm sure others could contribute some helpful ideas towards it.

Benefits All your points here are valid and of which I can agree with. Overall it ensures consistency in a clear way that separates API work from caching, and is more easily extensible and maintainable.

Drawbacks Although the drawback with the extra cache property might be a bit of a pain, I feel as though the benefits outweigh the drawbacks.

Other solutions To be honest, I don't really think there are any other better solutions than the main one you proposed. This is where input from others would be welcomed. The more input and ideas from others the better.

JMTK commented 5 years ago

I believe this was shutdown this time last year. I commented on it as well but I think the idea of being able to abstract your caching mechanism would be great(dedicated redis instance).

https://github.com/discordjs/discord.js/issues/1409

monbrey commented 5 years ago

I believe this was shutdown this time last year. I commented on it as well but I think the idea of being able to abstract your caching mechanism would be great(dedicated redis instance).

1409

This isn't really addressing external caching as such, it's focused on abstracting the cache and the API methods into separate internal structures.

Having said that, with that abstraction in place it may then be easier to extend the library and implement and external caching mechanism for anyone who wants to take on the challenge of doing so.

appellation commented 5 years ago

External caching would still be limited by the concerns expressed in #1409: Discord.js heavily relies on fast synchronous cache access and an external cache would mean that all internal cache access would become async. This would make almost every method in Discord.js become async (even if not using an external cache), severely hampering usability and potentially causing significant performance degradation.

The custom caches that are mentioned here would not be for completely changing the storage medium; rather, they would allow users to have complete control over what and how data is stored on the JS heap (LRU, different map/hashing strategy, etc.).

Skillz4Killz commented 5 years ago

Would this affect the release date of v12 or is this change a post v12 change?

almostSouji commented 5 years ago

Where would be the point in releasing v12 with stores if we think that it shouldn't be handled like this? This would be an alternative approach to stores, so should definitely be included in v12 instead of stores

Skillz4Killz commented 5 years ago

@almostSouji Just to be clear i really do love the idea of this change. I am not opposed to the change, just opposed to doing it before v12.

My concern is about how D.JS is not keeping a healthy cycle of updates for stable D.JS users. Almost 2+ years now v12 has not been released because new things keep being added in for v12. D.JS master has a lot of improvements that are never being merged into stable because v12 isn't released. This change should be done post v12 release in my opinion.

I really don't think it is good to delay v12 which is right around the corner, with only 2 things left on the to-do list. https://github.com/discordjs/discord.js/projects/4

The current solution whether flawed or not is working and heavily tested on a lot of bots already with little to no issues. I believe before breaking v12 release it should be heavily considered by the dev team to release v12 into stable and then make this change. V13 may or may not be here for another 2 years if it follows a similar release as v12. The current solution is good to last for the lifetime of v12 in my opinion. Especially since most of the benefits of this change are improving the maintenance of D.JS and not features/fixes to the framework itself.

iCrawl commented 5 years ago

Our not so "healthy" update-cycle solely is based on only 2 people having access to release versions to npm, and both are not active on this project anymore. Whenever they are, it's for a very very brief amount of time.

Anyway, the reason we do not want to put this in a later version is because the moment we release v12 we will have to maintain it, probably or maybe again for 2+ years. We already have so many things we regret and still have to maintain in v11, why would we repeat that?

It is definitely not a question of "it works, so why change it" in this case.

bdistin commented 5 years ago

You are always going to have something you regret, but have to maintain. Otherwise, major versions wouldn't exist.

didinele commented 5 years ago

V12 hasn't been released because we aren't happy with certain things, so as Crawl pointed out, why release something poorly designed (as Compu pointed out), not because we keep on pushing changes

iCrawl commented 5 years ago

You are always going to have something you regret, but have to maintain. Otherwise, major versions wouldn't exist.

I didn't say this won't be the case, but the majority of devs that actually still work on this library are in favor of it, so 💁

SpaceEEC commented 4 years ago

This has been implement with #3696.

discordjs / discord.js

Separate cache and API (aka remove data stores) #3485