RanvierMUD / core

Core engine code for Ranvier
https://ranviermud.com
MIT License
45 stars 41 forks source link

Allow for extendable Entity Loaders #110

Closed Sakeran closed 4 years ago

Sakeran commented 4 years ago

This implements a new, optional feature to DataSourceRegistry and EntityLoaderRegistry that allows developers to extend the behavior of the default EntityLoader class. It additionally addresses one component of #96 by adding a remove method to EntityLoader.

Current Issue

EntityLoader defines a complete CRUD API sufficient for working with core Ranvier components, but developers may wish to make use of additional functionality provided to them by their chosen method of persistence. For example, a developer using an SQL solution may wish to query for or operate on a subset of data, which is inefficient using the provided fetchAll and replace methods, particularly on large datasets.

Proposed Solution

Because we can't predict what features might be available to any one backend, we can instead allow developers to define extensions to EntityLoader. These extensions will most often be written by DataSource authors (and included alongside it) to take advantage of their loader's unique features, but can be defined independently if desired.

To enable a loader extension, we add an additional option to the dataSources field in ranvier.json:

{
  "dataSources" : {
    // Path to the DataSource class, as usual.
    "require": "./loaders/mySQLDataSource",

    // Path to the extension function.
    "extendLoader": "./loaders/mySQLLoaderExtension"
  }
}

If extendLoader is set, DataSourceRegistry will call the function with EntityLoader to create a mixin class, which will be used for all Entity Loaders using that source. Example:

// ./loaders/mySQLLoaderExtension

const loaderExtension = EntityLoader => class SQLEntityLoader extends EntityLoader {

  SQL(statement, ...params) {
    // Added functionality.
    // Executes a query statement and returns the result.
    return this.dataSource.SQL(/* ... */);
  }

  fetch(id) {
    // Extended functionality.
    // Could be used to audit/filter/transform/etc
    return super.fetch(id);
  }
}

module.exports = loaderExtension;
shawncplus commented 4 years ago

I'm not really sure I understand the point of this. So your example is that someone wrote an OldMysqlSource which someone uses, but then they have custom stuff they'd want to add onto it so they have this extendLoader thing to have extra stuff injected into OldMysqlSource? To what end?

I guess I'm confused at the combination of DataSource and EntityLoader terminology here.

The point of the EntityLoader and DataSource separation is that entity loaders don't know anything about how data gets loaded. So to have an entity loader have a method like SQL would completely defeat the point of the entire system. It also makes using the entity loader extremely fragile because you're no longer free to swap out the data source for an entity safely anymore because the DataSource you swap to may no longer be injecting this custom method. Causing any call sights to that method now be fatal errors

Sakeran commented 4 years ago

These are some valid concerns. I'll try to address them as best I can, though I might have some trouble with that since you aren't wrong.

I wonder if the EntityLoader name might be somewhat misleading at this point - particularly if what it's loading is something other than a Ranvier entity like a Room or player file. The way I'm understanding the architecture (and I may be mistaken), it's more like a DataInterface.

This proposal is based on some limitations I came up against while writing my own DataSource for NeDB, but I'll use a hypothetical SQL Source to illustrate my point.

Here's the issue - Suppose I'm writing a 'guild roster' command (defined in some bundle) that lists every player in a particular guild. I also have my player data on an SQL server somewhere. fetchAll().filter() technically works, but it's wasteful when I only need 5-10% of what that query returns. Even more wasteful if I only need the players' names. I need the ability to express a more detailed query.

It is possible to dig out the SQL DataSource itself through the EntityLoader, but then I lose most of the advantages associated with (or I imagine are associated with) EntityLoaders. I also need to know how the DataSource's API works internally, and apply configuration manually in the command definition - and in every similar command.

If direct access of DataSources in bundle code is the correct approach in these cases, then I'll concede that this precise change is unnecessary. I would still have no way to express a more complex query, however. loader.datasource.SQL causes just as much of an error if the datasource is switched out.

My example doesn't quite show the situation you describe. The idea is that the custom loader is (typically) designed for use with one specific type of DataSource - it acts as that DataSource's interface. My 'datasource-nedb' package would define both the Loader and the Source, for example. The DataSource itself is never modified. (I may have been misleading when I said they could be defined independently for existing sources - they can, but that's not a typical use case).

I did try to be careful in how a custom loader would work in practice. The "mixin" pattern ensures that the custom EntityLoader always defines at least the base fetch/update/etc functions. Requiring the game developer to set the mixin explicitly in ranvier.json ensures they've acknowledged the modified interface - and will be reminded of it if they do switch it out later.

There are probably other solutions to this. If Ranvier adopted as standard the use of query language (say, GraphQL), then the core EntityLoader could have a 'query' method that would allow for more complex interactions. This would be safer, though it then becomes incumbent on the DataSource author to resolve queries correctly (and it might still fail if we switch DataSources, though perhaps silently).

I proposed this particular solution because it seemed like the simplest to implement, and the most transparent to anyone using it. I'm certainly open to other options, though.

(PS - I didn't consider it when writing the example, but good point on the SQL Loader method. I still need the functionality, but as the Loader author I'd probably allow something closer to playerLoader.fetchByColumn(guildName), or something similar.)

shawncplus commented 4 years ago

If I'm understanding correctly the gist is you may have some "thing" that is actually the equivalent of DB relation (get all players in guild) where that data/relationship may not even be in the same place where a player is stored (in a join table, for example)

It's definitely an interesting problem. The solution at the moment would have to be handled with a normalized model and resolved as a filter/map as you said. Where Guild, Player, and PlayerGuild are each an EntityLoader and to figure out which players are in a particular guild you'd do something like:

const playerGuilds = await playerGuildLoader.fetchAll();
const result = await Promise.all(
 playerGuilds
  .filter(pg => pg.guild_id === targetGuildId)
  .map(async pg => playerLoader.fetch(pg.player_id))
);

The benefit of this is basically what I outlined before. EntityLoaders have no idea how the data is sourced/cached/persisted and they only care about one Entity at a time.

The downside, as you mentioned, is that this is cumbersome at best and at worst because the loaders don't know anything about how the data is sourced and because the data sources aren't talking to each other it may be terribly inefficient.

The DataSource/EntityLoader paradigm definitely breaks down for what I'll call "complex entities" and I'm not sure if there's a great solution for it at the moment other than to say for some complex entities one might just need something purpose-built rather than use that system.

The idea of a query method is intriguing, especially if taking into account GraphQL.

Sakeran commented 4 years ago

Fair enough - I'll close this PR and resubmit with just the added remove method on EntityLoader.

Just to wrap up, it sounds like keeping the EntityLoader API consistent for everyone is an important concern. That makes sense. We also seem to be on the same page regarding the current limitations of the system. Here's my understanding:

Points 2 and 3 were how I justified the ability to extend EntityLoader's methods. While EntityLoader might not know anything about where the data comes from, the developer using it kind of has to. For instance, the developer who wrote the loadAreas method in BundleManager "knows" that the areas DataSource defines fetchAll, even though he can't say for absolute certain that it will be. He can apprise the project developers of the DataSource requirement, however, and expect that they will accept the necessary constraint if they want the code to work.

For this reason, extending EntityLoader doesn't introduce a new class of problem, unless the possible loss of the prettier Error message is unacceptable (which I would buy). We could ignore the cases in core and say that the EntityLoader API should be homogeneous because bundle code might be shared across projects with different configurations. This is somewhat hard to defend, since DataSources can still not implement certain methods, causing some projects to break where others wouldn't.

Bottom line - EntityLoaders suggest a consistent API, but don't guarantee one. Developers need to know something about the DataSource to make informed decisions about what the EntityLoader will do.


Point 4 is interesting, and might actually suggest a better solution than mine. Forget my previous Players/Guild example and imagine a simpler query: "get all players under level 10". We're still stuck with fetchAll if we just use EntityLoader's conventions. However, if we accept that the developer will know something about the DataSource, I could technically set up my NeDB DataSource to do this:

const newbies = await playerLoader.fetch({ level: { $lt: 10 } });

It's ugly, and obviously not the intended use of EntityLoaders, but if I document the DataSource well enough, developers will know that this usage returns a list of players. (It also doesn't touch core, so you can't stop me. :p)

I wouldn't use that approach, but it gets closer to what I think I really wanted, which was something like this:

const newbies = await playerLoader.fetchAll({ level: { $lt: 10 } });

const bobsWife = await playerLoader.fetch({ spouse: "Bob" }); // assume monogamy

// Bob died :(
await playerLoader.update({spouse: "Bob"}, {spouse: null});

If we trust the developer enough to allow EntityLoader methods to accept arbitrary parameters, then we afford them a lot more expressive power while still keeping the "pretty error" behavior. It's also a lot less work on DataSource authors than my previous solution, since we don't have to write two pieces to the puzzle. Semantic issues could be addressed with additional methods like insert, updateAll, etc.

These changes only really apply to single-table queries, admittedly. Complex, join-y operations are still an issue, but I suspect that won't be 100% solvable in core no matter what we try. It is likely solvable at the DataSource level, though, since those are able see multiple EntityLoaders and coordinate them, provided the developer can sufficiently express what they want.

shawncplus commented 4 years ago

EntityLoaders act as interfaces for DataSources

Kinda. I'd phrase it as "EntityLoaders are the way to access entities. Pretend DataSources don't exist." Neither an EntityLoader or someone using an EntityLoader should need to know or care about implementation details of a DataSource. That was the original goal of the architecture. Essentially if you're using a loader and your code starts caring about files something has gone terribly wrong. The architecture is intended to guarantee a consistent architecture, JS is not great at that so can allow people to shoot themselves in the foot.

the developer who wrote the loadAreas method in BundleManager "knows" that the areas DataSource defines fetchAll

No. They only know that the areas EntityLoader defines fetchAll. It doesn't know anything about the datasource. If I were to swap out the DataSource for areas to one that happened to not have fetchAll the code would fail for sure but that's mainly due to JS not having true support for Interfaces and not by design. The DataSourceRegistry could essentially do this work by verifying that any DataSource you try to load has certain methods.

I'm fully on board with the query stuff. I'd personally keep the signatures of fetch() and fetchAll() as they are and introduce a new query() method but that's personal preference

Sakeran commented 4 years ago

From the DataSource example in the docs, we have this comment in the hasData docstring:

   * This is the only required method of a DataSource, all others are optional
   * but not implementing them will obviously limit its funtionality.

I suppose this is where my confusion stems from - it's implies that the other DataSource methods aren't required to be defined. Verifying that they exist at startup would most likely be a good idea, and would ensure that the project developer doesn't need to concern themselves with whether or not any EntityLoader will throw errors on a given method.

It does put the DataSource firmly in the role of implementation detail though, which is mildly disappointing on my end - I had a more feature-oriented picture in my head. As such, I won't pretend I understand the intention of this code well enough to pursue any further modifications.

(For the record, I'd take query if that really is the only acceptable option. Introducing an entirely new language into core is just a little above my paygrade.)

Thanks for the feedback.