Javacord / Javacord

An easy to use multithreaded library for creating Discord bots in Java.
https://discord.gg/javacord
Apache License 2.0
762 stars 178 forks source link

Concept: Cache module #805

Open Bastian opened 3 years ago

Bastian commented 3 years ago

I've already suggested a design for an external cache module in https://github.com/Javacord/Javacord/issues/454#issuecomment-832957623. However, since this suggestion was not well received because of the immutability, this issue proposes a mutable external cache that also allows sharing data between multiple shards and allows for different data stores like Redis.

Similar to the immutable cache, the mutable cache holds simple immutable and serializable POJOs that do not contain references to other entities except by their ids.

Since the module is independent of Javacord, we can use another JVM language like Kotlin to reduce the necessary boilerplate.

Storing data objects

interface DataEntity {
    val indices: Array<Index>
}

data class Index(
    val name: String,
    val value: String
)
@Serializable
data class ServerData(
    val id: Long,
    val name: String,
    // More fields ...
): DataEntity {
    override val indices: Array<Index>
        get() = arrayOf(Index("ID", id.toString()))
}
@Serializable
data class ServerTextChannelData(
    val id: Long,
    val name: String,
    val serverId: Long,
    // More fields ...
): DataEntity {
    override val indices: Array<Index>
        get() = arrayOf(
            Index("ID", id.toString()),
            Index("SERVER", serverId.toString()),
        )
}
interface DataStore {
    fun insert(value: DataEntity)
    fun get(clazz: Class<out DataEntity>, indexName: String, indexValue: String): List<DataEntity>
    fun delete(clazz: Class<out DataEntity>, indexName: String, indexValue: String)
}

The implementation of the data store is extremely simple and can look like this for a classic in-memory store:

class InMemoryDataStore: DataStore {

    val entities = HashMap<String, MutableList<DataEntity>>()
    val reverseIndex = HashMap<DataEntity, Array<String>>()

    override fun insert(value: DataEntity) {
        val indexKeys = value.indices.map {
            value::class.simpleName + "-" + it.name + "-" + it.value
        }.toTypedArray()
        indexKeys.forEach {
            val list = entities[it] ?: mutableListOf()
            list.add(value)
        }
        reverseIndex[value] = indexKeys
    }

    override fun get(clazz: Class<out DataEntity>, indexName: String, indexValue: String): List<DataEntity> {
        return entities[clazz.simpleName + "-" + indexName + "-" + indexValue] ?: listOf()
    }

    override fun delete(clazz: Class<out DataEntity>, indexName: String, indexValue: String) {
        val results = get(clazz, indexName, indexValue)
        results.forEach { entity ->
            val indexKeys = reverseIndex[entity]
            reverseIndex.remove(entity)
            indexKeys?.forEach {
                val list = entities[it] ?: mutableListOf()
                list.remove(entity)
                if (list.isEmpty()) {
                    entities.remove(it)
                }
            }
        }

    }
}

Since all entities can be serialized (e.g. to JSON), it's easy to create other data stores like Redis. A mixture of different data stores for different data classes (e.g. only users/members in Redis) is also possible and easy to implement.

Using the data objects

We don't want to expose the data objects themselves but keep our current API. The current entities can be constructed as needed:

public class Server {

    private final DataStore dataStore;
    private ServerData data;

    public Server(DataStore dataStore, ServerData data) {
        this.dataStore = dataStore;
        this.data = data;
    }

    public String getName() {
        refreshData(data.getId());
        return data.getName();
    }

    public List<ServerTextChannel> getTextChannels() {
        return dataStore.get(ServerTextChannelData.class, "SERVER_ID", String.valueOf(data.getId()))
            .stream()
            .map(dataEntity -> new ServerTextChannel(dataStore, (ServerTextChannelData) dataEntity))
            .collect(Collectors.toList());
    }

    private void refreshData(long id) {
        List<DataEntity> entities = dataStore.get(
            ServerData.class, "ID", String.valueOf(id)
        );
        if (!entities.isEmpty()) {
            data = (ServerData) entities.get(0);
        }
    }

}
public class ServerTextChannel {

    private final DataStore dataStore;
    private ServerTextChannelData data;
    private Server server;

    public ServerTextChannel(DataStore dataStore, ServerTextChannelData data) {
        this.dataStore = dataStore;
        this.data = data;
        this.server = new Server(dataStore, getServerData());
    }

    public String getName() {
        refreshData(data.getId());
        return data.getName();
    }

    public Server getServer() {
        return this.server;
    }

    private ServerData getServerData() {
        List<DataEntity> entities = dataStore.get(
            ServerData.class, "ID", String.valueOf(data.getServerId())
        );
        if (entities.isEmpty()) {
            return null;
        }
        return (ServerData) entities.get(0);
    }

    private void refreshData(long id) {
        List<DataEntity> entities = dataStore.get(
            ServerTextChannelData.class, "ID", String.valueOf(id)
        );
        if (!entities.isEmpty()) {
            data = (ServerTextChannelData) entities.get(0);
        }
    }

}

The two important differences to the previously suggested immutable cache are, that

  1. Children keep references to their parents. This is necessary because elements might no longer be in the data store when methods like #getServer() are called. The other direction is no problem. For example, the deletion of a server would only result in an empty list when #getChannels() is called.
  2. Before returning data for method calls like #getName(), we refresh the cache. This makes sure that we always return the latest data. For data that cannot change (e.g. the id), a refresh is not necessary.

Updating the cache

Updating the cache is very straightforward:

Since entities do not store references to other objects, it should be possible to simply map Discord's gateway entities to the data objects and replace them in our cache with very little effort. E.g. for a channel creation or deletion, we can simply add/remove the channel from the cache without having to update any references in the server object (like we currently have to).

Considerations

Bastian commented 3 years ago

From https://github.com/discord/discord-api-docs/issues/2184#issuecomment-816353472:

[...] It is not feasible to serve the entire member list of every guild to bots, and as guild sizes continue to grow it is unlikely we will support this operation over gateway forever. If you haven't already, I would recommend taking steps to move to persisting and caching data. Having stale data is generally considered OK, and you can update data as you need to instead.

This makes this issue even more important. We should expect, that one day, Discord will remove or at least severely limit the options to request all members of a server. When this happens, we should provide a way for users to have a (semi-)persistent cache for users.

felldo commented 3 years ago

However, since this suggestion was not well received because of the immutability, this issue proposes a mutable external cache that also allows sharing data between multiple shards and allows for different data stores like Redis.

I think an immutable cache is still possible. If I am not wrong with this cache design it should be possible to have a mutable or immutable cache without the need to have a more complex code base:

private void refreshData(long id) {
       if(api.isImmutableCache){
           return;
       }

        List<DataEntity> entities = dataStore.get(
            ServerTextChannelData.class, "ID", String.valueOf(id)
        );
        if (!entities.isEmpty()) {
            data = (ServerTextChannelData) entities.get(0);
        }
    }

With a simple flag that can be set we can decide whether we want to actually refresh the data from the cache or not. This results in either a mutable or immutable enity. Having this immutable behaviour should also be very performant as we do not refresh the data from the cache everytime a method is called and at the same time still allows us to share the cache among other bot instances.

Bastian commented 3 years ago

This results in either a mutable or immutable enity.

This will not work for collections (e.g., Server#getChannels()) though. These are always fetched from the cache and the cache itself is mutable.

felldo commented 3 years ago

Then this will result in a sort of semi immutable. But once you get the channels from the cache they are immutable entities which you can store in a variable to process them in your task. I think that's a good trade off though.