Cog-Creators / Red-DiscordBot

A multi-function Discord bot
https://docs.discord.red
GNU General Public License v3.0
4.83k stars 2.31k forks source link

Arke: New storage system for Red, relational databases and ORM! #5779

Open laggron42 opened 2 years ago

laggron42 commented 2 years ago

Hey, I'm writing this issue after a conversation in #advanced-coding where I initially suggested the following change.

Config is great, allows anyone to store stuff easily within Red, but it has a lot of limitations too. After playing a lot with django recently, I kept thinking about integrating an ORM in Red, for cog developers, as an additional tool, which I will refer to as "Arke"

What will Arke be?

Arke will be like Config : providing tools to cog creators for storing data without worrying about creating files or setting up a database. However, the stored data can be relational, which means no more dicts and hierarchy, only models and relations!

The models will be defined using an ORM, and Red will handle the rest (registering the models, preventing conflicts, handling the database based on the user's choice).

What will Arke not be?

It is NOT a replacement for Config, it is an optional addition. Anyone should be free of using Config or Arke.

Relational models? What is that?

Unlike Config, the storing schema will not be dictionaries, but relational models. You have to describe what a model should look like, and what it should be linked to.

Dictionaries are limited to two relations: parent and children, while relational models have no such limit, and do not need to worry about hierarchy. Take the following example :

class ModLog:
    id: int  # this will be the primary key
    # basic fields
    reason: str
    time: datetime.datetime
    type: Literal["warn", "mute", "kick", "ban"]

    # relational fields
    guild: discord.Guild
    member: discord.User
    moderator: discord.User

With a model like that, you can easily query and filter all modlogs for a guild, a member, or a moderator. From that modlog object, you can access the other linked attributes back.

>>> all_modlogs = ModLog.objects.all()
>>> guild_modlogs = ModLog.objects.filter(guild=red)
>>> all_reasons = ModLog.objects.column(reason)

>>> my_modlogs = ModLog.objects.filter(member__id=348415857728159745)
>>> first_modlog = my_modlogs[0]
>>> first_modlog.moderator
<discord.Member(Kowlin)>

(This is absolutely not valid Python code, just showing examples)

All of the fields described above are columns, which means you can for example query all reason objects of modlogs, without querying the other stuff. That kind of operations make everything not only more optimised for both time and storage space, it also makes programming a lot more convenient.

What is an ORM?

ORM means Object relational mapping.

Databases like that need SQL operations, and it's annoying to deal with, but thanks to ORMs, you can simply build an object in a language, and it will be translated to SQL queries.

There are multiple ORMs for Python, the two most popular ones are SQLAlchemy and Django ORM.

I personally find the latter easier to use, but there's no standalone version, plus SQLAlchemy has some great features (support for more dbs, async operations, advanced SQL support...), so I guess we'll use that.

Let me show some examples from their docs:

Creating models

class User(Base):
    __tablename__ = 'user_account'

    id = Column(Integer, primary_key=True)
    name = Column(String(30))
    fullname = Column(String)

    addresses = relationship("Address", back_populates="user")

    def __repr__(self):
       return f"User(id={self.id!r}, name={self.name!r}, fullname={self.fullname!r})"

class Address(Base):
    __tablename__ = 'address'

    id = Column(Integer, primary_key=True)
    email_address = Column(String, nullable=False)
    user_id = Column(Integer, ForeignKey('user_account.id'))

    user = relationship("User", back_populates="addresses")

    def __repr__(self):
        return f"Address(id={self.id!r}, email_address={self.email_address!r})"

Manipulating data

# Taking the first user
>>> session.scalars(select(User)).first()
User(id=1, name='spongebob', fullname='Spongebob Squarepants')

# Taking only some keys from the first user
>>> session.execute(select(User.name, User.fullname)).first()
('spongebob', 'Spongebob Squarepants')

# Picking a specific object
>>> session.execute(select(User).where(User.name == 'spongebob'))
User(id=1, name='spongebob', fullname='Spongebob Squarepants')

# Progressively fetch users ordered by ID
>>> result = session.execute(select(User).order_by(User.id))
>>> result.fetchone()
(User(id=1, name='spongebob', fullname='Spongebob Squarepants'),)
>>> result.scalars().all()
[User(id=2, name='sandy', fullname='Sandy Cheeks'),
 User(id=3, name='patrick', fullname='Patrick Star'),
 User(id=4, name='squidward', fullname='Squidward Tentacles'),
 User(id=5, name='ehkrabs', fullname='Eugene H. Krabs')]

Of course this needs more explanations, this is just a quick overview of what you can do. I will write detailed and comprehensive docs for this module.

What will this look like in Red?

Just showing some draft ideas, nothing is done yet, I'm giving stuff I have in mind and I'm very open to suggestions.

Developer side

I need to play more with SQLAlchemy before being sure of how this is going to work, but ideally, I want something that will be very similar to config :

User side

There are multiple ways of doing this, and I want it to be as seamless as possible. Still unsure on what to do so please comment on this 👀

Here's something I had in mind:

That, or simply setup Arke on all instances by default, but for something optional, not sure if it's a great thing.

A quick FAQ

Why "Arke"

Had to come up with a name other than BetterConfig :kappa:

Searched for a nice name, and found Arke from the Greek mythology. During the Titanomachy, she was a messenger from the Titans, sending messages to the Olympian gods. You can read more about her here.

Since the goal of the application is to provide an interface to cog developers, and then handle saving the data on our side, I think that's a cool and fitting name :D

Why should this be part of core Red?

Cogs have already been using databases other than Config like leveler, without any tool like that from Red. However, learning how to configure a database, connect to it and send queries is hard. Even harder for the users who have a hard time setting up that database themselves (see Fixator's channel 👀).

On the other hand, Config is great: support for different drivers, easy interface for all cog devs, and easy to use for the end user. I want to bring that to relational databases too.

If this module is independent from Red, it's not going to change a thing: each cog will have its own database system, without any coordination. In addition, new devs will simply use the existing system, and not bother adding a whole layer to use a specific database.

I never wanted myself to learn SQL, JSON is way easier to use. However, the way django works with its ORM made everything so much easier, made me love relational database schemas, and now I'm having a hard time using JSON. I think if the tool is there, ready to use, more developers will want to try it.

Wouldn't that make the setup process harder for end users?

Sure, setting up a database can be hard: installing additional programs, setting up hosts and passwords, making sure the connection stays alive... except for SQLite!

Config has complex driver options which requires that kind of setup too, but there's always one easy and straightforward option: JSON. Well in our case, that easy and straightforward option is SQLite3, it's just a text file, doesn't require any server! Sure you will be losing performance, but for small sized bots, that will be more than enough. Large bots have already switched to Redis or other drivers than JSON.

What are the other advantages of using a db

Omitting the ORM, tables and relational stuff here.


I think I explained everything I had in mind. As you can see, it's just ideas in my head, but they're pretty clear. I will start working on some prototypes very soon, and want to hear your feedback!


Update (less than 30 min after yes): Of course we're not limited to SQLAlchemy as an ORM! Just talked to Preda which mentioned piccolo ORM, and that looks a lot better! This issue is for sorting the main points, including this one

laggron42 commented 2 years ago

Hey, first real update, I looked at multiple Python ORMs and wanted to present a list of potential candidates so we could discuss about it, but there are very few solutions that fit our requirements. Most notably, the following ORMs won't fit for Arke:

That leaves us to Tortoise ORM: it's async, its models are comprehensive and intuitive (django style), it's active, frequently updated, has support for SQLite, postgresql, MySQL, MariaDB, and most importantly, it's fast, like really fast compared to other ORMs:

See benchmarks here.

Examples

So I think I'll go for that! I'm really open to discussion on this subject (or everything about arke), for now I will mostly play with the tools, see what's possible and think about the design before doing actual work in the repo.

Therefore, time to throw more examples, but this time using Tortoise specifically.

Simple model: adventure player This is a small example of a model without relations. ### Writing the model ```py import enums from tortoise import fields from tortoise.models import Model class PlayerClass(Enum): wizard = 0 bard = 1 ranger = 2 class Player(Model): name = fields.CharField() health = fields.IntegerField() level = fields.IntegerField(default=1) class = fields.CharEnumField(null=True) # can be empty ``` ### Creating players ```py >>> p1 = Player(name="Laggron", health=100) >>> await p.save() # or directly >>> await Player.create(name="aika", health=200, class=PlayerClass.bard) ``` ### Accessing players ```py >>> await Player.all() [, ] >>> await Player.get(name="aika") >>> await Player.filter(health__lte=150) # players with health <= 150 [] ``` ### Reading or setting attributes ```py >>> aika = await Player.get(name="aika") >>> aika.health 200 >>> aika.health -= 40 >>> await aika.save() ```
Relations: ForeignKey Let's see the most basic relation, the foreign key (or OneToMany). Take for example a ticket system. There are tickets and members, members create tickets. ### Writing models ```py from tortoise import fields from tortoise.models import Model class Member(Model): discord_id = fields.IntField() guild_id = fields.IntField() class Ticket(Model): time = fields.DateTimeField(auto_now_add=True) # will be automatically set to utcnow() on creation message = fields.CharField() member = fields.ForeignKey(Member, on_delete=fields.CASCADE) ``` Each ticket must be linked to **one** member, a member can have **many** tickets (OneToMany). `on_delete=fields.CASCADE` means that when the member object is destroyed, all of his tickets are also destroyed. ### Creating the objects ```py >>> m = await Member.create(discord_id=260, guild_id=42) >>> await Ticket.create(message="plz add martine", member=m) ``` ### Accessing objects ```py >>> m = await Member.get(discord_id=260) >>> t = await m.tickets.first() >>> t.member == m # reverse access True >>> all = await Ticket.filter(member__discord_id=260) # all tickets from one member [] ```
Relations: ManyToMany Another type of relation is ManyToMany: An object A can be linked to multiple objects B, and objects B can also be linked to multiple objects A. A side note: Config has built-in relations for the most common models, Arke could have the same and provide pre-built models? (Would also make GDPR easier.) Let's suppose that for the following example. ```py from tortoise import fields from tortoise.models import Model from redbot.core.arke.models import Member, TextChannel class Event(Model): channel = fields.ForeignKey(TextChannel, on_delete=models.CASCADE) admins = fields.ManyToMany(Member) # an event can have many admins, an admin can have many events description = fields.CharField() start_time = fields.DateTimeField() ``` Then, when creating objects: ```py >>> event = await Event.create(channel=ctx.channel, description="Fall Guys party", start_time=some_datetime_object) # notice how admins is omitted, it's initially empty >>> await event.add(ctx.author, guild.get_member(some_id)) # adding two members as admins >>> async for admin in event.admins(): ... print(admin.name) Kowlin JennJenn ```
Relations: OneToOne Last relation, you probably guessed it, OneToOne relations link two unique objects together. ```py from tortoise import fields from tortoise.models import Model from redbot.core.arke.models import Role class AutoRole(Model): role = fields.OneToOneField(Role, on_delete=models.CASCADE) command_name = fields.CharField(null=True) button_id = fields.IntField(null=True) reaction_id = fields.IntField(null=True) ``` That way, a role cannot be linked to multiple auto roles.

Will write a lot of docs, with better examples to make it easy for beginners!

Integration with Red

Initializing Tortoise is quite easy, and most importantly doesn't require an instance of something:

from tortoise import Tortoise

await Tortoise.init(
    db_url='sqlite://db.sqlite3',
    modules={'models': ['app.models']}
)
await Tortoise.generate_schemas()

This means there is no instance to pass around like Config. There is still stuff to do on our end, which will likely need additional lines on cog start to register the models and generate the schemas.

Something I'm not sure of is how to handle different cogs. Of course you can't put everything in the same database, there will be conflicts everywhere, so basically there are two solutions:


So, yeah, think ideas are starting to get more precise, will start working on stuff, show it in #testing and ask for feedback, don't hesitate to leave some here too!

madebylydia commented 2 years ago

I had this exact same idea myself, but I've been told that it wouldn't be a great idea to integrate it right into Red itself (I cannot hand out the conversation nor remember much of what I was told), but rather as a dependency for cogs. This is my way of thinking too, I see Config as enough/necessary for Red & devs. Introducing such a framework (And I'd make the difference between a library and a framework) into Red directly, is not what I think is a great idea.

Ok, now let me tell you I love the idea (Most especially since I had myself), I'd like to make it a reality too for devs, but I think you've been more into the developer-side problem-solving rather than the user-side impact that may come with it, so I'll try to explain a bit more below. Feel free to contact me on Discord (Capt. Pred#0495) after reading my comment if you'd like, I'd love to have a conversation with you if you have anything to say that doesn't require a full comment here. :)

Why Config and not Arke into Red?

I see Config as part of Red because Config is a direct relation with the raw data and the cog, which is, well, extremely needed for creating cogs in Red today, anyhow, I think I don't need to debate on why Config is needed in Red's core.

Now, I think that Arke shouldn't be part of Red for some reasons, and to make it easier to read, I'll make a list:

In top of all of these, I personally think it doesn't fit Red's core itself, but I don't really have words to describe it, sadly.

How I approached the idea myself

To refer to "my ORM framework", I'll give it the name of "RORM" (As a short of RedORM, a temporary name I had)

When doing research on how to integrate an ORM-like experience for developers, here's what I ended up with:

In the end, I've found out that Pydantic might be what I was looking for (Using ORM mode), since there's no apparent other library like Pydantic, AND that it seems it would support Config properly, in top of providing a great type hinting + clean code for the end developer.

As of today, I still have made no code regarding RORM, and I am not planning to code it anymore, so I'm sharing the knowledge I was able to gather so far, so I could help you too.

I hope my comment will help you to have more guidance and understand my thought on your idea, hoping you'll find it instructive. I'd love to hear other's thoughts too here, as I've mainly had thought from 2 or 3 core developers.

Personal inspiration

I mostly had that idea myself when I was working with AdonisJS with it's ORM, it might be interesting to see how that one works by itself, even if that's a whole different language.

laggron42 commented 2 years ago

Hey, thanks for your comment! While I understand your view, I think there are solutions to make Arke fit well into Red. Let me respond to your points.

You're forcing users to use Arke, because if one cog they want to install does require Arke, they have to install Arke.

Cog creators are free to enforce whatever they want for they cog. Install Leveler from fixator, you need MongoDB. Install notsobot from Trusty, you need to setup imagemagick. I'm not sure why a requirement is a problem for cogs? Especially when the setup will be built into Red, the only thing they'll have to do is another pip install.

Users don't really care about how you ship your product, they care that it works right away (That's a general sentence, not just with Red, and I'm not criticizing the work you're trying to add), so having another installation step for users might not be the right idea.

I agree, but once again, that installation step is just a pip install. Maybe in the future, if arke is widely used, its requirements will be installed by default. Also, I think users will enjoy the speed performance, and being able to install complex cogs without having to setup a database themselves is also valuable.

Users may not want to migrate to a relational DB for many cases.

Not sure what you mean here. What migrations? The developer should handle them, and more precisely using the tool provided by arke for generating migrations (like django does). Migration from config to arke is also on the dev.

If you meant end users having to use a relational DB, also not sure why it's a problem. Just like everything else, they won't even notice what happens under the hood and have a program that works out of the box.

If users decide to go for PSQL/MySQL/MariaDB, they need to know how to manage it, and I'm not quite sure everyone knows how to do that (I still hate it myself...).

If users decide to go for postgresql for Config, they need to know how to manage it. How is that a change? SQLite's config on the other hand is seamless, just like bare JSON files.


About your idea, well the first con is that it's based on Config. Basically, you're just bringing oriented object programming to config, without the benefits of relations or the speed increase.

Arke is not just about bringing an ORM, it's about setting up the environment around it to make it easy to use for both developers and users. If I wanted to use an ORM just for my cog, I would have just used Tortoise. On the other hand, bringing it to Red with an easy interface encourages devs to look into it. Not only will they learn interesting stuff, they will also write more efficient cogs. It's also easier, more intuitive to use and has better readability. If it wasn't for Django, I would never have used an ORM, and I want to bring the same chance to Red devs, especially since a database is not a light requirement in a standalone cog.

Flame442 commented 2 years ago

I am -1 on this proposal as it is currently written for the following reasons:

That, or simply setup Arke on all instances by default, but for something optional, not sure if it's a great thing.

If this is something added to core, then it should be treated identical to Config in the setup process. It should have one question asking what backend to use, with the recommended default being SQLite. If the SQLite version of this is as seamless for users to use as JSON is for config, then there is no reason to hide this behind a flag and complicate usage in cogs.

npc203 commented 2 years ago

It's a +1 from me, and definitely needed in core.

Addressing @Flame442 concern from my perspective of what I understand Arke will likely look like:

Config as it's meant, solves the problem of basic setting of key values and things necessary to restore the state of the bot. Changing a single value in a huge list, causes to rewrite the whole list again in the case of config, which didn't sit well with me. IO bottlenecks111.

Arke solves the need of incremental storage, when data of similar types are grouped and edits happens frequently. Arke can't survive alone, config is still needed for simple key value pair settings. both can glue, overcoming each other's deficiencies.

Ofc, similar to config, the learning curve is a little steep for those who have no idea about ORMs, but again the goodies later are too lucrative imo. Totally optional to be used by CCs, same as how initally when config was made, people were still using dataIO and manual storage into json files in cog directory.

ltzmax commented 2 years ago

I'm going to say +1 myself because it does indeed sounds intersting and would definitely use this myself.

But i do have to agree with one part from flame here.