Open laggron42 opened 2 years ago
Hey, first real update, I looked at multiple Python ORMs and wanted to present a list of potential candidates so we could discuss about it, but there are very few solutions that fit our requirements. Most notably, the following ORMs won't fit for Arke:
That leaves us to Tortoise ORM: it's async, its models are comprehensive and intuitive (django style), it's active, frequently updated, has support for SQLite, postgresql, MySQL, MariaDB, and most importantly, it's fast, like really fast compared to other ORMs:
See benchmarks here.
So I think I'll go for that! I'm really open to discussion on this subject (or everything about arke), for now I will mostly play with the tools, see what's possible and think about the design before doing actual work in the repo.
Therefore, time to throw more examples, but this time using Tortoise specifically.
Will write a lot of docs, with better examples to make it easy for beginners!
Initializing Tortoise is quite easy, and most importantly doesn't require an instance of something:
from tortoise import Tortoise
await Tortoise.init(
db_url='sqlite://db.sqlite3',
modules={'models': ['app.models']}
)
await Tortoise.generate_schemas()
This means there is no instance to pass around like Config. There is still stuff to do on our end, which will likely need additional lines on cog start to register the models and generate the schemas.
Something I'm not sure of is how to handle different cogs. Of course you can't put everything in the same database, there will be conflicts everywhere, so basically there are two solutions:
Adventure__weapons
). Allows relations between cogs' models if specified, and unique Discord models for everythingSo, yeah, think ideas are starting to get more precise, will start working on stuff, show it in #testing and ask for feedback, don't hesitate to leave some here too!
I had this exact same idea myself, but I've been told that it wouldn't be a great idea to integrate it right into Red itself (I cannot hand out the conversation nor remember much of what I was told), but rather as a dependency for cogs. This is my way of thinking too, I see Config
as enough/necessary for Red & devs. Introducing such a framework (And I'd make the difference between a library and a framework) into Red directly, is not what I think is a great idea.
Ok, now let me tell you I love the idea (Most especially since I had myself), I'd like to make it a reality too for devs, but I think you've been more into the developer-side problem-solving rather than the user-side impact that may come with it, so I'll try to explain a bit more below. Feel free to contact me on Discord (Capt. Pred#0495) after reading my comment if you'd like, I'd love to have a conversation with you if you have anything to say that doesn't require a full comment here. :)
I see Config as part of Red because Config is a direct relation with the raw data and the cog, which is, well, extremely needed for creating cogs in Red today, anyhow, I think I don't need to debate on why Config is needed in Red's core.
Now, I think that Arke shouldn't be part of Red for some reasons, and to make it easier to read, I'll make a list:
In top of all of these, I personally think it doesn't fit Red's core itself, but I don't really have words to describe it, sadly.
To refer to "my ORM framework", I'll give it the name of "RORM" (As a short of RedORM, a temporary name I had)
When doing research on how to integrate an ORM-like experience for developers, here's what I ended up with:
In the end, I've found out that Pydantic might be what I was looking for (Using ORM mode), since there's no apparent other library like Pydantic, AND that it seems it would support Config properly, in top of providing a great type hinting + clean code for the end developer.
As of today, I still have made no code regarding RORM, and I am not planning to code it anymore, so I'm sharing the knowledge I was able to gather so far, so I could help you too.
I hope my comment will help you to have more guidance and understand my thought on your idea, hoping you'll find it instructive. I'd love to hear other's thoughts too here, as I've mainly had thought from 2 or 3 core developers.
I mostly had that idea myself when I was working with AdonisJS with it's ORM, it might be interesting to see how that one works by itself, even if that's a whole different language.
Hey, thanks for your comment! While I understand your view, I think there are solutions to make Arke fit well into Red. Let me respond to your points.
You're forcing users to use Arke, because if one cog they want to install does require Arke, they have to install Arke.
Cog creators are free to enforce whatever they want for they cog. Install Leveler from fixator, you need MongoDB. Install notsobot from Trusty, you need to setup imagemagick. I'm not sure why a requirement is a problem for cogs? Especially when the setup will be built into Red, the only thing they'll have to do is another pip install.
Users don't really care about how you ship your product, they care that it works right away (That's a general sentence, not just with Red, and I'm not criticizing the work you're trying to add), so having another installation step for users might not be the right idea.
I agree, but once again, that installation step is just a pip install. Maybe in the future, if arke is widely used, its requirements will be installed by default. Also, I think users will enjoy the speed performance, and being able to install complex cogs without having to setup a database themselves is also valuable.
Users may not want to migrate to a relational DB for many cases.
Not sure what you mean here. What migrations? The developer should handle them, and more precisely using the tool provided by arke for generating migrations (like django does). Migration from config to arke is also on the dev.
If you meant end users having to use a relational DB, also not sure why it's a problem. Just like everything else, they won't even notice what happens under the hood and have a program that works out of the box.
If users decide to go for PSQL/MySQL/MariaDB, they need to know how to manage it, and I'm not quite sure everyone knows how to do that (I still hate it myself...).
If users decide to go for postgresql for Config, they need to know how to manage it. How is that a change? SQLite's config on the other hand is seamless, just like bare JSON files.
About your idea, well the first con is that it's based on Config. Basically, you're just bringing oriented object programming to config, without the benefits of relations or the speed increase.
Arke is not just about bringing an ORM, it's about setting up the environment around it to make it easy to use for both developers and users. If I wanted to use an ORM just for my cog, I would have just used Tortoise. On the other hand, bringing it to Red with an easy interface encourages devs to look into it. Not only will they learn interesting stuff, they will also write more efficient cogs. It's also easier, more intuitive to use and has better readability. If it wasn't for Django, I would never have used an ORM, and I want to bring the same chance to Red devs, especially since a database is not a light requirement in a standalone cog.
I am -1 on this proposal as it is currently written for the following reasons:
The name "Arke", while nice and pretty, does not do a good job of communicating what this system actually does. When a user is told to use Red's "Config", it is pretty clear what sort of stuff they are going to see. If a user is told to use Red's "Arke", they will have no idea what that means without additional explanation. I understand the Greek meaning, however most people do not have the background to understand the connection at first glance. I agree BetterConfig is not a great name, but there is probably a name for it which still adequately describes what it does (RelationalConfig?)
I don't see any reason why putting this in core is helpful. Like Pred said, some kind of basic data storage is necessary for Red as it needs to be able to store data in an easy and safe way for core reasons as well as for Cog Creators without experience working with JSON files safely. However, once a Cog Creator has gotten to the point of needing a relational db for performance or structural reasons, I don't believe they need to be hand held by a core system anymore. I believe that using a system like config which bundles together lots of DB backends in a not entirely efficient way is not solving the actual problem that needs to be solved for those cases where performance or structure is truly important to the function of the cog. Currently, if a Cog Creator believes that Config is too restrictive or too underperforming for their use case, they are able to create their own custom data storage system (even choosing to use a relational db like SQLite or Postgres) and use it to its full potential. While Config meets the needs of the very low impact and simple cases it is designed to handle, I don't think this system is able to meet the needs of the kinds of cases it is designed for.
It is another thing that needs to be maintained in Core. Going back to what I said earlier, if a Cog Creator is using a relational db, they are likely doing advanced things which require they understand how a relational db works and how to use it. They should be knowledgeable enough to know how to run and manage the db they are using. With that said, I don't think the benefit of making it easier to use outweighs the effort it will take in Core to create, maintain, and support this feature. Config is important because pretty much any cog needs some kind of persistent storage, but only a small percentage of cogs choose to use a db system outside of Config. I do not think something like this is necessary or worth having in Core.
However, learning how to configure a database, connect to it and send queries is hard. Even harder for the users who have a hard time setting up that database themselves
That being said, I am only one person. If the other Org Members believe this is a good idea, it can still be added. My opinion is also able to be changed if these points are addressed in a way that satisfies my concerns. I did not feel my concerns were satisfied in the conversation in #advanced-coding, however I may have failed to properly articulate them.
That, or simply setup Arke on all instances by default, but for something optional, not sure if it's a great thing.
If this is something added to core, then it should be treated identical to Config in the setup process. It should have one question asking what
It's a +1 from me, and definitely needed in core.
Addressing @Flame442 concern from my perspective of what I understand Arke will likely look like:
Config as it's meant, solves the problem of basic setting of key values and things necessary to restore the state of the bot. Changing a single value in a huge list, causes to rewrite the whole list again in the case of config, which didn't sit well with me. IO bottlenecks111.
Arke solves the need of incremental storage, when data of similar types are grouped and edits happens frequently. Arke can't survive alone, config is still needed for simple key value pair settings. both can glue, overcoming each other's deficiencies.
Ofc, similar to config, the learning curve is a little steep for those who have no idea about ORMs, but again the goodies later are too lucrative imo. Totally optional to be used by CCs, same as how initally when config was made, people were still using dataIO and manual storage into json files in cog directory.
I'm going to say +1 myself because it does indeed sounds intersting and would definitely use this myself.
But i do have to agree with one part from flame here.
Arke
does indeed sounds pretty but it would require an additional explanation of what arke
really means when we tell them to use arke
and they going "i'm so confused, what does that mean?". it's going to confuse the user/cog developer of what it means. We're all used to the word Config
. A different name that wouldn't require alot additional explanation would've been better to this. I see flame already added one example word RelationalConfig
, it doesn't sounds bad for this but would still need some additional explanation because user/cog developer may ask "what is different between RelationalConfig and Config?".
Hey, I'm writing this issue after a conversation in #advanced-coding where I initially suggested the following change.
Config is great, allows anyone to store stuff easily within Red, but it has a lot of limitations too. After playing a lot with django recently, I kept thinking about integrating an ORM in Red, for cog developers, as an additional tool, which I will refer to as "Arke"
What will Arke be?
Arke will be like Config : providing tools to cog creators for storing data without worrying about creating files or setting up a database. However, the stored data can be relational, which means no more dicts and hierarchy, only models and relations!
The models will be defined using an ORM, and Red will handle the rest (registering the models, preventing conflicts, handling the database based on the user's choice).
What will Arke not be?
It is NOT a replacement for Config, it is an optional addition. Anyone should be free of using Config or Arke.
Relational models? What is that?
Unlike Config, the storing schema will not be dictionaries, but relational models. You have to describe what a model should look like, and what it should be linked to.
Dictionaries are limited to two relations: parent and children, while relational models have no such limit, and do not need to worry about hierarchy. Take the following example :
With a model like that, you can easily query and filter all modlogs for a guild, a member, or a moderator. From that modlog object, you can access the other linked attributes back.
(This is absolutely not valid Python code, just showing examples)
All of the fields described above are columns, which means you can for example query all
reason
objects of modlogs, without querying the other stuff. That kind of operations make everything not only more optimised for both time and storage space, it also makes programming a lot more convenient.What is an ORM?
ORM means Object relational mapping.
Databases like that need SQL operations, and it's annoying to deal with, but thanks to ORMs, you can simply build an object in a language, and it will be translated to SQL queries.
There are multiple ORMs for Python, the two most popular ones are SQLAlchemy and Django ORM.
I personally find the latter easier to use, but there's no standalone version, plus SQLAlchemy has some great features (support for more dbs, async operations, advanced SQL support...), so I guess we'll use that.
Let me show some examples from their docs:
Creating models
Manipulating data
Of course this needs more explanations, this is just a quick overview of what you can do. I will write detailed and comprehensive docs for this module.
What will this look like in Red?
Just showing some draft ideas, nothing is done yet, I'm giving stuff I have in mind and I'm very open to suggestions.
Developer side
I need to play more with SQLAlchemy before being sure of how this is going to work, but ideally, I want something that will be very similar to config :
User side
There are multiple ways of doing this, and I want it to be as seamless as possible. Still unsure on what to do so please comment on this 👀
Here's something I had in mind:
--with-arke
flag, which will add another step to the process: asking what db to use for Arke-compatible cogs. The default option should be SQLite3.--setup-arke
?That, or simply setup Arke on all instances by default, but for something optional, not sure if it's a great thing.
A quick FAQ
Why "Arke"
Had to come up with a name other than BetterConfig :kappa:
Searched for a nice name, and found Arke from the Greek mythology. During the Titanomachy, she was a messenger from the Titans, sending messages to the Olympian gods. You can read more about her here.
Since the goal of the application is to provide an interface to cog developers, and then handle saving the data on our side, I think that's a cool and fitting name :D
Why should this be part of core Red?
Cogs have already been using databases other than Config like leveler, without any tool like that from Red. However, learning how to configure a database, connect to it and send queries is hard. Even harder for the users who have a hard time setting up that database themselves (see Fixator's channel 👀).
On the other hand, Config is great: support for different drivers, easy interface for all cog devs, and easy to use for the end user. I want to bring that to relational databases too.
If this module is independent from Red, it's not going to change a thing: each cog will have its own database system, without any coordination. In addition, new devs will simply use the existing system, and not bother adding a whole layer to use a specific database.
I never wanted myself to learn SQL, JSON is way easier to use. However, the way django works with its ORM made everything so much easier, made me love relational database schemas, and now I'm having a hard time using JSON. I think if the tool is there, ready to use, more developers will want to try it.
Wouldn't that make the setup process harder for end users?
Sure, setting up a database can be hard: installing additional programs, setting up hosts and passwords, making sure the connection stays alive... except for SQLite!
Config has complex driver options which requires that kind of setup too, but there's always one easy and straightforward option: JSON. Well in our case, that easy and straightforward option is SQLite3, it's just a text file, doesn't require any server! Sure you will be losing performance, but for small sized bots, that will be more than enough. Large bots have already switched to Redis or other drivers than JSON.
What are the other advantages of using a db
Omitting the ORM, tables and relational stuff here.
I think I explained everything I had in mind. As you can see, it's just ideas in my head, but they're pretty clear. I will start working on some prototypes very soon, and want to hear your feedback!
Update (less than 30 min after yes): Of course we're not limited to SQLAlchemy as an ORM! Just talked to Preda which mentioned piccolo ORM, and that looks a lot better! This issue is for sorting the main points, including this one