ZeroKnight / ZeroBot

My personal IRC/Multi-protocol Bot created (and re-created) for education and amusement.
MIT License
1 stars 0 forks source link

Database Interface #10

Closed ZeroKnight closed 3 years ago

ZeroKnight commented 4 years ago

More or less a continuation/refactor of the ideas originally started in the Perl incarnation.

Task List

Design

ZeroBot shall make use of a monolithic zerobot.sqlite database with some core-related tables (?), global tables usable by all modules, and module-specific tables. I'm currently not sure what the core tables would hold, and may just merge this concept into the global tables.

Each module (and the Core) will have its own connection to the database. A ZeroBot.database method will be responsible for initializing these connections, and the core will deliver these connection objects to modules that request them.

Modules that make use of the database should adhere to a table naming convention such that modules with a single or primary table have a name equal to their module identifier, e.g. chat. Modules may create additional tables, but the names should always prefix these table names with their identifier, e.g. chat_triggers.

Implementation Details

ZeroBot.database.Connection shall extend sqlite3.Connection in order to associate the module that the connection is for, and facilitate synchronization of any state with Core.

Connections are requested by modules via a method like Core.database_connect, typically within their module_register callback. This method shall call ZeroBot.database.create_connection, which handles creating and setting up an sqlite3.Connection object. A dictionary within Core will map module identifiers to these resultant connection objects. Modules may close their connection manually, but the Core will ensure that this is done during shutdown and module reload.

Once a module is connected, it may execute any arbitrary queries it wants. Connection creation and closure should be logged, probably in INFO.

Database transactions should be logged, with the executing module included in the message. DQL statements should go to DEBUG, and DML/DDL should go to INFO. Can achieve this through light extension of sqlite3.Connection.execute* methods. TCL statements should be logged as well, but probably only commit and rollback warrant being sent to INFO; the rest can be sent to DEBUG.

Table Access [Deprecated]

NOTE: I've since decided against this. The enforcement offered is rather superficial, as a module could simply reset the authorizer to a dummy callback or make its own connection directly. With this in mind, writing the authorizer and the config logic around granting modules access becomes a bit of a waste of effort, as modules can realistically do whatever they want anyway; as the saying goes: "we're all consenting adults here". ZeroBot's modules are intended to be able to interact anyway. Modules can of course still use set_authorizer to implement validation, if they wish to do so.

I'm leaving these notes here anyway for reference.

Table access can be controlled via sqlite3.Connection.set_authorizer. A callback will be defined in ZeroBot.database that authorizes transactions in the following ways:

ZeroKnight commented 4 years ago

Table Restructuring, Normalizing, and Reformatting

ZeroBot's database was created at a time that I knew nothing about database design and was just going by the seat of my pants. Data needs to be properly normalized and reformatted in many tables; especially quote.

General

New Users Table, Aliases Table

The users table will hold user information for other tables and features to make use of in a consistent manner. Each entry should represent a single, unique user and store any useful information about them. Currently not sure what protocol-agnostic information to add for now, though. Maybe something like "last seen time", but would need to extend the Context API for this.

UPDATE: Came up with a few columns

The aliases table is an associative table that stores alternate names for a given user in the users table. This is useful if someone changes their username (often or otherwise), or in cases where someone has a modified name for other purposes, like foobar_work. The table shall consist of only a compound PK of the form (user_id, alias), where user_id is a FK to the users table and alias is an alias for that user. A boolean case_sensitive column determines whether the alias is case-sensitive or not; defaults to false.

This table also includes the same columns as the users table.

Counter Table

Quote Table

Preliminary Cleanup

Some light cleanup and consistency fixing to make it easier to restructure things programmatically later.

Quote Storage Format

Standard, single-author quotes will remain unchanged from the existing formatting: only the quote body is stored and the author is implicitly added when fetching a quote.

Multi-line quotes shall be stored more or less as-is in the body: newlines are preserved but collapsed and authors are inserted either literally or with ordinal placeholders (\1, \2, etc.). A new switch, --multi will be added to specifically designate a multi-author quote:

Restructuring, Fixes, and Additions

Further normalization of Quote data

Rename quote_authors to quote_lines or similar and add a string column (line) for a quote body; this is either the whole quote for a single-author quote or just the one line in a multi-author quote. A line_num column will be added to create a new compound key of (quote_id, line_num). This would consequently result in removing quote.body, moving this data into line.

quote_rd

Quote Authors & Submitters

NOTE: This idea succeeds a previous, unwritten one where I tried to use a similarly named quote_authors associative table and make all authors use a user_id; this obviously ran into glaring problems with "nonce" authors and was scrapped in favor of this.

A new quote_participants table will contain unique author/submitter names indexed by a unique participant_id and will contain an optional column pointing to a user_id if the participant represents the same person as an entry in the users table. The quote table will then relate to this table by pointing to a participant_id instead of keeping a name in the submitter column directly.

This approach should adequately solve the issue of storing authors that are also users vs. "nonce" authors, e.g. an author that is a website or outside of a chat network; it is not fully normalized, but it is a good enough compromise. Authors that are retroactively made into users only require that quote_participants.user_id be updated to point to the new user; this can be facilitated with a user_created module event that the Quote module can catch. Likewise, for aliases, an alias_created event necessitates that any existing quote author names matching the new alias require quote_participants.user_id to be updated to point to the user ID of the new alias. Of course, nonce authors simply always have a user_id of NULL; this allows us to identify actual users and nonce authors from the perspective of the quote_participants table and by extension check author aliases.

In addition, a quote_authors associative table indexed by quote_id and participant_id will be added. This table allows for associating multiple authors to one quote, and are designated an ordinal number in the author_num column. For example, if a quote with quote_id 123 contained a line by "Bob" followed by a line from "Alice", then there would be a tuple of {123, <Bob's id>, 1} and a tuple of {123, <Alice's id>, 2} added to the quote_authors table, where the last element is author_num.