SeaQL / sea-orm

🐚 An async & dynamic ORM for Rust
https://www.sea-ql.org/SeaORM/
Apache License 2.0
7.3k stars 513 forks source link

Schema validation #629

Closed LouisGariepy closed 2 years ago

LouisGariepy commented 2 years ago

Schema validation

This issue is meant as an umbrella to catch discussions about a potential schema validate feature.

I am not a SeaQL maintainer, and I don't yet know my way around the codebase, so PLEASE, if you want me to correct or add information, ping me so I can keep this thread up-to-date. This will go a long way to help potential contributors :)

Background

There's been a relatively high demand for this kind of tool, which is evidenced by a lot of users asking if sqlx' compile-time checks are also available to the SeaQL family. There's also been previous attempts to implement this feature, which were ultimately rejected because they required a live database connection at compile time.

It seems like the project already contains some hints as to which lints should be implemented. The docs also shed some light on what should be expected in terms of correctness and robustness from SeaQL . Finally, this feature is part of the project's 2022 GSoC event, and there's already been interest in tackling this issue.

I think it's worthwhile to keep this issue as a logical place to discuss and debate the scope and implementation of this feature.

Specification

These are the hard requirements for this feature. As you can see, its still pretty vague, but this is what I could gather from the public discord and Github discussions.

Scope

I see two potential scopes for this feature.

  1. Validating sea-orm entities (this includes checking the table exists and that its columns match with the equivalent rust type).
  2. Validating sea-query queries.

IMO, the first scope is way easier to achieve

Implementation

Parsing a running database's schema is painless with sea-schema, so the difficult part of the implementation revolves around gathering info on SeaQL entities and queries. To me, there seems to be two ways to go about this:

  1. When the user's project is built, SeaQL annotations (#[sea_orm(...)]) generates metadata and saves it in a file in the project root. Then, the validator program can use that metadata to lint against the databases' schema. Since the project must be built to be used, this will ensure the metadata is always up to date with whatever was built the latest.
  2. Use a Rust parser on the project files to build whatever information we need from the Rust AST.

While the simplest, implementation 1 must be discarded because it severely limits the scope of the feature. Lints would basically only work with entities using a #[derive(...)].

Implementation 2 doesn't suffer from these limitations, we're only limited by our ability to parse useful info from the Rust AST. This is relatively simple for simple Entity structs, but I'm not too sure how it'd work for queries. We'd have to maintain a parser and to make sure breaking changes in SeaQL's syntax are also recognized inside the linter. This is really the crux of the problem, and I encourage more experienced users to chime in on how building a SeaQL AST from a Rust AST might look like. Is there previous art for this in the ecosystem? I guess there's clippy itself...

Billy proposed to use Rust-Analyzer as a parser for our needs see (https://github.com/rust-analyzer/rust-analyzer/issues/620), which I think might be a good call. But I wonder if RA's parsing capabilities can be used as a standalone library. I don't think it'd be a good idea to require the RA binary directly, because some environments (thinking of CI!) might not have it, since it is not distributed directly with rustup's toolchains.

Open questions

Proposal

In short, what I'm envisioning is

sea-orm-cli validate -- [ <LintSeverity> <LintName>]

which would

  1. Gather project SeaQL metadata using implementation 2.
  2. Gather database sea-schema.
  3. Analyze metadata and compare with sea-schema, using the provided [ <LintSeverity> <LintName>] as rules.
  4. Pretty-print lints and emit an error code based on the results.

Conclusion

Again, feel free to ping me if there's something that needs updating. Thanks to everyone working on SeaQL.

tyt2y3 commented 2 years ago

Thank you for the great write-up! I think there are two problems:

1) validate the Entity and database schema is compatible with each other (detect schema drift) 2) validate that the queries being generated are correct (linting) I think this is the project described in the GSoC proposal

I think 1) could be a command in sea-orm-cli: e.g. sea schema validate

I also don't understand the part where Rust Analyser might be involved.