agronholm / sqlacodegen

Automatic model code generator for SQLAlchemy
Other
1.9k stars 245 forks source link

Improve code generator design #199

Open agronholm opened 2 years ago

agronholm commented 2 years ago

I'm pretty unhappy with the current design of the code generator system. There's duplication of logic between the model generation and rendering parts. The rendering system is not nearly flexible enough. There's a lot of awkwardness that I see everywhere.

Here's how I would like the code generation process to work:

  1. Generator receives a MetaData object
  2. Generator iterates through the tables, generating models and module variables
  3. Generator assigns names to tables, classes and class attributes, potentially aliasing existing imports
  4. Generator renders imports, module variables, tables and classes

Import collection must happen at or after phase 2, so that they could potentially be aliased to avoid conflicts with module variables and attribute names. This is preferable to renaming attributes to avoid clashes. In order for this to work, we need to factor out actual Columns and Tables out of the abstract model that is passed to the renderer. Any callables referred from the model must be abstracted so that they could be dynamically renamed if their relevant imports end up being aliased to avoid a name clash.

@leonarduschen do you see any obvious potential problems with this plan?

leonarduschen commented 2 years ago

Nope, I don't see any obvious problems with the plan, I think we can definitely benefit from a higher degree of abstraction between the components of the generated code.

A few things that came to mind:

In order for this to work, we need to factor out actual Columns and Tables out of the abstract model that is passed to the renderer. Any callables referred from the model must be abstracted so that they could be dynamically renamed if their relevant imports end up being aliased to avoid a name clash.

Beyond the problem of name clashes, abstraction of Constraints/Indexes will also help in solving:

Import collection must happen at or after phase 2, so that they could potentially be aliased to avoid conflicts with module variables and attribute names.

There could also be conflict between module variables and attribute names (e.g., if model has the column name "metadata"), so module variables will also need to be abstracted. So in the event of name clashes, I think we can prioritize like so: attribute names, module variables, imports.

agronholm commented 2 years ago

Yes, we seem to be on the same page as for what to do. I've already started writing the new code. If this works out, it will remove the last obstacle before v3.0.0 final.

agronholm commented 2 years ago

I have ~6~ 2 of 39 tests failing in the tables generator tests. No major roadblocks in sight. The new generator code already looks so much better than the old one.

agronholm commented 2 years ago

All tests are passing locally for TableGenerator. Next I will update DeclarativeGenerator.

leonarduschen commented 2 years ago

How's the refactoring going? If you're short on time and if it is at a state where you can offload some work, I'd be happy to help out

agronholm commented 2 years ago

It's progressing, but I haven't worked on it in a couple weeks. It's normal for me to juggle between my various projects. I would like to first finish my proposal before I let others work on it, so as to not get the wrong idea about my aims there. With the declarative generator, I was facing a choice: to either convert a generated table model into a class model, or to generate a class model from scratch. The latter option may require more code, but the former option required me to add the original table name to the model – something the table generator approach didn't require. So I'm still pondering my options here.

agronholm commented 2 years ago

Now that I looked at the code again, I think I figured out a way to do it with the table model conversion approach.

agronholm commented 2 years ago

I've been working on this all day today. 16 or 37 tests fail, most either due to yet missing m2m relationship support, others due to the missing name conflict fixer code.

agronholm commented 2 years ago

Back to 20 tests failing, but m2m relationship support works now. Once the declarative tests start passing, I expect the rest of the work will be substantially easier.

agronholm commented 2 years ago

11 tests failing. Most are about testing for attribute name/import conflicts.

agronholm commented 2 years ago

The declarative generator passes now. The relationship generation method is 273 lines long and I'm looking to reduce some of the duplication there, but I will focus first on getting the entire test suite to pass.

agronholm commented 2 years ago

I've pushed my changes so far to the generator-rewrite branch.

leonarduschen commented 2 years ago

Thanks for the updates!