Column names with single letters separated by underscores are concatenated

junbl commented 2 years ago

Description

A table with column names that include single letters separated by underscores are not queried correctly. All underscores dividing only single letters are removed.

For example, a column with the name a_b_c_d will be queried as abcd, while a column with the name a_b_cc_d will be queried as ab_cc_d.

Steps to Reproduce

Create a table with column names in the specified format.
Attempt to query the table.
Observe your errors!

Expected Behavior

The fields will be queried with their correct names.

Actual Behavior

The names are changed and so the query fails.

Reproduces How Often

Every time.

Versions

Seen on Windows 11 and Ubuntu 20.04 (through WSL) with a MySQL database (5.7 and 8.0).

Dependency graph:

├── sea-orm v0.6.0
│   ├── sea-orm-macros v0.6.0 (proc-macro)
│   ├── sea-query v0.21.0
│   │   ├── sea-query-derive v0.2.0 (proc-macro)
│   ├── sea-strum v0.23.0
│   │   └── sea-strum_macros v0.23.0 (proc-macro)

Additional Information

Repository showing the issue

You can work around this issue by including a #[sea_orm(column_name = "correct_name")] on the field, but this is obviously less than ideal as it's a lot of boilerplate and opportunity for errors that are difficult to track down if you mess up the name.

junbl commented 2 years ago

Okay I figured out the issue. The relevant derive macros only deal with the Column enum, which has PascalCase names. So a_b_c_d becomes ABCD, which then becomes abcd when heck's to_snake_case is called. (And a_b_cc_d -> ABCcD -> ab_cc_d, etc.).

This wouldn't be fixed with a reimplementation of to_snake_case, since there's no way to distinguish a_b_1 from a_b1---both become AB1. Or ab_c1_d from ab_c1d, or ab_c1_d1 from ab_c1d1, or whatever else. (I have all of these in my garbage database lol 😭).

The best way to fix this, as far I can see, would be to remove the DeriveColumn in the DeriveEntityModel macro (sea-orm-macros/src/derives/entity_model.rs:323), and replace it with an implementation that uses the struct fields (entity_model.rs:83). Seems like this might obviate DeriveColumn altogether.

Alternatively, sea-orm-codegen could insert the #[sea_orm(column_name = "same_name_as_struct")] when it detects the issue (field != field.to_pascal_case().to_snake_case()). This is worse UX, but easier, and I doubt that many people have such terrible names for their columns like me. This approach could also be extended to columns that aren't valid Rust identifiers to start with, like those that start with numbers. A case could be added to the existing check to see if a column shares a name with a keyword and appends r# or _ if so.

Another idea would be to have the #[sea_orm(column_name)] tag automatically applied with the name of the field unless overwritten by the user. Kind of backwards but might be easier than the first option while presenting the same interface to the user.

Not sure when or if I'd have time to submit a PR, but I'm interested which if any of these approaches appeals to y'all.

tyt2y3 commented 2 years ago

Alternatively, sea-orm-codegen could insert the #[sea_orm(column_name = "same_name_as_struct")] when it detects the issue (field != field.to_pascal_case().to_snake_case()

Thank you for the investigation and explanation. I think the above is definitely doable

junbl commented 2 years ago

It looks like that wouldn't totally fix it---I'm looking at the FromStr implementation and it has the same issue. It doesn't look at the column_name attribute so it does the same pascal -> snake thing. So parse still wouldn't work in this case. I was working around this issue using .to_string(), which uses the column_name, but then this gives the wrong field in the normal case when I wanted to rename a column in the database.

If you don't want to do the Column impl in the DeriveEntityModel derive using the fields directly, maybe a set of attributes could be added to the DeriveColumn macro to handle this case? Like the fields from the struct could be passed as #[sea_orm(column_names_snake_case = fields] and it could fall back on the current implementation if the fields didn't appear in the list.

Alternatively, the column enum variants could have unmodified snake_case names instead of PascalCase-ing them at all.

billy1624 commented 2 years ago

Hey @e-rhodes, sorry for the delay. Please check #695

SeaQL / sea-orm