ash-project / ash_mysql

MIT License
2 stars 0 forks source link

Problem with indices of `:string` fields #25

Open lejoko opened 1 month ago

lejoko commented 1 month ago

Hi @zachdaniel !

I post that here because I see a mysql specific behaviors that I'm not sure how to manage in ash_mysql regarding strings. I'd be happy to discuss it with you.

MySQL has a limited size for indices (768 bytes in MySQL 8.0). When a field is bigger than that size, any index on that field must be created with an explicit size. Ash fields of type :string are currently (as in ash_sqlite) translated in migrations to the :text Ecto type which is then translated to the TEXT MySQL type. That is always bigger than MySQL's max index size. It means that all indices on those fields currently need to have their migrations manually changed to add an explicit index size . I could make ash fields of type :string be translated to the :string Ecto type which is itself translated to the VARCHAR(255) MySQL type. That one is smaller than MySQL's max index size, and then it wouldn't require manual editing of migrations in most cases (it could still be a required for multi field indices). But in that case, I'm not sure how people could create fields of type :text:. In the ideal case, I would probably leave it as it is now add an automatic calculation of indices size when required, but that would be rather complex and I'm not ready to undertake such a task at this stage.

(The problem can be painful at times, because MySQL doesn't have transactional schema changes, so when a migration crashes because of a missing index size, you're left with a partially applied migration, which prevents both applying other migrations and rolling back the one that crashed... You have to either clean the mess manually or reset the whole db...)

zachdaniel commented 1 month ago

Pretty interesting. I think there will likely be some kind of explicit solution required here. For example, we can add a DSL option:

identity_index_lengths ....

and then we can require that if you have a :string type attribute, that you configure this. Additionally, we have a migration_types configuration that could potentially be used here. So like unless you set a migration type indicating a specific size, we require that you set a corresponding identity_index_length. IIRC you can do things like:

migration_types field: {:varchar, 255}

PRs welcome on this front! I won't have time to implement this myself in the foreseeable future unfortunately.

lejoko commented 1 month ago

Hmm... not sure I understand your idea with identity_index_lengths. Let me start by restating things another way, because I feel that I may have been a bit fuzzy in my previous message:

  1. max index size is fixed by MySQL, no matter how many fields are referenced by the index.
  2. index size for a field of a type smaller than the max is determined automatically and doesn't have to be specified; same thing for a multi field index as long as the total size of the fields is less than the index max.
  3. MySQL users tend to use VARCHAR rather than TEXT for most strings (probably a cultural thing dating back from the time when text fields had problems that varchar did not have) and therefore they usually do not have to specify an index size. It would be surprising to have to do it every time.
  4. TEXT fields always require an index size.
  5. multi field indices that do require explicit size, need to have a size specified for each field that is part of the index. (so you can chose which field has more/less significant chars in the index)

As a result of all of that, in the current state, all migrations generated with an index on a string field need to be manually changed.

As a first step, I'd like to change the driver in a way that would remove the need to change migrations manually in most cases, even if the change is not perfect. For that part I see several options:

  1. changing the default migration type for string fields with ash_mysql to be :string type instead of :text
  2. changing the migration generator to specify a fixed size of 768 (MySQL's max index size) for all indices on string fields

I have a bias to option 1, for the simple reason that I know how to implement it and have currently no idea how to implement 2. If you think that 2 would be a better idea, I'd be happy to have pointers to where to look to be able to implement it.

At a later time I/we may implement a cleaner solution that might auto-calculate fields size depending on the number of index fields and/or use a new dsl option

Thoughts ?

zachdaniel commented 1 month ago

šŸ¤” The point about identity_index_lengths wasn't necessarily that the name or idea was right, just that we can solve some of these kinds of problems (i.e they've asked for an identity/index that requires an index size) with additional configuration in the DSL.

What I think we don't want to do is have ash_mysql have a signfiicantly different default handling of strings than other data layers, in such a way that can be relatively easily missed by users.

I think changing :string to map to {:varchar, some_limit} would be very surprising for folks using ash_mysql coming from other data layers. I'd rather introduce a compile time error requiring that an explicit migration type be set for all strings that are used in an identity or index.

This gives us an opportunity to explain to the user why :text can be problematic in those cases. Then, if they set migration_types field: :text, we can require further that they specify a field size, in options like index_field_sizes [index_name: [field_name: 42]], and identity_field_sizes: [identity_name: [field_name: 42]].

On reflection, maybe there is an incorrect assumption on my end here: can you set a column to type VARCHAR without a size to avoid this issue? Seems strange if that works, but if it does then sure, we can use :string šŸ˜†

lejoko commented 1 month ago

Ok. I understand that you want ash_mysql to stay as consistent as possible with other data layer. However it still needs to take MySQL idiosyncrasies into account (and MySQL has lots of them... šŸ˜†). I wanted to understand why MySQL people usually prefer VARCHAR over TEXT so I dived a bit into MySQL docs.

From what I understand, contrary to Postgres and Sqlite, VARCHAR and TEXT don't work the same in MySQL. MySQL's TEXT fields are internally BLOB fields. They are not store like normal fields but in a separate space and are instantiated separately in memory. Because of that they have the following restrictions (that VARCHARS don't have):

Apparently, it is not only an historic thing (as I initially thought) but the recommended way is to use VARCHAR rather than TEXT when possible in MySQL.

On the other hand, mysql fields are restricted to a maximum of 65535 bytes for any single non BLOB/TEXT field, which means 16383 chars for an utf-8 VARCHAR field, and the maximum size of all fields of a table is also 65535, except for BLOB/TEXT fields.

When seeing that, I tend to think that ash_mysql should either default to :string/VARCHAR for Ash strings rather than :text/TEXT even if it is not the case in other data layers, or it should force ALL uses of Ash strings in ash_mysql resources to also define a migration_types for them (and not only those referenced by an index). Defaulting to :text/TEXT with no warning, could be both very surprising for MySQL users, and a source of unexpected problems.

Thoughts ?

zachdaniel commented 1 month ago

You've convinced me :) Lets swap the default.

lejoko commented 1 month ago

Wow, that was a fast answer... šŸ˜†

Of course that does not remove the index length problem. it only avoids it in most standard use cases.

zachdaniel commented 1 month ago

Yeah, and it makes them opt-in to the weirdness you've described. They'll have to accept that the maximum length generated for string attributes is 65,535, and if they want more they have to use migration_types field: :text, which based on your excellent points will be a much better experience than the reverse. šŸ‘

zachdaniel commented 1 month ago

Thanks for the due diligence and persistence šŸ™‡ !

lejoko commented 1 month ago

Thanks for Ash once more! The day I read its description, it immediately clicked with me. I was desperately looking for a way to build apps around a descriptive data model in Elixir. I used to be a big Ruby on Rails fan and still love the ease with which you can build complex apps with it, by being mostly descriptive. Ash gives me a big part of the same goodness and other kind of goodnesses, in a full Elixir spirit (more "strict/pure/perfectionist" than the Ruby way but still very pragmatic). the only thing that I still miss is good tooling for building web UIs. But as far as I understand, there are things in the making ?

zachdaniel commented 1 month ago

They are a ways out, but yes :)