berlindb / core

All of the required core code
MIT License
252 stars 27 forks source link

Unable to save null values in numeric columns #76

Closed ashleyfae closed 3 years ago

ashleyfae commented 3 years ago

Related: https://github.com/berlindb/core/issues/22

The decimal and numeric validations do not allow for saving null values.

validate_decimal converts it to 0 https://github.com/berlindb/core/blob/master/column.php#L784-L786

and the default validation for integer types is intval, which would also convert it to 0 https://github.com/berlindb/core/blob/master/column.php#L727

Ideally both would allow saving null values if the column allows it.

JJJ commented 3 years ago

This was an intentional design decision in BerlinDB, and I'm reluctant to change it, similar to other null changes that have introduced some confusion in EDD3 previously.

The reason BerlinDB was designed this way, was specifically to avoid needing to juggle types in PHP, and to avoid needing to always recheck every type throughout different areas of the application. Offload that to the database middleware, and you can avoid a thousand "I'm not sure if this is null or 0 or ''" type checks.

In addition, like you've identified, WordPress has helpers for validating certain types of values, and they conveniently convert invalid values into their valid equivalents. Supporting null default values requires rewriting all of these functions.

Lastly, I think that null values require special MySQL statements to include or exclude them from queries, using NOT EXISTS in addition to the default empty value of that column type. If that's true, I don't think supporting null will be less confusing (but I could be wrong; worth investigating).

Can you help me understand why null is ever preferable over the empty equivalent for that column type?

ashleyfae commented 3 years ago

Can you help me understand why null is ever preferable over the empty equivalent for that column type?

Because when working with integers, 0 can be a legitimate value, whereas null always designates there is no value.

It doesn't make sense to me that BerlinDB would have the $allow_null property at all, if, by design, it never allows null values to be set in some cases. :woman_shrugging:

JJJ commented 3 years ago

Because when working with integers, 0 can be a legitimate value, whereas null always designates there is no value.

I understand the difference between 0 and null. It's not very nice to assume otherwise. 😕

You also haven't answered my question: why is it preferable? Show me a use case? Walk be through an example? Why would your PHP application need to treat 0 differently than null, and is that the best design for that software and the people who support it?

It doesn't make sense to me that BerlinDB would have the $allow_null property at all, if, by design, it never allows null values to be set in some cases. 🤷‍♀️

Allow Null is a built-in attribute of a MySQL database table. BerlinDB included it as courtesy for completeness, but it is/was unused, and I'm sorry for not remembering why in the context of datetimes and #22. It was never intended to be used like this - it's an anti-pattern that negatively affects the meta-data tables in WordPress to this day. #22 is the only instance of allow_null being used, and I think looking back that that was probably incorrect to do.

ashleyfae commented 3 years ago

I understand the difference between 0 and null. It's not very nice to assume otherwise.

I'm sorry, I wasn't trying to be mean. I think I didn't realize you were looking for a detailed example.

Here's a sample situation where it would be used:

You have a database table for global settings. You have another table for products.

There's a global setting allowing you to specify a File Download Limit. This is 0 for unlimited, or any other positive integer.

This setting can be overridden on a per-product basis (products table). If it's not overridden for a product, the value is null -- because the File Download Limit is not set at all for that product, thus meaning you'd fall back to the global setting. However, you can override the value on that product. You can override it to be 0, which would then change the column on the products table from null (not set / not overridden) to 0 (overridden to be 0 instead of the global default).

That gives 0 and null two different meanings. In one case it's explicitly set to a 0 value (which in this example means "unlimited"). In the other case it explicitly has no value at all.

ashleyfae commented 3 years ago

That's actually an exact description of a use case I'm planning on using for Software Licensing with regards to activation limits.

null = use global default any other number = this has been overridden on a per-license basis

pippinsplugins commented 3 years ago

Storing null values, or rather understanding explicitly set 0 values vs non-explicitly set empty values is something we had to deal with a lot in AffiliateWP. @DrewAPicture's perspective would be valuable here.

JJJ commented 3 years ago

That's really helpful, thanks!

If I understand correctly, Berlin not allowing null default values (or in general) is, in fact, intentionally preventing you from doing what you're describing.

It's not an uncommon approach (and I know Pippin has used it in the past) especially in WordPress where we all have a tendency to cascade values from multiple sources until we settle on the most appropriate one.

In this case, it isn't part of the plan for an Activation Limit to literally be 0, so you end up with a value that is convenient to repurpose, which is tempting, but I would caution against it, because it is covert and clever.

0, as you know, is literally nothing - none - or essentially "off" in this instance. But you are trying to reuse it to mean "unlimited" instead.

It would be as if:

If 0 meant 0 then in the case of Activation Limits, 0 should be "inactive" and not "unlimited".

(Sorry if this sounds pedantic; I'm not trying to be patronizing, only hoping to be helpful, and highlight flaws in the application side that I hope Berlin can prevent in the data side, hoping to influence improving the application.)

Without telling you (or anyone out there reading this) how to do your jobs, here is one way I would design a database schema to avoid falling into the above trap as I see it.

3 columns:


// status
array(
    'name'       => 'activation_limit_status',
    'type'       => 'varchar',
    'length'     => '20',
    'default'    => 'default'
),

// limit
array(
    'name'       => 'activation_limit',
    'type'       => 'bigint',
    'length'     => '20',
    'unsigned'   => true,
    'default'    => '0'
),

// count
array(
    'name'       => 'activation_count',
    'type'       => 'bigint',
    'length'     => '20',
    'unsigned'   => true,
    'default'    => '0',
    'sortable'   => true
),

In this way, you have full control over and knowledge of all of the qualities of Activations relative to the Software License:

This approach somewhat mimics what WordPress Core does with Posts & Comments, respectively. Comments on posts can be open or closed, there is a default_comment_status setting that can be open or closed, and there is logic in PHP to determine whether Comments are open or closed on a per-Post + Default Setting basis.

This way, even though the "comment count" could be easily derived from a query to the wp_comments database table, it's still maintained in wp_posts, because it is helpful to avoid recounting them every time it is needed, especially for things like calculating pagination where the maximum value has some other use, like preventing future Activations in your case.

WordPress, though, includes some more internal logic to open or close the Comments gate based on other factors (14 days, for example.)

With an approach similar to the one I'm imagining, you're also able to make the UI for the Activation Limit much more clear. You can ditch the "enter 0 for unlimited or leave empty for default" descriptive text, for two separate but obvious settings for the Status and the Limit. And you get to separate the double-duties of your values apart.


Ultimately, this isn't less work, it doesn't take up less memory, and it doesn't use less disk space - I know that. But... it does result in an easier to understand application, without being overly abstracted to the point of being unclear. It avoids clever logic that developers would need to learn how to interact with. It prevents end-users of your software from being directly exposed to the internal 0 vs. null repurposing. And it does not involve changes to Berlin.

What do you think?

ashleyfae commented 3 years ago

SL already uses 0 as unlimited, as do a variety of other things in the EDD ecosystem. Whether or not it should be is a different discussion. But given that it is, it feels like the options are basically:

  1. Completely rewrite all the operating logic in a not-very-backwards-compatible way to avoid this; or
  2. Don't use BerlinDB for a chance to keep things as they are.

But even if we go with point 1... my main point is that I find it odd for BerlinDB to take a stance on what should and should not be stored in the database. :woman_shrugging: The PR to make things work is quite small and simple; it's not a big rewrite of what BerlinDB is or the functions it uses. I'm not even saying we introduce IS NULL and IS NOT NULL type queries. Just the option of saving them is nice.

JJJ commented 3 years ago

Storing null values, or rather understanding explicitly set 0 values vs non-explicitly set empty values is something we had to deal with a lot in AffiliateWP

I searched for null in the AffiliateWP issues, and the work on the issues I found was similar but tangential.

The database columns in the AffiliateWP database tables all use the NOT NULL constraint, which means none of them allow for null values in them. Somewhat oddly, none of the columns there have DEFAULT values. That's not a bad thing, but it does mean that every INSERT needs to define every value, and nothing in the database tables is ever automatically set by MySQL. Frankly, this is probably great in terms of maintaining strict adherence.

But in EDD3, almost every column has a default value, which allows those INSERTs to only include values that are non-defaults if one would so choose, and MySQL will automatically set the others. A few columns in EDD3 do not use NOT NULL that probably should, that also subsequently use DEFAULT NULL that should not:

It's possible there is a technical reason for this, such as backwards compatibility with existing log entries or adjustments, but I can't recall at this time if that is the case or not.

(I'm also intentionally omitting changes made in EDD3 to the datetime column types, because I still believe it was not the right thing to do, and it's not directly related to this issue, and I don't want to bring up old issues where I'm in disagreement.)

my main point is that I find it odd for BerlinDB to take a stance on what should and should not be stored in the database

One of the primary things that middleware like BerlinDB does, is take stances on things so that common mistakes can be avoided and desirable patterns and code-paths can be enforced. Applications like RCP, AffWP, and EDD are simply abstractions of our collective opinions and experiences into repeatable logic that we enforce onto others so that they do not need to make all of those decisions on their own.

I'm not even saying we introduce IS NULL and IS NOT NULL type queries. Just the option of saving them is nice.

I think if Berlin ends up needing to support null values (which it may, to better support the WordPress meta-data table designs 🤮), support for IS NULL and IS NOT NULL query arguments needs to be added at the same time (though I'd prefer to not mess about with ANSI_NULLS). I'd want to be confident about whether EXISTS and NOT EXISTS queries can adequately work in favor of them.

it's not a big rewrite of what BerlinDB is or the functions it uses

I believe that assessment is incorrect. It's not small or easy. #22 is already hacked in and incomplete as it was, and it will require more than that level of effort to support correctly.


As far as your 1 and 2 options go, I don't really think that's fair. "Look what you made me do" isn't the kind of conversation I'm trying to have here or the position I'm putting you in. I'm as invested in EDD's success as anyone else is. I think taking your option 1 approach in EDD would be easier than adding full support for null values in Berlin.

And remember that nothing else intentionally uses null values in the database except for meta-data tables (and your datetime changes). They will cause type-casting problems in the future on the PHP side. It is not a good idea to add support for them. I'd strongly prefer not to hack this in and clean-it-up later, because later never comes.

And... perhaps selfishly, the bugs from adding it will be mine to fix afterwards - I do not have the bandwidth to add something new that I know will cause problems until 2020 is over. That isn't influencing my recommendation, but I'm mentioning it so that anyone reading has an idea of a timeline for me to do the work.

DrewAPicture commented 3 years ago

Sorry, I inadvertently posted some comments on the closed issue #22 instead of here.

Having since reflected a bit and read some of discussion here, I think my feedback might be a little more pointed.

In some ways, I see some parallels between this and the MySQL views conversation in AffiliateWP. We're leveraging a MySQL view to express campaign data that is live-calculated from two database tables.

We're (we think) using this for a very valid purpose, and in spite of this, many hosts refuse to support MySQL views (either in their syncing tools or just disallowing them altogether) because they don't think it's a wise design decision.

I think the open web, such as we know it, only works together well when everyone is willing to play some measure of "ball". All that to say that I can certainly see the argument in favor of "just let me store what I want, it's my problem to handle for it."

Berlin is a great for standardizing the creation of and interaction with the database layer, but I don't believe it should be in the business of deciding which MySQL features are available or not. As an OSS maintainer, you are certainly under no obligation to actively encourage doing things you think will make the developer experience harder, just maybe don't stand in the way if that's the way somebody wants to go.

ashleyfae commented 3 years ago

Whether you use 0 or null to designate "this is empty" is just a personal approach. Sure, you can treat them the same way in PHP by using empty() but in my mind, they have different meanings behind them (previously mentioned comment about 0 technically being a value). I know I don't have to use null; I just like it for the meaning behind it.

Yes, I can always find a different way to approach my database design to avoid using null, but I suppose my point is that I don't feel like I should have to because the DB engine disagrees with that stylistic approach in case the developer gets sloppy in PHP or with their queries. I think making nulls opt-in, as they are now, is the right approach, which means the developer should know what they're doing with null values if they decide to use them.

it's not a big rewrite of what BerlinDB is or the functions it uses

I believe that assessment is incorrect. It's not small or easy. #22 is already hacked in and incomplete as it was, and it will require more than that level of effort to support correctly.

I'm referring to the PR for this issue, which is already done and should be all that's needed to allow a null value to be saved, which is all that's being asked. I think this PR is quite simple. :woman_shrugging:


Another issue is that converting null to 0 is actually changing the data I'm inserting. This really dives into a different issue of validation vs sanitization. But given that null is valid in MySQL, and I can designate it as accepted in my schema (both the PHP array and the actual table creation statement), I expect that if I pass that in to be inserted, it's going to get inserted as null. I should get that back. But it gets converted very quietly without my knowledge.

If BerlinDB wants to take a stance against nulls, then I think the only approach is to reject that value completely (as in, the insert fails with "invalid data type" or something). That way the developer isn't inserting nulls, thinking they'll work fine, and later discovering they've all been quietly converted to a different value. This, in my opinion, is a bigger issue and more dangerous because the developer literally expects one value to be saved and a different one is saved instead. This change is actually what could result in bad comparison logic in PHP, because the data does not match what was attempted to put in. There's no error about that. (I still disagree with BerlinDB taking such a stance, but if one should be taken, then IMO that's how it needs to be done.)

JJJ commented 3 years ago

Pull request ready for review.

I stand by all of my previous comments. This is not a good idea. It is a loaded gun and someone will get hurt by it.

Strong beliefs weakly held, I guess.