atk4 / data

Data Access PHP Framework for SQL & high-latency databases
https://atk4-data.readthedocs.io
MIT License
273 stars 46 forks source link

Standard types overhaul #308

Closed romaninsh closed 3 years ago

romaninsh commented 6 years ago

The current fields types are not sufficiently flexible and not contain enough information and options for field use cases.

After #307 is done I look to break down "Field" class into multiple ones focus on various parameters where user can define types.

The definition would remain as before, however based on type, presence of "enum" and "values" properties an appropriate field class would be used, so making this possible:

$model->addField('age', ['type'=>'integer', 'min'=>14]);
$model->addField('height', ['type'=>'length', 'units'=>['m','cm']]);

Here are some of the use-cases that new implementation should support:

Field\Boolean:

Field\List:

Field\Date:

Field\Number:

Field\Measurement:

PLEASE SUGGEST MORE TYPES IN THE COMMENTS.

DarkSide666 commented 6 years ago

Field\Location - Google location (latitude, longitude). Field\RichText - full HTML encoded text, maybe can more properties than just string type field.

Overall I like this approach, but you have to be very careful not to step to deep into UI (fields) related stuff here. Data should not care about UI at all. For that we only have one property for all fields $ui and UI class should take care about all visualization itself.

romaninsh commented 6 years ago

Good suggestions, but typically any fields that rely on 3rd party libs should be in their add-ons. (due to dependencies)

I think we can start introducing fields as per proposed plan, can be one field at a time. We can start with measurement, for example, because it has not support currently.

FrancessFractal commented 6 years ago

Hi! I'm totally new to the library, starting today to tinker with it to see if we want to use it, and was searching for how to do something and spotted this post... Seems closely related, so I'll just ask here and if you know how to do it you'll save me from having to hunt through the docs, and you also get an idea for a new type out of it :)

We use uuids and are storing them in our MySQL db as binary. I need to pull these out as a hex string. I see you've got the base64 algorithm which seems to be working fine, but nothing yet for hex. I'm trying to do custom serialization but the docs on that are kind of minimal. The decode function is receiving a Field_SQL object, but I'm not really sure how to grab the binary data from that to convert into hex... I was trying to figure out how to do that when I spotted this issue (any by extension #307 ). What's the best way to go about this? Especially, in such a way that it'll require minimal fuss when you 'get rid of Field_SQL'?

romaninsh commented 6 years ago

Hey @FrancessFractal and welcome !

The relevant documentation is here: http://agile-data.readthedocs.io/en/develop/typecasting.html#supported-algorithms

p.s. A better place to ask general questions would be in our forum (https://forum.agiletoolkit.org).

romaninsh commented 6 years ago

There are already steps to get rid of Field_SQL, for example this add-on uses a custom field:

https://github.com/atk4/login/blob/master/src/Field/Password.php

It does not extend "Field_SQL" but it can be saved into SQL persistences without worry, so if you are creating field simply extend Field to be on a safe side. "Field_SQL" will remain declared as a class but will be obsolete/empty after this ticket is fixed. Also all the existing serialization rules will be preserved and no functionality would be lost.

FrancessFractal commented 6 years ago

Thanks @romaninsh ! Unfortunately, that was the part of the docs I had already found, and it didn't help too much... I guess I'll just have to do more digging! Thanks for the tip regarding the forum. I'll definitely ask there if I don't figure it out quickly on my own.

Perhaps I wasn't too clear, though. The reason I asked here at all was because I figured it doubled as a suggestion for some sort of uuid type. Again, I'm quite new to this library, so maybe I'm misunderstanding the whole issue, but it sounds like exactly the type of thing you're asking for. I mean, it's pretty widely used, and because it has to be represented differently in different databases it's exactly the type of thing you'd hope your data layer could handle for you. Postgres supports uuid natively, while in MySQL you'd probably store it as a binary, for example. It's an open standard, and simple enough that it wouldn't require any third party libraries. And, like dates and numbers, there's a few different formats you might want it output in. For instance, there's the standard format with the hyphens but it's sometimes used without them. Also, for updates, you might want to support validation for uuid version/variant, a bit like min/max for number.

I imagine I'm not the only one who would find it massively useful. Assuming I'm understanding the topic correctly, it seems like it's something you should consider.

FrancessFractal commented 6 years ago

Overall I like this approach, but you have to be very careful not to step to deep into UI (fields) related stuff here. Data should not care about UI at all. For that we only have one property for all fields $ui and UI class should take care about all visualization itself.

Agreed on this... I was surprised and slightly concerned to see UI mentioned here...

romaninsh commented 6 years ago

@FrancessFractal your comment wasn't entirely miss-placed. In fact, I appreciate you commenting :)

I was a bit confused by why you suggested to store them as binary, they appear like a regular strings. So if I ignore your mention of "binary" then that's what I think:

$model->addField('uuid');

After the refactor we can make uuid type available like this:

$model->addField('uuid', ['type'=>'uuid']);

The type would get convert into a Field Seed. This is now how types are converted to Table Column decorators (UI) and FormFields (UI): https://github.com/atk4/ui/blob/develop/src/Form.php#L327, so I imagine something similar would happen here:

'uuid'=>['Number', 'base'=>16, 'format'=>'########-####-####-####-############'],

This should also work with credit card numbers (if anyone is brave enough to store them), account numbers, phone numbers etc, although since there wouldn't be any standard, user would have to define that explicitly:

// UK / Ireland sort code:
$model->add('sort_code', ['Number', 'format'=>'##-##-##']);

The validation could try to match against the format, although support situations where separators are missed '081245' and will automatically normalize it into '08-12-45' assuming that there are enough digits.

The field and ATK Data try to not get involved into validation (it can be integrated though), so the validation exception would only be raised if format cannot be normalized.

FrancessFractal commented 6 years ago

Actually, BINARY(16) is generally accepted as the most efficient way to store a uuid. Postgres' uuid data type, for instance, is stored as binary and then converted to a string as its return value. If you store it as a CHAR(32) (or CHAR(36) if you store the hyphens as well) it's gonna take up significantly more space. It's pretty trivial to convert it into the string format, too. If I'm just writing up some SQL in the console I'd do something like this:

SELECT HEX(uuid) as uuid FROM table1

We don't use the standard format with the hyphens, and we don't care if it's upper or lower case, but if we wanted to those would be some pretty simple string operations. The operation could also easily happen in php, using hex2bin and bin2hex. So, based on the example in the docs: http://agile-data.readthedocs.io/en/develop/typecasting.html#storing-unsupported-types I tried the following code. That throws an exception, so I guess I'm misunderstanding how the custom serialization functions are supposed to work. I think I need to read up more on the Field class...

$this->addField('uuid', [ 'serialize' => [ function ($hex) { return hex2bin($hex); }, function ($binary) { return bin2hex($binary); } ] ]);

Anyway, that's how the datatype works using binary as the storage type. I suppose it might be a good idea to somehow support CHAR(32) and CHAR(36) as well, in the interest of supporting databases that weren't designed by someone with efficiency in mind.

And, fyi, we have no use for the ATK UI library, so anything regarding that will be somewhat irrelevant to me (although probably worth mentioning for the benefit of anyone who stumbles into this conversation who is using it). Our use case is more geared toward using this as kind of an enhanced query builder. So I'm definitely hoping to see very careful separation between the ui and data layers!

I do think there's a case for some level of 'validation' tied into the data layer, though. Basically, the same reasons you'd use db constraints, except that it would be defined in the codebase instead of the db itself, so that it would more easily carry over if you switched persistences. Looks like you have that with min/max numbers. For uuid it might not be a bad idea to add version and variant as constraints. I'm not sure it would be widely used, but it probably wouldn't be particularly hard to implement, and it can't hurt to have it!

Generally, I suggest you look around at various different SQL flavors and see what data types they natively support, and see if you can find a way to unify as many of them as you can in this overhaul. One of the big issues you run into when first creating your db is making it so you could easily switch engines if you ever needed to. You basically have to choose between sacrificing some seriously useful functionality or accepting that your project will forever be locked into a particular SQL implementation. For this reason, most ORMs seem to go with only supporting features that exist in all SQL flavors (not only in terms of data types, but also in terms of functionality, forever keeping us from using recursive CTEs in our projects /sad). Unifying even just some of the more useful data types would seriously help take this project to the next level. That combined with something like #111 would be pretty groundbreaking...

Definitely easier said than done! After all, even with a data type as straightforward as uuid, we've already identified 3 different ways people have attempted to store them (5 if you account for upper/lower case). Still, even a very minimal attempt at this would make this project far more interesting than any amount of support for units or configurable formatting ever possibly could.

FrancessFractal commented 6 years ago

Of note, we would be particularly interested in some sort of support for https://en.wikipedia.org/wiki/Well-known_text (kind of tied to @DarkSide666 's suggestion for 'Google location')

We're MySQL's implementation of it in our db for geolocation, and we're absolutely going to need (at some point) a way to filter data based on distance between two points, and ordering by distance. For instance, if we know a user's location and we want to find the n closest objects to them. Or to find all objects within m km. I'm looking into how I might do this using Agile Data, but if you're considering adding native support for it, that would be pretty exciting.

romaninsh commented 6 years ago

Geospatial is something that I have forgotten to mention entirely. Thanks for bringing it up.

To start things off, I can get out the logic for the basic types. The UUID would be also implemented using the as-is format which I described. But I'll also make sure that using serialization it would be quite easy to store it as a binary.

There would also be a mechanism to bring your own UUID field implementation, which would then offer more - version validation, etc.

Regarding "GeoSpatial" - I don't have any experience working with those. Certainly important, but would it be different if it's in the "atk4/data" repository or in "atk4/geo" ?

My effort and involvement is primarily focused around building ATK Data as a vehicle for UI and offering an cloud based app building service. Having a successful commercial product is a best way to ensure my involvement. I do understand that there are lots of developers who use ATK Data without UI and we always manage to keep things separate. When things start to look blurry (like with the password field or Audit), we moved it into an add-on.

romaninsh commented 6 years ago

I have created #314 and #315 just to start somewhere. I'll work on them a bit later.

FrancessFractal commented 6 years ago

Awesome! I'll keep an eye on those threads to see how things progress. For now I'll keep tinkering to see if I can get stuff working with what we already have and get a better understanding of how well ATK Data would work for us in the long run if we went with it. There's definitely some risk in going with a relatively new, not-very-mainstream project like this, but so far it's definitely reassuring that you've been so responsive to my ideas and concerns :) I'll be sure to let you know if I run into any major concerns as I continue my tinkering! I guess the best place for that would be the forums, unless it ties into the data types stuff we've been discussing.

As long as it's straightforward to add the geospatial add-on, I don't think it would be a big deal if it was in some other repository rather than built into this project. On the other hand, if it's a data type well-supported by most major SQL implementations (I haven't done my research on this), it may make sense for it to be considered a basic data type rather than an extension. I don't know this project well enough to know where it would fit better, and I don't have a strong preference either way, as long as there's an easy way to get it working without resorting to dirty hacks.

romaninsh commented 6 years ago

We have quite a few clients through paid support. Our rule is that we are allowed to contribute code we produce back to the open-source community. As I mentioned - i also have some long-term plan building services for developer community and even if it's not applicable for everyone it makes it important to have a well-polished product and a healthy community. See you in the forums and I hope you find this project useful and keep providing feedback on how to improve.

romaninsh commented 6 years ago

Reviewed https://laravel-backpack.readme.io/v3.0/docs/crud-fields to see if we can get some fresh ideas:

romaninsh commented 6 years ago

Another list of field types, although shorter: https://www.grocerycrud.com/documentation/options_functions/field_type

romaninsh commented 6 years ago

http://docs.doctrine-project.org/projects/doctrine-dbal/en/latest/reference/types.html

This is what we reviewed previously to arrive at the current types. Here are some new notes:

romaninsh commented 6 years ago

Apple Numbers

romaninsh commented 6 years ago

NOTE: I've looked at various systems, frameworks and products that work with data entry and have ways to define data. Although it is not a responsibility for Agile Data to display the format, it should be able to define/store configured format as well as convert/normalize data as required. More reviews of 3rd party products are welcome!

romaninsh commented 6 years ago

@FrancessFractal here is some docs i have put together:

https://github.com/atk4/data/blob/099ce5ddd5c7712a97f3fa6708f85f287542fcd1/docs/types.rst

Let me know if this makes sense.

romaninsh commented 5 years ago

For external libraries https://github.com/PhpUnitsOfMeasure/php-units-of-measure https://github.com/moneyphp/money https://github.com/briannesbitt/Carbon

Boolean is ready: https://github.com/atk4/data/pull/314

Number is almost ready: https://github.com/atk4/data/pull/375

mvorisek commented 3 years ago

Can be solved with https://github.com/atk4/data/pull/727 or Symfony/Lavarel integration