Stanford / CAP_drupal

5 stars 5 forks source link

CAP field names not unique enough #27

Open sherakama opened 10 years ago

sherakama commented 10 years ago

Here is the scenario.

I installed a new Drupal site and set up the CAP module to import just the primary contact information. From that content type I was able to create a few views using the phone and email fields. I exported those views into a feature and was happy. Second time around I installed the site fresh with the CAP module and imported both the primary and alternative contact information fields. When I enabled the feature with the exported views the views fields were showing the wrong information. Instead of showing the primary email address and phone number it was showing the alternative email and phone number.

Digging into this issue it looks like the machine names for these fields can change depending on which fields are selected to be imported. In the example above the field names for the alternative and primary contact information are cap_email and cap_email_2. Email shows up in multiple places in all of the available fields. This could be a huge problem if the schema changes and adds an email field somewhere before or in the middle of any exported feature using the cap profile content type. This also creates an incompatibility across sites using the CAP module.

My expected behaviour would be that the primary contact email field would be named consistently independent of which fields are being selected for import.

mpriest commented 10 years ago

I think the correct approach to fixing this issue is a field mapping registry. In this way, a field created in Drupal would be forever linked directly to the field defined in the CAP schema. E.g. clinicalContacts.email => field_cap_email maintainers.email => field_cap_email_2 postdoctoralAdvisees.email => field_cap_email_3

Then for any new field found in the schema, a new field is created and added to the mapping with an appropriate increment (field_cap_email_4). This would also work the reverse way (the way the bug was reported), when selecting more fields to import. The registry would provide a canonical field mapping reference for a given Drupal installation.

sherakama commented 10 years ago

Thanks for the possible solution. This would help greatly with the exportability of configuration and settings. A side effect would be that the module would have to have hard coded knowledge about the schema and this doesn't pair nicely with the awesome automagic schema update functionality. I doubt the schema is going to change often or dramatically but theoretically it would be possible for two fields of the same name to be added to the schema at one time and two sites each syncing one.

I think I would rather see the field name be derived from the place where it resides in the json object. There is a unique and logical path to each field. For example,

clinicalContacts.email => field_cap_clinicalcontacts_email maintainers.email => field_cap_maintainers_email

For those with names too long we could take the views_block approach and just hash the field name.

eg: some.really.long.path.that.istoolongfordrupal => field_cap_some_really_long_path_this_istoolongfordrupal => field_ajsf0d09fsda0sdfasfd0aljkalsdkfoea

Thoughts on this?

zchandler commented 10 years ago

First, @sherakama, mad propz for using the cap infix in your example! (So much win!)

The only thing that bothers me about the hashed field name is that we drop our Hungarian notation, and make it impossible to know what it is by looking at the machine name. I think that could be hard on sitebuilders that are making choices in custom Views down the road.

Can we make a <32 char string that drops some of the elements of paths.that.are.too.long, but keeps a kernel of meaning?

sherakama commented 10 years ago

I agree with @zchandler about the machine names having human readable context. That is where this approach breaks down. As humans we could make logical choices on where to chop and cut off portions of the string. Drupal is however, not yet in the realm of that sort of thinking. It would be very difficult to ensure uniqueness when chopping off portions of strings. @mpriest's approach would be a stronger choice if we would try to go down the string chopping route.

mpriest commented 10 years ago

field_cap_some_really_long_path_this_istoolongfordrupal => field_ajsf0d09fsda0sdfasfd0aljkalsdkfoea

This was the original approach... we moved away from it because too many fields fall into the category of being too long, so we went with the "numbered duplicates" approach.

Would be nice if Drupal supported longer machine names, but we can't easily get around this limitation. I thought the reason for the limitation is a mysql index based on field_name, entity_type and bundle on the field_config_instance table. However, looking at the numbers... field_name varchar(32) entity_type varchar(32) bundle varchar(128) At first, this didn't make sense to me. I.e. mysql index limit is 1000 bytes... mysql supports utf-8 (multibyte strings) up to 3 bytes long, so the effective index length limit is 333 characters. The actual reason is related to mysql table name length limits of 64 chars - {entity_type}_revision_field{field_name}. In fact, Drupal can exceed this table length limit now. 32(entity type)+12(_revision__)+32(field name) = 76 characters. The same problem exists in D8, but there's some protection in place which uses hashed table names for cases where table names exceed the limit (which also allows 16 char padding for database prefixes).

A registry has problems and complications, but maybe it's good enough if we're clever with how we export/import features. Needs investigation.

techsoldaten commented 10 years ago

I vote for making a field registry part of CAP API v2. It's sorely needed, not just for the CAP module, but for anything that is going to be integrating with CAP beyond a simple API call to display data.

This is something Trellon pushed for when building the module, but the idea may have been a little too ambitious for it's time. At one point, we were did have a data dictionary set up for labels for field names.

It would not be too hard to do the same for machine names.