Islandora / documentation

Contains islandora's documentation and main issue queue.
MIT License
104 stars 71 forks source link

Taxonomies populated during migration cannot contain term names longer than 255 characters #1017

Open mjordan opened 5 years ago

mjordan commented 5 years ago

Taxonomy term names cannot exceed 255 characters (see name field below):

mysql> describe taxonomy_term_field_data;
+------------------------------+------------------+------+-----+---------+-------+
| Field                        | Type             | Null | Key | Default | Extra |
+------------------------------+------------------+------+-----+---------+-------+
| tid                          | int(10) unsigned | NO   | PRI | NULL    |       |
| vid                          | varchar(32)      | NO   | MUL | NULL    |       |
| langcode                     | varchar(12)      | NO   | PRI | NULL    |       |
| status                       | tinyint(4)       | NO   | MUL | NULL    |       |
| name                         | varchar(255)     | NO   | MUL | NULL    |       |
| description__value           | longtext         | YES  |     | NULL    |       |
| description__format          | varchar(255)     | YES  |     | NULL    |       |
| weight                       | int(11)          | NO   |     | NULL    |       |
| changed                      | int(11)          | YES  |     | NULL    |       |
| default_langcode             | tinyint(4)       | NO   |     | NULL    |       |
| content_translation_source   | varchar(12)      | YES  |     | NULL    |       |
| content_translation_outdated | tinyint(4)       | YES  |     | NULL    |       |
| content_translation_uid      | int(10) unsigned | YES  | MUL | NULL    |       |
| content_translation_created  | int(11)          | YES  |     | NULL    |       |
+------------------------------+------------------+------+-----+---------+-------+
14 rows in set (0.00 sec)

The GUI form that is used to create terms probably enforces this constraint, but dynamically populated terms like those produced by the corporate, subject, geographic, and person migrations that are part of https://github.com/Islandora-Devops/migrate_7x_claw risk failing if the term "name" is longer than 255 characters. Here's an example of what happens:

 [error]  Drupal\Core\Database\DatabaseExceptionWrapper: SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'name' at row 1: INSERT INTO {taxonomy_term_field_data} (tid, vid, langcode, status, name, description__value, description__format, weight, changed, default_langcode, content_translation_source, content_translation_outdated, content_translation_uid, content_translation_created) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8, :db_insert_placeholder_9, :db_insert_placeholder_10, :db_insert_placeholder_11, :db_insert_placeholder_12, :db_insert_placeholder_13); Array
(
    [:db_insert_placeholder_0] => 157
    [:db_insert_placeholder_1] => subject
    [:db_insert_placeholder_2] => en
    [:db_insert_placeholder_3] => 1
    [:db_insert_placeholder_4] => nd CCICED Policy Recommendations Impact  environmnetal, social and economic development. It also discusses the key policies that have had an impact in China's changes, especially with respect to recommendations such as environmental instruments and economic tools to control pollution, governance systems and legislation, including: ecological civilization system construction and Institutional innovation
    [:db_insert_placeholder_5] => 
    [:db_insert_placeholder_6] => 
    [:db_insert_placeholder_7] => 0
    [:db_insert_placeholder_8] => 1549293162
    [:db_insert_placeholder_9] => 1
    [:db_insert_placeholder_10] => und
    [:db_insert_placeholder_11] => 0
    [:db_insert_placeholder_12] => 0
    [:db_insert_placeholder_13] => 1549293162
)
 in Drupal\Core\Entity\Sql\SqlContentEntityStorage->saveToSharedTables() (line 930 of /var/www/html/drupal/web/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php). 

You would hope that names for people, things, and places as expressed in MODS documents out in the wild are not longer than 255 characters, but they probably exist! Since we're populating term names from uncontolled text values in XML, we need a strategy for dealing with long term names.

seth-shaw-unlv commented 5 years ago

I've used the title_length module for nodes with long titles. The module is relatively simple and could probably be adapted to extend the length of name fields.

Of course, you could still get really long values. If you are comfortable trimming the value to the max length, use the substr process plugin:

process:
  name:
    plugin: substr
    source: title // Or whatever your source field is
    start: 0
    length: 254

You can even log all of your name values...

process:
  name:
    -
      plugin: log
      source: title // Or whatever your source field is
    -
      plugin: substr
      start: 0
      length: 254

and come back after the fact to find any that were longer than 255 (and thus trimmed).

mjordan commented 5 years ago

@seth-shaw-unlv excellent, thanks for the suggestions. I assume this problem might not manifest itself for most 7.x instances, since having long values like those is probably an edge case. I know we've discussed some workflows that allow people working on migrations to do some sort of audit of their metadata prior to running the migration. Stuff like using the substr plugin are a good way to prevent errors from occurring but it would be good to document the problem and ways of preventing it from happening in the first place. @rosiel, @rtilla1 and other MIG groups, what do you think?

dannylamb commented 5 years ago

@mjordan Worst case scenario is we have field to hold the long text and then trim the actual name to 255. Sort of like how a node's body gets trimmed for the summary blurb. I wouldn't do it unless it were a serious issue, but at least there's a way for folks to keep their long titles and not upset the drupal gods.

mjordan commented 5 years ago

@dannylamb are you suggesting that we add that extra field to the node (multiple copies of it), or can we add it to the term itself (one copy of it)?

dannylamb commented 5 years ago

I'm saying stash it in an extra field on the term itself. Like field_long_name, and then we substr it to set name. All this would be done in one shot during the migration. There'd be more work if you want to auto-update it every time you change field_long_name though... So it would be more work if people are expecting that as a permanent feature. But nonetheless, at least you wouldn't lose your full title during the migration.

mjordan commented 5 years ago

Might be worth considering. Let's let the MIG folks weigh in to see what they think.

bryjbrown commented 5 years ago

I'm definitely in favor of having an extra field with more space in it for storing longer titles, but instead of using the standard required "title" field to be a truncated version, I'd suggest using that in a way that is the title of the node as opposed to the work that represents it, perhaps in the spirit of an ID or something.

This is what we are doing with our current 7.x submission process at FSU, "titles" are automatically generated using a pattern of "type_timestamp_hash", for instance "honorsthesis_1549124815_ab134724" and this title ends up getting reused as a unique ID in the resulting MODS record.

Having two different fields for the same title, with one being the "long title" might be confusing and appear as if its an alternate or variant title instead of the true title, while making the titles represent different things makes the difference clear.

rosiel commented 5 years ago

This was discussed at the Feb 6 CLAW call. @alxp proposed (forgive me if my paraphrasing is wrong) that a new field such as field_full_title be created to hold the actual titles/names, and the built-in fields be used rarely-if-at-all due to that restriction. I think the justification for not hacking the core Drupal tables was to make future upgrades safer.

mjordan commented 5 years ago

Just practicing some migrations using Islandora 8 1.0.0 and hit this again. I'm not sure how common it is in the wild to have corporate names, for example, that are longer than 255 but when it occurs, migrations stop dead in their tracks.

seth-shaw-unlv commented 5 years ago

@mjordan using the trim method I mention above will keep the migration moving, however, there is something to be said for letting the migration die to emphasize you have an issue to address.

mjordan commented 5 years ago

@seth-shaw-unlv I have to respectfully disagree with this. I do not want my migration of 100,00 objects to fail at object 99,999 because that one has an errant source metadata value. YYMV but I'd rather have that target object created but the problem logged. We could have it both ways too - a strict mode and a permissive mode.

seth-shaw-unlv commented 5 years ago

@mjordan I don't think we necessarily disagree; I use the trim method myself because I certainly don't want my migrations failing part way through. I just leave open the possibility that someone may prefer the hard fail. Documentation of options and implications is key, as usual.

mjordan commented 5 years ago

Gotcha. Different people will want to take different approaches.

My personal preference would be to use a tool like that to find potential problems so they can be addressed prior to a migration. Over in #1021 I attempted to develop a tool to allow for practice migrations. This worked for some fields types, but not for taxonomies since they are referenced and therefore must exist.