BiologicalRecordsCentre / iRecord

Repository to store and track enhancements, issues and tasks regarding the iRecord website.
http://irecord.org.uk
2 stars 1 forks source link

UKSI update August 2019 #636

Closed DavidRoy closed 5 years ago

DavidRoy commented 5 years ago

Full update of UKSI using existing scripts.

Step0: test on local warehouse instance (RJB updated) Step1: test via dev warehouse/irecord Step2: implement on warehouse1 and live irecord

burkmarr commented 5 years ago

@johnvanbreda - what is the purpose of the uksi schema in the Warehouse DB?

burkmarr commented 5 years ago

@johnvanbreda - I'm testing this first on a local warehouse instance from data files generated from a UKSI Access DB supplied to me today by Chris Raper (as per the Readthedocs instructions). This warehouse already has a UKSI species list generated from the data included in the GitHub UKSI support files - so I'm updating that list.

All progressed well until script 35 when I got this error:

35 - Tidy up orphaned records Error in script: ERROR: update or delete on table "taxon_meanings" violates foreign key constraint "fk_taxon_meaning_id" on table "cache_taxon_paths" DETAIL: Key (id)=(95008) is still referenced from table "cache_taxon_paths".

johnvanbreda commented 5 years ago

Cache_taxon_paths is a relatively new table so probably not handled by the script, hence the FK violation. I suspect the best thing to do is to update the taxon_meaning_id field in this table to map it to the new values (using the same approach as cache_taxon_searchterms on line 85).

johnvanbreda commented 5 years ago

@burkmarr The uksi schema holds the tables which are imported from the CSV files you extracted from Access. It's basically just an interim store of the UKSI data and is only used during the import process (though can be handy for diagnosing issues after).

burkmarr commented 5 years ago

@johnvanbreda - the cache_taxon_paths table doesn't have a taxa_taxon_list_id column, so I can't see how to use the same approach as the update of cache_taxon_searchterms (line 85).

johnvanbreda commented 5 years ago

@burkmarr The query on line 85 doesn't use taxa_taxon_list_id, it uses taxon_meaning_id (https://github.com/Indicia-Team/support_files/blob/master/UKSI/scripts/35%20-%20cleanup.sql#L85). So this should be OK shouldn't it?

burkmarr commented 5 years ago

@johnvanbreda I'm still not getting this. It updates taxon_meaning_id from the value in the taxa_taxon_lists table where taxa_taxon_list key matches the value of taxa_taxon_list_id in cache_taxon_searchterms (and the values are not already the same). But I can't use this where clause with cache_taxon_paths. How else can I pick out the correct row?

UPDATE cache_taxon_searchterms cts SET taxon_meaning_id=ttl.taxon_meaning_id FROM taxa_taxon_lists ttl WHERE ttl.id=cts.taxa_taxon_list_id !!!this line!!! AND cts.taxon_meaning_id<>ttl.taxon_meaning_id;

johnvanbreda commented 5 years ago

Sorry, yes, you are right - my bad. Cache_taxon_searchterms is not such a good example then, but you could take one of the other tables (e.g. taxon_codes) as an example so you end up with something like:

UPDATE cache_taxon_paths ctp
SET taxon_meaning_id=nc.new_taxon_meaning_id
FROM uksi.preferred_name_changes nc
WHERE nc.old_taxon_meaning_id=ctp.taxon_meaning_id
AND ctp.taxon_meaning_id<>nc.new_taxon_meaning_id;
burkmarr commented 5 years ago

Thanks @johnvanbreda. Well I used that rather than something like it since I couldn't improve on it! First off it was falling over because of a key violation on cache_taxon_paths but I deleted the UKSI schema and ran it again and all the scripts went through.

Chris Raper is going to provide us with a new update shortly, so I will run through this all again with the new update and modified script on my local installation. If that works, then we can move on to the dev warehouse and give it a go there.

burkmarr commented 5 years ago

@johnvanbreda - Chris supplied a new database, but running through the update procedure again on my local warehouse with the new UKSI again gave this error in the script 35:

Error in script: ERROR: duplicate key value violates unique constraint "pk_cache_taxon_paths" DETAIL: Key (taxon_meaning_id, taxon_list_id)=(29572, 1) already exists.

Running through the SQL one command at a time in pgAdmin shows that its the new update command to cache_taxon_paths that's causing the error:

UPDATE cache_taxon_paths ctp 
SET taxon_meaning_id=nc.new_taxon_meaning_id 
FROM uksi.preferred_name_changes nc 
WHERE nc.old_taxon_meaning_id=ctp.taxon_meaning_id 
AND ctp.taxon_meaning_id<>nc.new_taxon_meaning_id;

These are screenshots showing the relevant records... From uksi.preferred_name_changes where new_taxon_meaning_id=29572: GetImage (1) From cache_taxon_paths where taxon_meaning_id=taxon_meaning_id=298 (old_taxon_meaning_id being updated): GetImage (2) From cache_taxon_paths where taxon_meaning_id=taxon_meaning_id=29572(existing record triggering constraint violation): GetImage (3)

Does this look like a problem with the data in the update rather than our scripts? It looks like a taxon_meaning_id value that we already have in the database is being used for a new taxon_meaning.

johnvanbreda commented 5 years ago

@burkmarr the taxon_meaning_id is an internally generated ID (the next sequence from the taxon_meanings table) and there is supposed to be 1 per unique taxon concept. So it's more likely a script issue than a data one.

I noticed that in script 5 there are some updates to the cache_* tables to fix "broken" taxon meaning IDs. These changes don't include the cache_taxon_paths table as it is fairly new, so may need to be updated, but having said that it looks like they are an attempt to fix a one-off problem so unlikely to be the cause here.

I also note that the other cache_* tables updated in a similar fashion don't have a unique index or primary key on taxon_meaning_id, whereas cache_taxon_paths does, hence we can expect this problem to be a new one that only affects cache taxon paths.

On script 35, line 39 there is a query that works out where there has been a change of preferred name what the mapping from old taxon meaning ID to new taxon meaning ID should be. Now, imagine we have 2 taxa which are preferred in the old database. The update includes a change which makes taxon 1 a synonym of taxon 2. So, we have 2 cache_taxon_paths entries to start with, but we try to update taxon 1's CTP taxon_meaning_id to point to taxon 2's taxon_meaning_id. As this taxon_meaning_id is already in the table, it fails. What we actually want to do in this case is delete the CTP record for taxon 1. Actually, it's more significant than this, because the path field is an array of taxon_meaning_ids so we really need to rebuild the cache_taxon_paths table for entries linked to the UKSI list.

I note that the PHP code associated with running these scripts does actually run the cache update scripts from the cache_builder module. So, I wonder if the easiest way forward is to replace the update statement above with a delete statement (just to remove any pointing to old taxon meanings), then hope that the cache update scripts do their thing and recreate any missing CTP records + update existing ones. We'd need to test this to make sure all taxon_meaning_ids used by the UKSI list do have a cache_taxon_paths entry.

Sorry - I knew these scripts are complex!

burkmarr commented 5 years ago

@johnvanbreda - thanks for the info. Taking a step back I decided to create a new warehouse on my computer where I can just test the scripts out.

I installed latest warehouse by cloning master (and submodules recursively), configured it and ran update. I created an empty taxon list for the USKI. I took the unmodified UKSI scripts from the Indicia-support project but when I run them against this new warehouse to populate the new list, script five falls over because it's looking for a taxon_associations table which isn't yet in my indicia schema. Did I miss something?

johnvanbreda commented 5 years ago

Yes, the scripts do assume that the taxon_associations module is installed on the warehouse by adding to config.php and visiting /index.php/home/upgrade to add the database tables.

burkmarr commented 5 years ago

Ah thanks. Also taxon_designation, species_alerts and a whole bunch of datacleaner* modules. We should probably update https://github.com/Indicia-Team/support_files/tree/master/UKSI/readme.md (referred to in the Indicia manual) to say that these modules should be enabled in order for the scripts to run.

johnvanbreda commented 5 years ago

Agreed - are you OK to do this?

burkmarr commented 5 years ago

Sure I can update the readme file. Is it okay for me to do that straight on GitHub?

Re your analysis a few posts back...

I had already noticed the deletions on tables that reference taxon_meaning_id and had added a similar operation for cache_taxon_paths but it didn't help, as you suspected. With respect to your proposed solution, do you mean replace this unsuccessful code:

UPDATE cache_taxon_paths ctp 
SET taxon_meaning_id=nc.new_taxon_meaning_id 
FROM uksi.preferred_name_changes nc 
WHERE nc.old_taxon_meaning_id=ctp.taxon_meaning_id 
AND ctp.taxon_meaning_id<>nc.new_taxon_meaning_id;

with this:

DELETE FROM cache_taxon_paths WHERE taxon_meaning_id IN (
  SELECT old_taxon_meaning_id FROM uksi.preferred_name_changes
) 

Have I understood that correctly?

Re the final check; would that just be something like this?

select count(*) from taxa_taxon_lists 
where taxon_list_id = 1 and taxon_meaning_id is null
johnvanbreda commented 5 years ago

Correct up to the final check and yes to updating on GitHub.

I think the final check is that the following returns 0:

select count(*)
from taxa_taxon_lists ttl
left join cache_taxon_paths ctp on ctp.taxon_meaning_id=ttl.taxon_meaning_id
where ctp.taxon_meaning_id is null
burkmarr commented 5 years ago

Yes I just spotted my mistake in that final check! I ran on my local warehouse updating from uksi distributed in support files to the most recent supplied by Chris. Scripts went through okay but the check query reports over 125000 in taxon_taxon_lists without corresponding record in cache_taxon_paths.

burkmarr commented 5 years ago

@johnvanbreda Here is some output generated by the import-uksi-helper.php:

Processing cache for taxa_taxon_lists
  - default (UPDATE) - OK (0.6s)
  - default (INSERT) - OK (3.3s)
  - setup (EXTRAS) - OK (11.3s)
  - Taxon paths (EXTRAS) - OK (4s)
  - Ranks (EXTRAS) - OK (2.7s)
  - teardown (EXTRAS) - OK (0.1s)

So the 'Taxon paths' update in cache_builder was called okay and clearly did some work. I'm looking at $config['taxa_taxon_lists']['extra_multi_record_updates'] from cahce_builder.php but it is very complicated and I'm struggling to see the wood from the trees.

johnvanbreda commented 5 years ago

If you are running the scripts via the PHP command line then the import_uksi_helper.php file will indeed run the extra_multi_record_updates query you've correctly identified. The problem is likely to stem from the fact that this query is modified to join to uksi.changed_taxa_taxon_list_ids so that only changed IDs get updated. Presumably at this point the contents of this table don't cover all the records whose cache_taxon_paths entries were deleted by the delete query. I wonder if the reason for this is not actually changes, but the delete logic is overenthusiastic? Here's a suggested alternative (though it needs checking for performance and might be better as a LEFT JOIN):

DELETE FROM cache_taxon_paths WHERE taxon_meaning_id IN (
  SELECT old_taxon_meaning_id FROM uksi.preferred_name_changes
) AND taxon_meaning_id NOT IN (
  SELECT new_taxon_meaning_id FROM uksi.preferred_name_changes
)
burkmarr commented 5 years ago

@johnvanbreda - same result I'm afraid. However - I've just run the check query on a clean UKSI (from current distribution in support files) and the query returns 248,000 rows, so perhaps the query isn't working the way you expected?

select count(*)
from taxa_taxon_lists ttl
left join cache_taxon_paths ctp on ctp.taxon_meaning_id=ttl.taxon_meaning_id
where ctp.taxon_meaning_id is null
burkmarr commented 5 years ago

Okay that's because the cache_taxon_paths was empty when I ran the query so I probably need to build the cache?

burkmarr commented 5 years ago

I do get zero on original DB after forcing a cache rebuild.

burkmarr commented 5 years ago

@johnvanbreda The check query resulted in 518 now after these steps:

  1. Start with a clean USKI (as distributed in support files)
  2. Force complete cache rebuild
  3. Run update scripts with modified delete query and latest data from Chris
burkmarr commented 5 years ago

@johnvanbreda - I think I found the problem. I listed those 518 records from taxa_taxon_lists and they all had their deleted flag set. So I guess that the check query should be:

select count(*)
from taxa_taxon_lists ttl
left join cache_taxon_paths ctp on ctp.taxon_meaning_id=ttl.taxon_meaning_id
where ctp.taxon_meaning_id is null and ttl.deleted = false

Which does produce a result of 0.

I also found that the result was the same regardless of the two delete queries I used.

Are you happy for me to proceed on to running the modified scripts on the dev server with the new UKSI from Chris?

johnvanbreda commented 5 years ago

@burkmarr Ok - good, filtering out deleted does make sense. Yes, I'm happy for you to run this on the dev server - probably just worth making sure that there was a backup last night or thereabouts, as I know sometimes Andy has configuration in development on that server he wouldn't want to lose.

burkmarr commented 5 years ago

@BirenRathod - see above comment from John. Is there a recent dev server backup? If so I will go ahead and try the update UKSI today if you are around to help with the restore if something should go wrong.

burkmarr commented 5 years ago

@BirenRathod - the USKI update guidance (https://github.com/Indicia-Team/support_files/tree/master/UKSI) also says:

Before running the script, please ensure that your warehouse scheduled tasks have been run and then stopped to ensure everything is up to date.

Are you okay to do that? If you can't do this today, lets get a date in the diary. I am away next week so perhaps we could arrange something for the week after.

burkmarr commented 5 years ago

I've emailed Chris Raper to see if there are any significant updates to UKSI since he last supplied a couple of weeks ago. Hopefully once I hear from Chris I'll be ready to schedule a date for the update on the Dev Warehouse this week.

As well as making sure the database is backed up, the documentation (https://github.com/Indicia-Team/support_files/tree/master/UKSI) says:

Before running the script, please ensure that your warehouse scheduled tasks have been run and then stopped to ensure everything is up to date.

Are you able to take care of that once we schedule a date @BirenRathod?

@johnvanbreda - I remember you saying that when we have a date scheduled for this we should touch base with a couple of people - I think that Gary was one. Can you remind me who and how best to contact them?

I think we should treat this as a dry run for the update to the Live Warehouse, so I want to be sure that we do the same things that we'd do with the live Warehouse. Apart from running all scheduled tasks, is there anything else we need to do? For example do we need to put any of the major websites (e.g. iRecord) into maintenance mode while we do this?

On the live site, should we attempt this at any particular time of day (or night) to minimise disruption?

johnvanbreda commented 5 years ago

Yes, it's better if iRecord is in maintenance mode. In theory it should stay functional while the updates are being applied but that I would rather reduce the risk. I'll send you Gary's email.

BirenRathod commented 5 years ago

@burkmarr I have scheduled the warehouse backup and you can find backup file here -> D:\Indicia_backup\scheduled. Also, I assure you that schedule_task is running fine and it runs on every 15 mins.

Yes, I will look after the scheduled_task once date will be decided.

burkmarr commented 5 years ago

Thanks @BirenRathod.

burkmarr commented 5 years ago

@johnvanbreda UKSI on Dev Warehouse failed in script 35 with this message:

update or delete on table "taxon_meanings" violates foreign key constraint 
"fk_taxon_meaning_association_from_taxon_meaning" on table "taxon_associations"

It must be from this SQL in the script since it's the only one that deals with taxon_meanings:

DELETE FROM taxon_meanings WHERE id IN (
  SELECT tm.id FROM taxon_meanings tm
  LEFT JOIN taxa_taxon_lists ttl ON ttl.taxon_meaning_id=tm.id
  WHERE ttl.id IS NULL
);

Earlier in script 35, there are these lines to ensure that taxon_associations are updated where taxon meanings have changed.

UPDATE taxon_associations ta
SET from_taxon_meaning_id=nc.new_taxon_meaning_id
FROM uksi.preferred_name_changes nc
WHERE nc.old_taxon_meaning_id=ta.from_taxon_meaning_id
AND ta.from_taxon_meaning_id<>nc.new_taxon_meaning_id;

UPDATE taxon_associations ta
SET to_taxon_meaning_id=nc.new_taxon_meaning_id
FROM uksi.preferred_name_changes nc
WHERE nc.old_taxon_meaning_id=ta.to_taxon_meaning_id
AND ta.to_taxon_meaning_id<>nc.new_taxon_meaning_id;

But there's no corresponding delete query to remove taxon association records where taxon meanings have gone. Do we need to add these?

DELETE FROM taxon_associations WHERE to_taxon_meaning_id IN (
  SELECT tm.id FROM taxon_meanings tm
  LEFT JOIN taxa_taxon_lists ttl ON ttl.taxon_meaning_id=tm.id
  WHERE ttl.id IS NULL
);
DELETE FROM taxon_associations WHERE from_taxon_meaning_id IN (
  SELECT tm.id FROM taxon_meanings tm
  LEFT JOIN taxa_taxon_lists ttl ON ttl.taxon_meaning_id=tm.id
  WHERE ttl.id IS NULL
);
burkmarr commented 5 years ago

I tried it with this correction to script 35. Accidentally re-started from script 1 again and it went down at script 5 with another foreign key violation. But given that it got to 35 before, I started again from 35 and it completed. The results are not as expected though, so I would like to tray again from the begining with the updated script. I have asked @BirenRathod to restore the DB so I can have another go.

burkmarr commented 5 years ago

I got the update through on dev by running it again with a modification to the import scripts. At the end of the previous update, the main taxa tables appeared to be up to date, but the cache taxa tables were not. Running the unmodified scripts again did not update the cache tables because the scripts compare the imported data with the main taxa tables to judge if there are changes that require changes to the cache tables. Since the main taxa tables, by this point, already had the changes and new records from the import, then no cache changes were triggered.

I modified a script so that all taxa in the import would be considered as changed and this took care of the cache update.

@BirenRathod - I still want to rerun the import, with the modified scripts, against a restored database on dev to make sure it goes through without problems before running on the live version, so let me know when this is done and I will try again.

Below are changes noted by Chris Raper that I checked from iRecord dev and found they were reflected correctly.

BirenRathod commented 5 years ago

@burkmarr Just to let you know that database restore process is running at present. I will let you know as soon as it finished.

burkmarr commented 5 years ago

Thanks Biren that’s excellent.

burkmarr commented 5 years ago

The fresh update to update the UKSI on the dev warehouse succeeded (without a forced cache rebuild) and the changes flagged by Chris are visible on iRecord dev.

Output from the update scripts is shown below. The interruption at script 5 was because I forgot to provide read permission for Postgres on the data - it won't have affected the update.

Subject to some checking on dev, we now need to schedule the update on the live warehouse.

Database connection OK
Superuser database connection OK
1 - Prepare for script access to settings - OK (0.1s)
2 - Create tables for tracking changes to existing records - OK (0.1s)
3 - Find any duplicated input_taxon_version_keys in the existing warehouse data - OK (10.8s)
    - Count of duplicated taxon version keys in existing warehouse data: 324
4 - Tidy any duplicated taxon version keys in existing warehouse data - OK (50.1s)
5 - Remove any orphaned taxa records in the existing warehouse data - OK (23.4s)
    - Existing orphaned taxa records removed: 6
6 - Create interim tables for UKSI data - OK (0.6s)
7 - Import the CSV data into the database
Error in script: ERROR:  could not open file "C:\Users\rburkmar\Documents\UKSI\preferred_names.txt" for reading: Permission denied
HINT:  COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.

Database connection OK
Superuser database connection OK
1 - Prepare for script access to settings - OK (0s)
2 - Create tables for tracking changes to existing records - OK (0s)
3 - Find any duplicated input_taxon_version_keys in the existing warehouse data - OK (8.7s)
    - Count of duplicated taxon version keys in existing warehouse data: 324
4 - Tidy any duplicated taxon version keys in existing warehouse data - OK (1.4s)
5 - Remove any orphaned taxa records in the existing warehouse data - OK (19.8s)
    - Existing orphaned taxa records removed: 0
6 - Create interim tables for UKSI data - OK (0.1s)
7 - Import the CSV data into the database - OK (22.1s)
    - type || count
    - all names || 274655
    - preferred names || 109404
8 - Remove synonyms that are not wellformed if similar to others - OK (4.8s)
    - Similar not wellformed synonyms removed: 20757
9 - Remove synonyms that differ from the preferred name only in rank - OK (4.4s)
    - Synonyms removed which differ only in rank: 13
10 - Use organism_master hierarchy to find incorrect parent taxon version keys - OK (1.9s)
    - Parent keys updated: 7
11 - Update the existing warehouse taxon ranks to match UKSI data - OK (0s)
    - Taxon ranks updated: 2
12 - Cleanup after taxon ranks update - OK (2.2s)
13 - Insert new taxon ranks - OK (0s)
    - Taxon ranks inserted: 1
14 - Insert new taxon groups - OK (0.2s)
    - Taxon groups inserted: 2
15 - Update the existing taxon groups to match UKSI data - OK (0s)
    - Taxon groups update: 0
16 - Cleanup after taxon groups update - OK (0s)
17 - Remove any old names which are not used in warehouse and now deleted in UKSI - OK (77s)
    - Unused names removed: 720
18 - Flag as not for data entry any old names which are used in warehouse but now deleted in UKSI - OK (3.9s)
    - Old names flagged as not for data entry: 119
19 - Prepare the taxa table UKSI version - OK (6.3s)
    - Number of records in taxa: 253885
20 - Link UKSI taxa to existing warehouse taxa - OK (13.2s)
    - Number of pre-existing taxa found: 27184
21 - Update values for existing taxa which have changed - OK (10.5s)
    - Number of pre-existing taxa updated: 25821
22 - Insert new taxon records - OK (1.3s)
    - Number of new taxon records: 5730
23 - Prepare the taxa taxon lists table UKSI version - OK (5.3s)
    - type || count
    - child list taxa_taxon_lists || 14762
    - total taxa_taxon_lists || 268647
24 - Link UKSI taxa to existing warehouse taxa taxon lists - OK (12.4s)
    - ?column? || count
    - existing taxa_taxon_lists linked || 262786
25 - Create new taxon meanings and populate them into the preferred names - OK (1.2s)
    - type || count
    - meanings attached to preferred names || 111125
    - new meanings || 6406
26 - Populate taxon meanings into the synonyms - OK (9.1s)
    - Meanings attached to synonyms: 157523
27 - Fill in the parent to child links - OK (11.6s)
    - type || count
    - changes or new parent links || 22454
    - children linked to parents || 253995
28 - Prepare common name mappings - OK (2.2s)
    - Names mapped to common names: 77773
29 - Apply common name mappings - OK (3.7s)
    - type || count
    - Names which had a common name change || 2438
30 - Insert new taxa taxon list records - OK (5.4s)
    - Number of new taxa taxon list record: 5861
31 - Update values for existing taxa taxon lists which have changed - OK (6.2s)
    - Number of pre-existing taxa taxon lists updated: 11512
32 - Ensure taxa taxon list attribute values point to the preferred name - OK (4.9s)
    - Number of attribute values relinked to preferred names: 13243
33 - Update the taxon designations data - OK (30.3s)
34 - Correct preferred name for expired names - OK (2.4s)
35 - Tidy up orphaned records - OK (1894.1s)
36 - Ensure updated common and preferred names are applied to all names in concept - OK (1.7s)
Processing cache for termlists_terms
  - default (UPDATE) - OK (0.1s)
  - default (INSERT) - OK (0s)
Processing cache for taxa_taxon_lists
  - default (UPDATE) - OK (21.7s)
  - default (INSERT) - OK (8.2s)
  - setup (EXTRAS) - OK (11.9s)
  - Taxon paths (EXTRAS) - OK (5.3s)
  - Ranks (EXTRAS) - OK (1169.9s)
  - teardown (EXTRAS) - OK (0s)
Processing cache for taxon_searchterms
  - standard terms (UPDATE) - OK (88.8s)
  - abbreviations (UPDATE) - OK (20.6s)
  - simplified terms (UPDATE) - OK (24.7s)
  - codes (UPDATE) - OK (0.1s)
  - id_diff (UPDATE) - OK (1.1s)
  - standard terms (INSERT) - OK (7.4s)
  - abbreviations (INSERT) - OK (3.5s)
  - simplified terms (INSERT) - OK (4.5s)
  - codes (INSERT) - OK (0.2s)
  - id_diff (INSERT) - OK (0.5s)
Processing cache for occurrences
  - functional (UPDATE) - OK (0.9s)
  - functional_sensitive (UPDATE) - OK (0.5s)
  - nonfunctional (UPDATE) - OK (4s)
  - nonfunctional_media (UPDATE) - OK (0s)
  - nonfunctional_data_cleaner_info (UPDATE) - OK (0s)
  - nonfunctional_sensitive (UPDATE) - OK (0s)
  - functional (INSERT) - OK (0s)
  - functional_sensitive (INSERT) - OK (0s)
  - nonfunctional (INSERT) - OK (0s)
  - nonfunctional_attrs (INSERT) - OK (0.1s)
  - nonfunctional_media (INSERT) - OK (0s)
  - nonfunctional_data_cleaner_info (INSERT) - OK (0s)
  - nonfunctional_sensitive (INSERT) - OK (0s)
Total time: 3940.7s
burkmarr commented 5 years ago

Updated the live warehouse UKSI whilst iRecord server being moved and it went through without problems. Script output is shown below. Figures are similar to update on dev, as you'd expect, but the time taken for the update was greater - about 2 hrs and 40 mins in total. Need to check key taxa changes in iRecord when it's back up.

Database connection OK
Superuser database connection OK
1 - Prepare for script access to settings - OK (0.1s)
2 - Create tables for tracking changes to existing records - OK (0.1s)
3 - Find any duplicated input_taxon_version_keys in the existing warehouse data - OK (3.2s)
    - Count of duplicated taxon version keys in existing warehouse data: 324
4 - Tidy any duplicated taxon version keys in existing warehouse data - OK (36.3s)
5 - Remove any orphaned taxa records in the existing warehouse data - OK (6.8s)
    - Existing orphaned taxa records removed: 6
6 - Create interim tables for UKSI data - OK (0.6s)
7 - Import the CSV data into the database - OK (15.6s)
    - type || count
    - all names || 274655
    - preferred names || 109404
8 - Remove synonyms that are not wellformed if similar to others - OK (5.6s)
    - Similar not wellformed synonyms removed: 20757
9 - Remove synonyms that differ from the preferred name only in rank - OK (5.6s)
    - Synonyms removed which differ only in rank: 13
10 - Use organism_master hierarchy to find incorrect parent taxon version keys - OK (1.3s)
    - Parent keys updated: 7
11 - Update the existing warehouse taxon ranks to match UKSI data - OK (0.1s)
    - Taxon ranks updated: 2
12 - Cleanup after taxon ranks update - OK (1.7s)
13 - Insert new taxon ranks - OK (0s)
    - Taxon ranks inserted: 1
14 - Insert new taxon groups - OK (0.2s)
    - Taxon groups inserted: 2
15 - Update the existing taxon groups to match UKSI data - OK (0s)
    - Taxon groups update: 0
16 - Cleanup after taxon groups update - OK (0s)
17 - Remove any old names which are not used in warehouse and now deleted in UKSI - OK (23s)
    - Unused names removed: 716
18 - Flag as not for data entry any old names which are used in warehouse but now deleted in UKSI - OK (3.7s)
    - Old names flagged as not for data entry: 123
19 - Prepare the taxa table UKSI version - OK (8.7s)
    - Number of records in taxa: 253885
20 - Link UKSI taxa to existing warehouse taxa - OK (4.3s)
    - Number of pre-existing taxa found: 27182
21 - Update values for existing taxa which have changed - OK (6s)
    - Number of pre-existing taxa updated: 25820
22 - Insert new taxon records - OK (1.1s)
    - Number of new taxon records: 5723
23 - Prepare the taxa taxon lists table UKSI version - OK (3.7s)
    - type || count
    - child list taxa_taxon_lists || 14762
    - total taxa_taxon_lists || 268647
24 - Link UKSI taxa to existing warehouse taxa taxon lists - OK (4.6s)
    - ?column? || count
    - existing taxa_taxon_lists linked || 262793
25 - Create new taxon meanings and populate them into the preferred names - OK (0.6s)
    - type || count
    - meanings attached to preferred names || 111127
    - new meanings || 6397
26 - Populate taxon meanings into the synonyms - OK (3.7s)
    - Meanings attached to synonyms: 157523
27 - Fill in the parent to child links - OK (3.1s)
    - type || count
    - changes or new parent links || 22441
    - children linked to parents || 253995
28 - Prepare common name mappings - OK (5s)
    - Names mapped to common names: 77773
29 - Apply common name mappings - OK (1.1s)
    - type || count
    - Names which had a common name change || 2433
30 - Insert new taxa taxon list records - OK (5.9s)
    - Number of new taxa taxon list record: 5854
31 - Update values for existing taxa taxon lists which have changed - OK (3.9s)
    - Number of pre-existing taxa taxon lists updated: 11481
32 - Ensure taxa taxon list attribute values point to the preferred name - OK (2.3s)
    - Number of attribute values relinked to preferred names: 13265
33 - Update the taxon designations data - OK (15.4s)
34 - Correct preferred name for expired names - OK (0.8s)
35 - Tidy up orphaned records - OK (3623.7s)
36 - Ensure updated common and preferred names are applied to all names in concept - OK (1.1s)
Processing cache for termlists_terms
  - default (UPDATE) - OK (0.1s)
  - default (INSERT) - OK (0s)
Processing cache for taxa_taxon_lists
  - default (UPDATE) - OK (13.3s)
  - default (INSERT) - OK (3.1s)
  - setup (EXTRAS) - OK (9.5s)
  - Taxon paths (EXTRAS) - OK (3.5s)
  - Ranks (EXTRAS) - OK (3937.8s)
  - teardown (EXTRAS) - OK (0s)
Processing cache for taxon_searchterms
  - standard terms (UPDATE) - OK (135.8s)
  - abbreviations (UPDATE) - OK (23.4s)
  - simplified terms (UPDATE) - OK (18s)
  - codes (UPDATE) - OK (0.2s)
  - id_diff (UPDATE) - OK (1.7s)
  - standard terms (INSERT) - OK (9s)
  - abbreviations (INSERT) - OK (2.3s)
  - simplified terms (INSERT) - OK (2.6s)
  - codes (INSERT) - OK (0.2s)
  - id_diff (INSERT) - OK (0.5s)
Processing cache for occurrences
  - functional (UPDATE) - OK (45s)
  - functional_sensitive (UPDATE) - OK (1.9s)
  - nonfunctional (UPDATE) - OK (29.4s)
  - nonfunctional_media (UPDATE) - OK (0.6s)
  - nonfunctional_data_cleaner_info (UPDATE) - OK (0.3s)
  - nonfunctional_sensitive (UPDATE) - OK (0.1s)
  - functional (INSERT) - OK (0.1s)
  - functional_sensitive (INSERT) - OK (0.1s)
  - nonfunctional (INSERT) - OK (0s)
  - nonfunctional_attrs (INSERT) - OK (1.6s)
  - nonfunctional_media (INSERT) - OK (0.9s)
  - nonfunctional_data_cleaner_info (INSERT) - OK (1.1s)
  - nonfunctional_sensitive (INSERT) - OK (0s)
Total time: 9510.3s
burkmarr commented 5 years ago

The new taxa are available through iRecord, so closing issue.