IQSS / dataverse

Open source research data repository software
http://dataverse.org

Fix migration issues for custom fields w/ controlled vocabulary #1777

Closed posixeleni closed 9 years ago

posixeleni commented 9 years ago

A recent migration test by @ekraffmiller uncovered errors that I need to fix in the following custom metadata blocks:

GSD Block

  1. Fix missing values for gsdCoordinator and gsdFacultyName
Import Exception processing file GSD_Test_2/92036.xml, msg:Error parsing datasetVersion: Value 'Idenburg, Florian' does not exist in type 'gsdCoordinator'
Import Exception processing file GSD_Test_2/91588-1.xml, msg:Error parsing datasetVersion: Value 'Kiefer, Matthew'
Import Exception processing file gsdplatform/117891-1.xml, msg:Error parsing datasetVersion: Value 'Other' does not exist in type 'gsdFacultyName'
Import Exception processing file gsdplatform/118351-1.xml, msg:Error parsing datasetVersion: Value 'Lee, Chris' does not exist in type 'gsdFacultyName'
Import Exception processing file gsdplatform/118161-1.xml, msg:Error parsing datasetVersion: Value 'Scogin, Mack' does not exist in type 'gsdFacultyName'
parsing datasetVersion: Value 'Barkan, Katy' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/92987.xml, msg:Error parsing datasetVersion: Value 'Hyde, Timothy' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93113.xml, msg:Error parsing datasetVersion: Value 'Desimini, Jill' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93074.xml, msg:Error parsing datasetVersion: Value 'Najle, Ciro' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93329-1.xml, msg:Error parsing datasetVersion: Value 'Ozay, Erkin' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93090.xml, msg:Error parsing datasetVersion: Value 'Whittaker, Elizabeth' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93263.xml, msg:Error parsing datasetVersion: Value 'Sarkis, A. Hashim' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93263.xml, msg:Error parsing datasetVersion: Value 'Sarkis, A. Hashim' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93370.xml, msg:Error parsing datasetVersion: Value 'Madden, Kathryn' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/91846.xml, msg:Error parsing datasetVersion: Value 'Howler, Eric' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/92420-1.xml, msg:Error parsing datasetVersion: Value 'Doran, Kelly' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/92901-1.xml, msg:Error parsing datasetVersion: Value 'Rich, Damon' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93058.xml, msg:Error parsing datasetVersion: Value 'Buchard, Jeffry' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93170-1.xml, msg:Error parsing datasetVersion: Value 'Rocker, Ingeborg' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/92921.xml, msg:Error parsing datasetVersion: Value 'Legendre, George L.' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93351.xml, msg:Error parsing datasetVersion: Value 'Sentkiewicz, Renata' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93379-1.xml, msg:Error parsing datasetVersion: Value 'Desimini, Jill' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/92211.xml, msg:Error parsing datasetVersion: Value 'Koolhaas, Remment' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93310-1.xml, msg:Error parsing datasetVersion: Value 'Gillies-Smith, Shauna' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93289.xml, msg:Error parsing datasetVersion: Value 'Lehrer, Mia' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93147.xml, msg:Error parsing datasetVersion: Value 'MCloskey, Karen' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93276.xml, msg:Error parsing datasetVersion: Value 'Desvigne, Michel' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/92983.xml, msg:Error parsing datasetVersion: Value 'Schumacher, Patrik' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/92417.xml, msg:Error parsing datasetVersion: Value 'Coignet, Philippe' does not exist in type 'gsdFacultyName'
Import Exception processing file GSD_Test_2/91588-1.xml, msg:Error parsing datasetVersion: Value 'Etzler, Danielle' does not exist in type 'gsdCoordinator'

PSRI Block

  1. Fix missing NA value in PSRI8 and PSRI2
Import Exception processing file alexanderfouirnaies/122040-1.xml, msg:Error parsing datasetVersion: Value 'NA' does not exist in type 'PSRI2'

ARCS Block

  1. Trim the trailing whitespace at the end of the 'No' value
Import Exception processing file arcs/91035-1.xml, msg:Error parsing datasetVersion: Value 'No' does not exist in type 'ARCS4'
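
To see at a glance which values each block is missing, the log lines above can be grouped by field type. This is only a rough sketch: the regular expression is inferred from the messages pasted in this issue, and `missing_values_by_type` is a hypothetical helper, not part of the migration code.

```python
import re
from collections import defaultdict

# Pattern inferred from the log lines in this issue (an assumption);
# the importer's message format may differ elsewhere.
MISSING_VALUE = re.compile(r"Value '(.+?)' does not exist in type '(\w+)'")


def missing_values_by_type(log_lines):
    """Group unique missing controlled-vocabulary values by field type."""
    missing = defaultdict(set)
    for line in log_lines:
        match = MISSING_VALUE.search(line)
        if match:  # truncated or unrelated lines are skipped
            value, field_type = match.groups()
            missing[field_type].add(value)
    return {field: sorted(values) for field, values in missing.items()}


if __name__ == "__main__":
    sample = [
        "Import Exception processing file gsd/93289.xml, msg:Error parsing "
        "datasetVersion: Value 'Lehrer, Mia' does not exist in type 'gsdFacultyName'",
        "Import Exception processing file arcs/91035-1.xml, msg:Error parsing "
        "datasetVersion: Value 'No' does not exist in type 'ARCS4'",
    ]
    for field, values in missing_values_by_type(sample).items():
        print(field, values)
```

Each key in the result is a field type, and its list holds the values that would need to be added to (or corrected in) the matching tsv.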
posixeleni commented 9 years ago

@ekraffmiller I have put in the fixes for the issues found in the error log you sent me. This won't need a new schema.xml, but a db drop will be needed to see the new values in build for @kcondon to test.

I did encounter two types of errors that I could not immediately resolve with the tsv, and I need advice from you and @scolapasta:

  1. There is one error I cannot fix using the tsv, but it can be solved by splitting this string into two separate Language values (does this need to be done manually in the DB?):
Import Exception processing file worldhistorical/122057.xml, msg:Error parsing datasetVersion: Value 'English and Dutch' does not exist in type 'language'
  2. There is another error that I am not sure how to resolve: folks used to have a custom field that allowed free text, but then changed it to allow only a strict Yes/No/NA controlled vocabulary. How do we preserve these legacy values without altering the controlled vocabulary list?
Import Exception processing file stwalter/117416-1.xml, msg:Error parsing datasetVersion: Value 'http://www.europeansocialsurvey.org' does not exist in type 'PSRI8'
Import Exception processing file PSReplication/122026-1.xml, msg:Error parsing datasetVersion: Value 'www.aiddata.org' does not exist in type 'PSRI8'
Import Exception processing file PSReplication/92777-1.xml, msg:Error parsing datasetVersion: Value 'www.isaidno.de' does not exist in type 'PSRI8'
Import Exception processing file SOCRATESJOURNAL/117782.xml, msg:Error parsing datasetVersion: Value 'http://www.socratesjournal.com/index.php/socrates/article/view/5' does not exist in type 'PSRI8'
Import Exception processing file SOCRATESJOURNAL/117782-1.xml, msg:Error parsing datasetVersion: Value 'http://www.socratesjournal.com' does not exist in type 'PSRI8'
Import Exception processing file SOCRATESJOURNAL/117782-2.xml, msg:Error parsing datasetVersion: Value 'http://www.socratesjournal.com/index.php/socrates/article/view/5' does not exist in type 'PSRI8'
scolapasta commented 9 years ago

For the English and Dutch one, we added this to the pre scrub:

update studyfieldvalue set strvalue='English' where metadata_id=273999 and studyfield_id=218 and strValue='English and Dutch';
insert into studyfieldvalue (strvalue, metadata_id, studyfield_id, displayorder) values ('Dutch', 273999,218,1);
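
The effect of those two statements can be tried against a throwaway database first. The sketch below uses an in-memory SQLite table as a stand-in for the real one; the table layout is reduced to the four columns named in the statements, and the starting `displayorder` of 0 for the existing row is an assumption.

```python
import sqlite3


def split_language_value(conn, metadata_id, studyfield_id):
    """Apply the pre-scrub: turn 'English and Dutch' into two values."""
    conn.execute(
        "UPDATE studyfieldvalue SET strvalue = 'English' "
        "WHERE metadata_id = ? AND studyfield_id = ? "
        "AND strvalue = 'English and Dutch'",
        (metadata_id, studyfield_id),
    )
    conn.execute(
        "INSERT INTO studyfieldvalue "
        "(strvalue, metadata_id, studyfield_id, displayorder) "
        "VALUES ('Dutch', ?, ?, 1)",
        (metadata_id, studyfield_id),
    )


def demo():
    """Run the pre-scrub on a one-row stand-in table and return the result."""
    conn = sqlite3.connect(":memory:")
    # Simplified, hypothetical schema: only the columns used above.
    conn.execute(
        "CREATE TABLE studyfieldvalue ("
        "strvalue TEXT, metadata_id INTEGER, "
        "studyfield_id INTEGER, displayorder INTEGER)"
    )
    # Assumed starting state: the combined value at display order 0.
    conn.execute(
        "INSERT INTO studyfieldvalue VALUES ('English and Dutch', 273999, 218, 0)"
    )
    split_language_value(conn, 273999, 218)
    rows = conn.execute(
        "SELECT strvalue, displayorder FROM studyfieldvalue ORDER BY displayorder"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```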

scolapasta commented 9 years ago

@posixeleni

For the 2nd question, I'm not sure.

They clearly can't be part of the controlled vocab. Is there some other existing field that these can be parsed to? If so, we can do a prescrub where we set that value and then delete these?

posixeleni commented 9 years ago

@scolapasta to better answer that question, is there any way you or @ekraffmiller can give me the dataset IDs for the problematic datasets in Q#2? Once I go into the actual datasets and see what they are doing, it might be easier to resolve this issue.

scolapasta commented 9 years ago

The db id is in the error: it is the number before the dash in the xml file name. You can use that id, together with the version number (the number after the dash), on the study page in production, like so:

http://thedata.harvard.edu/dvn/faces/study/StudyPage.xhtml?studyId=92777&versionNumber=1

This one for example is a draft. (I assume they all might be)
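
For a long error log, the same lookup can be scripted. This is a minimal sketch based only on the naming convention described above; `study_page_url` is a hypothetical helper, and leaving out `versionNumber` for file names without a dash is an assumption about how the study page behaves.

```python
import re

STUDY_PAGE = "http://thedata.harvard.edu/dvn/faces/study/StudyPage.xhtml"

# <db id>[-<version number>].xml at the end of the path from the error log.
FILE_NAME = re.compile(r"(\d+)(?:-(\d+))?\.xml$")


def study_page_url(xml_path):
    """Build the production study page URL for a file named in the error log."""
    match = FILE_NAME.search(xml_path)
    if match is None:
        raise ValueError(f"unrecognized file name: {xml_path!r}")
    study_id, version = match.groups()
    url = f"{STUDY_PAGE}?studyId={study_id}"
    if version is not None:
        # Only add the version when the file name carries one (assumption).
        url += f"&versionNumber={version}"
    return url


if __name__ == "__main__":
    print(study_page_url("PSReplication/92777-1.xml"))
```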

posixeleni commented 9 years ago

@scolapasta @ekraffmiller after looking at the datasets, it became evident that they renamed/renumbered this field in 3.6, so all of the PSRI8 values that have an error above should be mapped to PSRI3, which already exists in the custom metadata block and is a free-text field. Is this possible?
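
One way the remapping could look, as a rough sketch only: the Yes/No/NA vocabulary is taken from the discussion above, while `remap_psri8` and the idea of doing this per value (rather than in the field mapping) are assumptions for illustration.

```python
# Controlled vocabulary for PSRI8 as described earlier in this issue.
PSRI8_VOCABULARY = {"Yes", "No", "NA"}


def remap_psri8(field_name, value):
    """Send legacy free-text PSRI8 values to the free-text PSRI3 field.

    Values that already belong to the PSRI8 vocabulary, and values of
    any other field, are returned unchanged.
    """
    if field_name == "PSRI8" and value.strip() not in PSRI8_VOCABULARY:
        return "PSRI3", value
    return field_name, value


if __name__ == "__main__":
    print(remap_psri8("PSRI8", "www.aiddata.org"))
    print(remap_psri8("PSRI8", "NA"))
```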

posixeleni commented 9 years ago

@ekraffmiller @scolapasta Sorry about the confusion this week over this one issue with PSRI8. I have gone in and made sure all the fields mapped to the correct ones (especially the YES/NO/NA). Here is the updated spreadsheet with the correct mapping: https://docs.google.com/spreadsheets/d/1rZo3QugzmYifpo518QgvvpIOb14Z8IM-5dWumzG5FME/edit?usp=sharing

Please let me know if you need me to help with anything else.

posixeleni commented 9 years ago

@ekraffmiller I updated the csv file and checked it into GitHub, so let me know if I can help with anything else. Fingers crossed the error log looks MUCH better this time.

posixeleni commented 9 years ago

Ready for QA to test with a re-migration to see if any errors come up.

posixeleni commented 9 years ago

Had to add one more gsdCoordinator name to the customGSD.tsv block:

Import Exception processing file GSD_Test_2/91588-1.xml, msg:Error parsing datasetVersion: Value 'Etzler, Danielle' does not exist in type 'gsdCoordinator'
posixeleni commented 9 years ago

Fixed more missing controlled vocabulary values by adding them to customGSD.tsv:

'Abalos, Inaki' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/93054-1.xml, msg:Error parsing datasetVersion: Value 'Other' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/92203.xml, msg:Error parsing datasetVersion: Value 'Other' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/93053-1.xml, msg:Error parsing datasetVersion: Value 'Other' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/91635-1.xml, msg:Error parsing datasetVersion: Value 'Long, Judith' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93049-1.xml, msg:Error parsing datasetVersion: Value 'Other' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/93056-1.xml, msg:Error parsing datasetVersion: Value 'Other' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/92212.xml, msg:Error parsing datasetVersion: Value 'Other' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/92212-1.xml, msg:Error parsing datasetVersion: Value 'Other' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/92203-1.xml, msg:Error parsing datasetVersion: Value 'Other' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/93289.xml, msg:Error parsing datasetVersion: Value 'Maltzan, Michael' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93050-1.xml, msg:Error parsing datasetVersion: Value 'Other' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/93271.xml, msg:Error parsing datasetVersion: Value 'Maltzan, Michael' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93052-1.xml, msg:Error parsing datasetVersion: Value 'Other' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/91640.xml, msg:Error parsing datasetVersion: Value 'Bandy, Vincent' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93147.xml, msg:Error parsing datasetVersion: Value 'VanDerSys, Keith' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93148.xml, msg:Error parsing datasetVersion: Value 'VanDerSys, Keith' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93276.xml, msg:Error parsing datasetVersion: Value 'Hansch, Inessa' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93260.xml, msg:Error parsing datasetVersion: Value 'Hansch, Inessa' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93277.xml, msg:Error parsing datasetVersion: Value 'Hansch, Inessa' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93256.xml, msg:Error parsing datasetVersion: Value 'Maltzan, Michael' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/91622-1.xml, msg:Error parsing datasetVersion: Value 'Curtis, Lawrence' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/93041-1.xml, msg:Error parsing datasetVersion: Value 'Other' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/93051-1.xml, msg:Error parsing datasetVersion: Value 'Other' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/93055-1.xml, msg:Error parsing datasetVersion: Value 'Other' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/93149-1.xml, msg:Error parsing datasetVersion: Value 'VanDerSys, Keith' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/91645-1.xml, msg:Error parsing datasetVersion: Value '01402: Parallel Motion: Walden Pond, Concord / Central Park, New York' does not exist in type 'gsdCourseName'
Import Exception processing file gsd/93255.xml, msg:Error parsing datasetVersion: Value 'Hansch, Inessa' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/91643-1.xml, msg:Error parsing datasetVersion: Value '01403: After La Villette' does not exist in type 'gsdCourseName'
Import Exception processing file gsd/91699-1.xml, msg:Error parsing datasetVersion: Value '01404: California Limnolarium (experiments in projective processes)' does not exist in type 'gsdCourseName'
Import Exception processing file gsd/93225.xml, msg:Error parsing datasetVersion: Value 'Maltzan, Michael' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/91732-1.xml, msg:Error parsing datasetVersion: Value 'O'Donnell, Sheila' does not exist in type 'gsdFacultyName'
Import Exception processing file gsd/91694-1.xml, msg:Error parsing datasetVersion: Value 'Wu, Cameron' does not exist in type 'gsdCoordinator'
Import Exception processing file gsd/93146.xml, msg:Error parsing datasetVersion: Value 'VanDerSys, Keith' does not exist in type 'gsdFacultyName'
posixeleni commented 9 years ago

Added another missing value to the customGSD.tsv file based on @ekraffmiller's error log.

Import Exception processing file gsd/91732-1.xml, msg:Error parsing datasetVersion: Value 'Tuomey, John' does not exist in type 'gsdFacultyName'
posixeleni commented 9 years ago

@kcondon to test this in build, I would need a db drop as well.

posixeleni commented 9 years ago

No further errors have been reported. I am closing this ticket.