collectionspace / cspace-converter

Migrate data to CollectionSpace.

Converter profile for AuthPerson_Core_All #87

Open mark-cooper opened 5 years ago

mark-cooper commented 5 years ago

Branch: DDD-authperson-core

Mapping

https://github.com/lyrasis/cspace-converter/blob/master/lib/collectionspace/converter/core/person.rb

Data

https://github.com/lyrasis/cspace-converter/blob/master/data/core/authperson_core_all.csv

Config: use "core"

https://github.com/lyrasis/cspace-converter/blob/master/DEV.md#core

Start by manually creating a record in cspace using the first row of the sample data. Export the XML, then delete the record from cspace. Make a PR that includes the reference XML (spec/fixtures/files/) and an initial mapping spec (spec/collectionspace/converter/core/). From the converter you can use bundle exec rake remote:get[acquisitions/$csid] to get the XML.

Implement any fields not currently covered by the mapping.

Paulmulonzia commented 5 years ago

Hi @mark-cooper,

Which name do I use to fetch/export the XML from cspace? I tried bundle exec rake remote:get[person/ab547505-b40f-4540-8187], bundle exec rake remote:get[persons/ab547505-b40f-4540-8187], and bundle exec rake remote:get[person/local/ab547505-b40f-4540-8187], none of which returned anything.

mark-cooper commented 5 years ago

@Paulmulonzia here are some examples for authorities:

./bin/rake remote:get[personauthorities]
./bin/rake remote:get[personauthorities/73fd59db-9b1d-4d8e-be49/items]
# this is the same as above only using shortid
./bin/rake remote:get["personauthorities/urn:cspace:name(person)/items"]
./bin/rake remote:get["personauthorities/urn:cspace:name(person)/items/591a544a-e6b6-4102-be11"]

Paulmulonzia commented 4 years ago

Hi @mark-cooper, I have pushed the auth_person branch with the updated mappings.

Issues at hand:

termType contains values that are not available in the list items in cspace.

birthDateGroup and deathDateGroup contain dates in varying formats, e.g. 1813, ca. 1928, ca. 420 BCE.

title - values had to be updated to remove trailing periods, e.g. from Dr. to Dr

salutation and gender - values in the sample data had to be changed to lower case for the import to work.

The contact information fields are currently not importing to cspace despite the structure being similar to the exported reference xml.

Thanks.
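
The title, salutation, and gender adjustments listed above could be applied to the sample data with a small script rather than by hand. A minimal sketch, assuming a hypothetical normalize_vocab_value helper (it is not part of cspace-converter) and assuming the target lists in cspace expect values without trailing periods and, for salutation and gender, in lower case:

```ruby
# Hypothetical helper, not part of cspace-converter: tidy a CSV value so it
# matches a list whose terms have no trailing period and may be lower case.
def normalize_vocab_value(value, downcase: true)
  cleaned = value.to_s.strip.sub(/\.+\z/, '') # drop trailing period(s)
  downcase ? cleaned.downcase : cleaned
end

normalize_vocab_value('Dr.', downcase: false) # => "Dr"
normalize_vocab_value('Female')               # => "female"
```

Whether to lower-case a given column is a per-field decision, which is why the sketch makes it an option instead of always applying it.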

mark-cooper commented 4 years ago

@Paulmulonzia thx! I'll look it over soon.

"The contact information fields are currently not importing to cspace despite the structure being similar to the exported reference xml."

I think this is the first time you're encountering multiple record types within a single converter. I'll update your branch to account for this so you can see how it works.

mark-cooper commented 4 years ago

@Paulmulonzia I've updated some things in your branch. Be sure to git pull origin DDD-authperson-core before doing any more work!

Here are the main things:

We now pass in some additional data to the converters:

https://github.com/collectionspace/cspace-converter/pull/104/files#diff-13469a5fd785089ba78311c372df955aR32-R35

The primary use is for authority record mapping. With procedures there is an explicit ID -> Value in the csv (e.g. objectNumber -> 123). For authorities we don't have an explicit ID, just a name, so the ID has to be generated. The config allows us to pass along the generated ID so we can ensure consistency, which is especially important for cached items.

https://github.com/collectionspace/cspace-converter/blob/34b2e4e98cefc8817dc4dd2d7b1af4481aa8cb86/lib/collectionspace/converter/core/person.rb#L14

https://github.com/collectionspace/cspace-converter/blob/34b2e4e98cefc8817dc4dd2d7b1af4481aa8cb86/lib/collectionspace/converter/core/person.rb#L29

Generally speaking this should only be needed for authority records.
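
As a rough illustration of the idea of generating an ID once and passing it along, here is a sketch in Ruby. Everything in it is an assumption made for illustration: generated_identifier, PersonSketch, and the :identifier config key are hypothetical names, and the hashing scheme is not the converter's real one (see the person.rb lines linked above for the actual code).

```ruby
require 'digest'

# Hypothetical illustration only; not the converter's real implementation.
# Derives an identifier from an authority term's display name.
def generated_identifier(display_name)
  normalized = display_name.to_s.strip
  slug = normalized.gsub(/[^A-Za-z0-9]/, '')
  "#{slug}#{Digest::SHA256.hexdigest(normalized)[0, 10]}"
end

# Hypothetical converter that receives extra data via a config hash.
class PersonSketch
  attr_reader :attributes, :config

  def initialize(attributes, config = {})
    @attributes = attributes
    @config = config
  end

  # Reuse the identifier handed in by the caller instead of regenerating it,
  # so the record and any cached lookup entry share one value.
  def short_identifier
    config.fetch(:identifier)
  end
end

name = 'Ann Analyst'
identifier = generated_identifier(name)
cache = { name => identifier } # stand-in for a lookup cache
person = PersonSketch.new({ 'termDisplayName' => name }, { identifier: identifier })
```

Because the converter only reads the value it was given, the cached entry and the generated record cannot drift apart.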

The next big thing is we need to create two record types for Person authority records: the person record and the contact record. To do this we need the wider XML document context:

https://github.com/collectionspace/cspace-converter/blob/34b2e4e98cefc8817dc4dd2d7b1af4481aa8cb86/lib/collectionspace/converter/core/person.rb#L7-L24

This allows us to define a separate class method for the contact section (that can be re-used for other classes in the future -- in fact I'll move contact out to its own section after person is complete).
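
The shape described here, one CSV row producing two documents with the contact part in a reusable class method, might look roughly like the sketch below. The class name, the 'email' column, and the flattened XML are simplifying assumptions for illustration; the real documents are more deeply nested and are built in the person.rb lines linked above.

```ruby
# Hypothetical sketch; names and XML shape are simplified assumptions.
class PersonWithContactSketch
  def initialize(attributes)
    @attributes = attributes
  end

  # Class method so other authority converters could reuse it later.
  def self.contact(attributes)
    email = attributes.fetch('email', '').to_s.encode(xml: :text)
    "<contacts_common><email>#{email}</email></contacts_common>"
  end

  def person
    name = @attributes.fetch('termDisplayName').encode(xml: :text)
    "<persons_common><termDisplayName>#{name}</termDisplayName></persons_common>"
  end

  # One CSV row yields two documents, one per record type.
  def convert
    { person: person, contact: self.class.contact(@attributes) }
  end
end

row = { 'termDisplayName' => 'Ann & Co', 'email' => 'ann@example.org' }
docs = PersonWithContactSketch.new(row).convert
```

Keeping contact as a class method means it needs nothing from the person instance, which is what would let it move to its own section later.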

The last thing I noticed is that some of the names in the csv are using a delimiter:

https://github.com/collectionspace/cspace-converter/blob/master/data/core/authperson_core_all.csv#L2

@meganforbes if we want to capture multiple names we need to reserve termDisplayName as the primary name (it should always be single valued) and use another field (mvf) or fields for additional names. The converter wants to use the mvf delimiter (;) as a means to create multiple records, which is not the intention here. So we'll need to update the spreadsheet to at least convert mvf names to single for termDisplayName.
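
A pre-import check along these lines could flag rows where termDisplayName still contains the delimiter. This is a sketch only: split_mvf and single_valued_primary_name? are hypothetical helpers, not existing converter methods.

```ruby
MVF_DELIMITER = ';'

# Split a multi-valued field (mvf) into its individual values.
def split_mvf(value)
  value.to_s.split(MVF_DELIMITER).map(&:strip).reject(&:empty?)
end

# Hypothetical check: termDisplayName must hold exactly one name.
def single_valued_primary_name?(row)
  split_mvf(row['termDisplayName']).length == 1
end

split_mvf('Ann Analyst; A. Analyst')
# => ["Ann Analyst", "A. Analyst"]
single_valued_primary_name?({ 'termDisplayName' => 'Ann Analyst; A. Analyst' })
# => false
```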

Paulmulonzia commented 4 years ago

Thanks @mark-cooper for the updates. I did pull the updated code and transfer of contact information fields now works.

meganforbes commented 4 years ago

@mark-cooper I need to update the sample data to remove the delimiters in the termDisplayName field. I remember from our previous conversation that I should create another column for non-preferred terms. What's your preference for what to call that column? Is termDisplayNameNonPreferred ok, and can we then use that for all authorities?

mark-cooper commented 4 years ago

@meganforbes it's a bit of a mouthful but if it works for you it's ok with me -- and yes, I agree we should use it across all authorities.

meganforbes commented 4 years ago

@Paulmulonzia @kspurgin Updated sample data for Person attached. Please see the couple of notes above from Mark re: how to manage repeating values in the termDisplayName field.

Person Authority.xlsx

Paulmulonzia commented 4 years ago

Hi @mark-cooper,

Thanks for the update regarding the termDisplayName field.

Kindly clarify the following:

  1. Should the rest of the fields in personTermGroup also be single valued? If we only change termDisplayName to be single valued, the transfer to cspace fails because a second personTermGroup is created without termDisplayName, which is a required field.

  2. Where should we map the termDisplayNameNonPreferred field? In the person auth XML, the only field available for mapping is termFormattedDisplayName, which does not appear in the cspace UI. XML: https://github.com/collectionspace/cspace-converter/blob/master/spec/fixtures/files/core_person.xml#L65

UI screenshot: (image not included)

mark-cooper commented 4 years ago

Hi @Paulmulonzia

I think the current position is to have termDisplayName and fields related to the termDisplayName be single valued.

Then termDisplayNameNonPreferred and fields related to termDisplayNameNonPreferred can be multi-valued.

The data for the fields related to the above name forms needs to be consolidated into a single structure to create a single personTermGroupList.

Because this pattern will be common to authority imports we may want to add a helper for it (I don't believe it exists currently). @kspurgin may want to look into that.
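
Such a helper might look something like the sketch below. It is hypothetical (term_groups is not an existing converter method) and only handles the display name, assuming ; as the mvf delimiter: the single-valued primary name comes first, followed by one group per non-preferred name.

```ruby
# Hypothetical helper sketch; not an existing cspace-converter method.
# Returns an ordered list of term groups: the primary name first, then one
# group per non-preferred name.
def term_groups(row, delimiter: ';')
  primary = { 'termDisplayName' => row.fetch('termDisplayName').strip }
  non_preferred = row.fetch('termDisplayNameNonPreferred', '').to_s
                     .split(delimiter).map(&:strip).reject(&:empty?)
                     .map { |name| { 'termDisplayName' => name } }
  [primary] + non_preferred
end

row = {
  'termDisplayName' => 'Ann Analyst',
  'termDisplayNameNonPreferred' => 'A. Analyst; Annie Analyst'
}
groups = term_groups(row)
# groups.first is always the primary name, so it can still drive the identifier.
```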

We discussed allowing termDisplayName and related fields to be multi-valued so there's no need for termDisplayNameNonPreferred, given that these are all the same thing. The problem with that is the converter expects any record's identifier field to be unique and single valued, and as things stand it does not work correctly if that field is multi-valued. There is a good reason for this: although the names can all look the same from the csv perspective, there is actually a difference between the "primary" name and the other forms owing to the refname (shortid), which is used by the tool to look up the authority in cspace. We could refactor authority imports to treat the first name as primary and use it for the identifier, but it would require a chunk of work because it breaks from the convention that all other imports use currently.

For the formatted display name, because the field exists in the database, I'd say just go ahead and map it as usual. @meganforbes want to add a JIRA to get it in the ui?

Paulmulonzia commented 4 years ago

Thanks @mark-cooper.

I have updated the sample data so that the fields related to termDisplayName are single valued again.

I have also mapped termDisplayNameNonPreferred to termFormattedDisplayName.

Pull request - https://github.com/collectionspace/cspace-converter/pull/246

kspurgin commented 4 years ago

PR [#322] addresses the non-preferred name handling discussed here.

A value in termDisplayNameNonPreferred now populates the termDisplayName field in a second termGroupList/termGroup without causing an additional record to be generated.

All other termGroupList/termGroup fields can also be added to columns with NonPreferred appended to the header, and will be treated as expected in their respective field group.
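
To make the header convention concrete, the sketch below routes columns into a preferred and a non-preferred group based on the NonPreferred suffix. It is an illustration under assumptions, not the code from the PR: partition_term_columns is a hypothetical name, and termLanguageNonPreferred is just an example of a header formed by appending the suffix.

```ruby
NON_PREFERRED_SUFFIX = 'NonPreferred'

# Hypothetical sketch of routing columns by header suffix.
# Returns [preferred_fields, non_preferred_fields], with the suffix removed
# from the keys of the second hash so both use the same field names.
def partition_term_columns(row)
  preferred = {}
  non_preferred = {}
  row.each do |header, value|
    if header.end_with?(NON_PREFERRED_SUFFIX)
      non_preferred[header.delete_suffix(NON_PREFERRED_SUFFIX)] = value
    else
      preferred[header] = value
    end
  end
  [preferred, non_preferred]
end

row = {
  'termDisplayName' => 'Ann Analyst',
  'termLanguage' => 'English',
  'termDisplayNameNonPreferred' => 'A. Analyst',
  'termLanguageNonPreferred' => 'English'
}
preferred, non_preferred = partition_term_columns(row)
```

Both hashes end up keyed by the same field names, so each can be written out as its own term group within the one record.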