collectionspace / cspace-converter

Migrate data to CollectionSpace.

Converter profile for Cataloging_Materials_All #113

Open mark-cooper opened 4 years ago

mark-cooper commented 4 years ago

Mapping

https://github.com/collectionspace/cspace-converter/blob/master/lib/collectionspace/converter/materials/collectionobject.rb

Data

https://github.com/collectionspace/cspace-converter/blob/master/data/materials/cataloging_materials_all.csv

Config: use "materials"

https://github.com/lyrasis/cspace-converter/blob/master/DEV.md#materials https://materials.dev.collectionspace.org

Start by manually creating a record in cspace using the first row of the sample data, using only the identifier and any new / updated fields. Export the XML, then delete the record from cspace. Upload a copy of the XML here. From the converter you can use: bundle exec rake remote:get[$type/$csid] to get the XML.

Implement fields not currently covered in the mapping. Fields already mapped in Core do not need to be mapped again. If a field was not mapped in core but belongs in core, it should be added there (e.g. number value / type). If a field mapping needs to be changed, add the field name to the redefined_fields list, then map as normal.
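As a rough illustration of the redefined_fields idea described above, here is a minimal sketch. It assumes that core skips any field a profile lists as redefined, and that the profile then supplies its own mapping. The names CORE_MAPPERS, map_core and map_profile, the lowercase column names, and the sample values are all hypothetical and are not the converter's real API.

```ruby
# Illustrative only: CORE_MAPPERS, map_core and map_profile are hypothetical
# names, not the converter's actual API.
CORE_MAPPERS = {
  'numbervalue' => ->(v) { v.strip },
  'numbertype'  => ->(v) { v.strip.downcase }
}.freeze

# Map core fields, skipping any that the profile has declared as redefined.
def map_core(row, redefined_fields = [])
  CORE_MAPPERS.each_with_object({}) do |(field, mapper), out|
    next if redefined_fields.include?(field)
    next unless row.key?(field)

    out[field] = mapper.call(row[field])
  end
end

# A profile layers its own mappings over whatever core produced.
def map_profile(row, redefined_fields, profile_mappers)
  mapped = map_core(row, redefined_fields)
  profile_mappers.each do |field, mapper|
    mapped[field] = mapper.call(row[field]) if row.key?(field)
  end
  mapped
end

# Made-up sample row: the profile redefines how 'numbertype' is mapped.
row = { 'numbervalue' => ' 42 ', 'numbertype' => ' Lender ' }
profile_mappers = { 'numbertype' => ->(v) { v.strip.upcase } }
result = map_profile(row, ['numbertype'], profile_mappers)
```

In this sketch the core mapping for numbertype never runs, so the profile's version is the only one applied.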

mark-cooper commented 4 years ago

@Paulmulonzia I changed the description a bit. The reference XML can be much simpler: we don't need to retest fields we already have covered, so it only needs the identifier and the new / updated fields. Also, there may be fields that are not already mapped in core (like number value / type) but are core fields, so core needs to be updated in that case.

Paulmulonzia commented 4 years ago

Well understood @mark-cooper.

I am facing some issues trying to import the sample data:

  1. There are 3 unnamed dimension field columns that lead to a "duplicate headers" error due to the headers being null (the error is not encountered when I pass in values to be used as headers for the said columns). Please let me know the header names to be used.

image

  2. After temporarily fixing the above issue, I am getting the following error on import jobs:

"import_message": "uninitialized constant CollectionSpace::Converter::Materials::MaterialsCollectionObject::CoreCollectionsObject"

I did check and there is no such constant in the cataloging converter profile. We only have CollectionSpace::Converter::Materials::MaterialsCollectionObject.

Thanks.

mark-cooper commented 4 years ago

@meganforbes what should the missing headers be called? @Paulmulonzia for now using temp / placeholder names is fine, we can correct it later.

I just pushed a fix in master for the constant error.
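For the placeholder header names mentioned above, one possible approach is to fill in blank headers before the rows are processed. This is only a sketch: fill_blank_headers, the "placeholder" prefix, and the sample row values are made up for illustration, and the converter may handle headers differently.

```ruby
require 'csv'

# Replace nil or blank headers with unique placeholder names so that
# several empty header cells no longer collide with each other.
def fill_blank_headers(headers, prefix: 'placeholder')
  counter = 0
  headers.map do |header|
    next header unless header.nil? || header.strip.empty?

    counter += 1
    "#{prefix}_#{counter}"
  end
end

# Made-up sample data: one named column followed by three unnamed ones.
raw = "objectnumber,,,\nDM2004.001.0020,10,cm,height\n"
rows = CSV.parse(raw)
headers = fill_blank_headers(rows.first)

# Pair each data row with the repaired headers.
records = rows.drop(1).map { |values| headers.zip(values).to_h }
```

The placeholder names can be swapped for the real header names once they are known.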

Paulmulonzia commented 4 years ago

Thank you @mark-cooper, the issue got resolved.

I have mapped a couple of additional fields for the materials converter as well as for core, and I am currently stuck at splitting values for the handling and handlingNote fields:

https://github.com/collectionspace/cspace-converter/blob/DDD-cataloging-Materials/lib/collectionspace/converter/materials/collectionobject.rb#L31

I am getting the following error during import:

"import_message": "undefined method `to_sym' for {\"handling\"=>\"heavy\", \"handlingNote\"=>\"very heavy\"}:Hash",

The following fields are also not available in cspace when manually creating the cataloging record:

briefDescription and objectStatus

mark-cooper commented 4 years ago

@Paulmulonzia this is an eyeball guess, but maybe because overall is already an array it shouldn't be wrapped in another array here:

https://github.com/collectionspace/cspace-converter/blob/DDD-cataloging-Materials/lib/collectionspace/converter/materials/collectionobject.rb#L36

So just overall instead of [overall]?
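To show why the extra wrapping could produce that to_sym error, here is a small standalone reproduction. symbolize_groups is a hypothetical stand-in for whatever helper consumes the list, not the converter's real method; it only assumes the helper expects an array of hashes and calls to_sym on each key.

```ruby
# Hypothetical helper: expects an array of hashes and symbolizes each key.
def symbolize_groups(groups)
  groups.map do |group|
    group.each_with_object({}) { |(key, value), out| out[key.to_sym] = value }
  end
end

overall = [{ 'handling' => 'heavy', 'handlingNote' => 'very heavy' }]

# Passing the array as-is works: each element is a hash of key/value pairs.
ok = symbolize_groups(overall)

# Wrapping it again means each "group" is an array whose element is the
# whole hash, so the hash itself ends up where a key was expected.
error = begin
  symbolize_groups([overall])
  nil
rescue NoMethodError => e
  e.message
end
```

Here ok holds the symbolized hash, while error holds a NoMethodError message about to_sym being called on a Hash.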

Paulmulonzia commented 4 years ago

That fixed it, thank you.

mark-cooper commented 4 years ago

@Paulmulonzia I updated and merged your branch, so you'll need to start from a new branch off master for ongoing work.

Looking at the reference XML the structure for materials cataloging has two record types:

https://github.com/collectionspace/cspace-converter/blob/master/spec/fixtures/files/materials_collectionobject.xml#L152-L153

That requires an approach similar to the one used for core person:

https://github.com/collectionspace/cspace-converter/blob/master/lib/collectionspace/converter/core/person.rb#L7-L24

For materials the MaterialsCollectionObject.map would go into the collectionobjects_materials section.

We need the spec to check the data is going to the right place. I started that here:

https://github.com/collectionspace/cspace-converter/blob/master/spec/collectionspace/converter/materials/collectionobject_spec.rb#L14

And I updated person to follow the same convention (making sure contact is going to the right place):

https://github.com/collectionspace/cspace-converter/blob/master/spec/collectionspace/converter/core/person_spec.rb
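The "right place" check can be pictured with a simplified sketch that uses plain hashes instead of XML. The section name collectionobjects_materials comes from the comment above. Everything else is assumed for illustration, including the SECTIONS layout, the placeholder field names, and the helpers build_sections and section_for; none of it is the converter's real code.

```ruby
# Hypothetical layout: which field belongs to which section is assumed here
# purely for illustration, using placeholder field names.
SECTIONS = {
  'collectionobjects_common'    => %w[coreFieldA coreFieldB],
  'collectionobjects_materials' => %w[materialsFieldA]
}.freeze

# Split a flat hash of mapped values into one hash per record section.
def build_sections(mapped)
  SECTIONS.transform_values { |fields| mapped.slice(*fields) }
end

# Return the name of the section that ended up holding a field, or nil.
def section_for(document, field)
  document.find { |_section, fields| fields.key?(field) }&.first
end

mapped = { 'coreFieldA' => 'a', 'coreFieldB' => 'b', 'materialsFieldA' => 'm' }
document = build_sections(mapped)
```

A spec following this idea would assert on the section that holds a value, and not only on the value itself.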

Paulmulonzia commented 4 years ago

Hi @mark-cooper. Thank you for the update, I did modify the structure to accommodate both record types.

I have mapped all fields and updated the spec file.

I am however facing an issue running transfers. I am getting the error "Transfer requires no pre-existing csid and uri." despite the object not being present in cspace.

image

mark-cooper commented 4 years ago

Hi @Paulmulonzia

Isn't it this one? https://core.dev.collectionspace.org/cspace/core/record/collectionobject/64182954-9d00-40dd-a0b8

Paulmulonzia commented 4 years ago

I just cleared it from cspace and tried again and got the error.

I did try the last object, DM2004.001.0020, and it got transferred. However, it's going to core.dev instead of materials.dev, which is the config that I am currently using.

mark-cooper commented 4 years ago

Have you checked you are definitely using the right config?

https://github.com/collectionspace/cspace-converter/blob/master/docs/test/MATERIALS.md

I just tried that config with the materials cataloging file and it appears to be working fine. I transferred and deleted all the records in materials.dev successfully.

Paulmulonzia commented 4 years ago

It's strange: when I run the import and transfer via the CLI, everything works fine and all objects are imported and transferred to materials.dev, but when using the UI, the objects all get transferred to core.dev.

I will check on it, but at the moment CLI works fine.

mark-cooper commented 4 years ago

Weird, that's very mysterious. I just tried using the UI because I couldn't recall how I tested it earlier and via the UI seems to be fine. They should of course be exactly the same thing.

So long as the remote is set correctly there shouldn't be any possibility of aiming at the wrong site, which can be checked at /connection.

Paulmulonzia commented 4 years ago

Hi @mark-cooper, here's my connection page:

image

mark-cooper commented 4 years ago

Hi @Paulmulonzia thx. I do see the problem now. The dev server is actually a multi-tenant system, so any domain that can reach cspace-services is valid, and the tenant you use is actually determined by your login user:

https://github.com/collectionspace/cspace-converter/blob/master/docs/test/MATERIALS.md

The user should be: admin@materials.collectionspace.org
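A quick sanity check along these lines could catch the mismatch before transferring. This sketch assumes the tenant name is the first label of the login's domain, which fits the user above but is a convention assumed here and not something taken from the converter. tenant_for and tenant_matches? are hypothetical helpers, and the core login is an assumed example.

```ruby
# Hypothetical helper: take the tenant to be the first label of the
# login's domain (an assumed convention, not the converter's own logic).
def tenant_for(username)
  domain = username.split('@', 2).last
  domain.split('.').first
end

# Compare the tenant implied by the login with the tenant you expect.
def tenant_matches?(username, expected_tenant)
  tenant_for(username) == expected_tenant
end

materials_login = 'admin@materials.collectionspace.org'
core_login = 'admin@core.collectionspace.org' # assumed example login

right = tenant_matches?(materials_login, 'materials')
wrong = tenant_matches?(core_login, 'materials')
```

With a core login, the check fails even though the materials.dev domain is reachable, which matches the behaviour described above.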

Paulmulonzia commented 4 years ago

Hello @mark-cooper thank you for pointing out the issue, all objects are now being correctly transferred.

There were a few issues I observed:

The inventoryStatus and publishTo field values are not displayed in cspace, but when fetching the XML the values are present.

Objects with the value "non-circulating" in the collection field are not showing up in cspace, but the value is present in the XML.

dimension had to be modified to use valid values available in the list in cspace. Most objects were using the value "dimension", which is not available in the list items in cspace.

mark-cooper commented 4 years ago

@Paulmulonzia great, glad that worked out, and that sounds good to me for dimension.

For inventoryStatus and publishTo I see that the reference XML doesn't contain values from cspace, which means we currently can't test them in the spec, and we don't have a known good value to compare to. So the ref XML needs to be updated. Beyond that I think the issue is probably the terms need to be updated in the csv like you did for dimension. For example, I'm guessing that "Omeka" probably needs to be "omeka".

Paulmulonzia commented 4 years ago

Hi @mark-cooper,

I found out the issue was inventoryStatus and publishTo are supposed to use vocab.

collection is supposed to use _ rather than - to separate the values, so I updated the sample data.

I have pushed these updates to github.

Thanks.

mark-cooper commented 4 years ago

@Paulmulonzia I merged your branch. Be sure to update your copy of master for future work.

It looks like there's still a bit of cleanup needed for the tests:

https://github.com/collectionspace/cspace-converter/blob/master/spec/collectionspace/converter/materials/collectionobject_spec.rb#L25-L35

If the test can be covered by core then it doesn't need to be repeated for materials.

meganforbes commented 4 years ago

@Paulmulonzia we have a few new fields to add to the converter profile for Cataloging to support a data migration. I have attached the XLS with updated sample data here (GitHub won't let me attach CSV). The fields are: otherNumber, otherNumberType, publishTo, inventoryStatus, fieldCollectionNote, fieldCollectionFeature, productionPeopleRole, and productionNote.

sample_data_cataloging_core_excerpt_V2.xlsx

kspurgin commented 4 years ago

Megan's comment above is about core, not materials, so I've updated [#98] with fuller/specific info about the changes needed.

Paulmulonzia commented 4 years ago

Hi @kspurgin, currently the dimensions helper script is not splitting values for the field dimensionSummary, which is multi-valued in cspace.
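For reference, the expected splitting behaviour could look roughly like the sketch below. split_multivalue, the ';' delimiter, and the sample summary text are all assumptions for illustration; this is not the dimensions helper itself.

```ruby
# Hypothetical helper: split one CSV cell into several values for a
# multi-valued field. The ';' delimiter is an assumption.
def split_multivalue(cell, delimiter: ';')
  cell.to_s.split(delimiter).map(&:strip).reject(&:empty?)
end

# Made-up sample cell holding two dimensionSummary values.
summaries = split_multivalue('10 x 20 cm; 4 x 8 in')
```

An empty or nil cell yields an empty list, so records without a summary are left alone.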

kspurgin commented 4 years ago

@Paulmulonzia Will you create an issue for the dimensions problem and assign it to me? Please link it to this issue by mentioning that it "blocks #113".

Not sure when I will be able to look at it. This week is packed with stuff on deadlines.

Paulmulonzia commented 4 years ago

No worries, let me create it.

Paulmulonzia commented 4 years ago

Hi @kspurgin, currently the dimensions helper script is not splitting values for the field dimensionSummary, which is multi-valued in cspace.

Fixed in https://github.com/collectionspace/cspace-converter/pull/229