IQSS / dataverse

Open source research data repository software
http://dataverse.org

Migrating Collections for PSI Dataverses #1454

Closed posixeleni closed 9 years ago

posixeleni commented 9 years ago

Related to #121

Since PSI is a power Dataverse user, @mcrosas and I discussed that their collections would best be applied as facets in 4.0. I will need to create a custom metadata block for these facets and will need assistance from @scolapasta to decide how we would map the pre-existing collections to these facets.

Working on this block here: https://docs.google.com/spreadsheet/ccc?key=0AjeLxEN77UZodHFEWGpoa19ia3pldEFyVFR0aFVGa0E#gid=11

posixeleni commented 9 years ago

@scolapasta @landreev @ekraffmiller @kcondon During our migration meeting yesterday we discussed another option for migrating collections into 4.0, which would involve:

1) taking the top-level collections and transforming them to "Topic Classification"
2) setting all subcollections to "Keyword", if they are not already.

To test this out, I did an in-depth analysis of the PSI Dataverse collections and found that, of their 95 dynamic collections, 45 use multiple boolean terms to define the collection. https://docs.google.com/spreadsheets/d/1-rXzWUIsyHdY9szNSuUZxVbrjkVCfVtX2Mk5bCyZ80g/edit?usp=sharing

For example, under the Population collection, the "Men" subcollection has the following boolean terms: keywordValue:"Men" OR keywordValue:"MSM" OR keywordValue:"Military" OR keywordValue:"Truck drivers"
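To make the shape of such a dynamic collection concrete, here is a minimal Python sketch that pulls the individual terms out of a query like the one above. The helper name `keyword_terms` is hypothetical and not part of Dataverse. It assumes every clause has the form `keywordValue:"..."` and that clauses are joined only by OR.

```python
import re


def keyword_terms(query: str) -> list[str]:
    """Return the quoted value of each keywordValue:"..." clause, in order.

    Hypothetical helper: assumes the query is a flat OR of keywordValue clauses.
    """
    return re.findall(r'keywordValue:"([^"]*)"', query)


# The "Men" subcollection query quoted in this comment.
men_query = (
    'keywordValue:"Men" OR keywordValue:"MSM" OR '
    'keywordValue:"Military" OR keywordValue:"Truck drivers"'
)

print(keyword_terms(men_query))
# → ['Men', 'MSM', 'Military', 'Truck drivers']
```

A collection counts as "multi-term" in the sense used here when this list has more than one entry.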

@scolapasta, given the number of collections that have multiple terms associated with them, what do you propose we do with these collections when migrating them?

scolapasta commented 9 years ago

It almost seems like that should be a different topic classification - the issue still being that we don't have support for hierarchical dataset fields like this, and/or facets.

I wonder if there is any other dataset field that would already encapsulate the concept of population: Men? (But we'd have to look at this case by case.)

posixeleni commented 9 years ago

@scolapasta what if we just list ALL the keywords as they have them (don't try to combine them) but make sure they are associated with the topic classifications we generate? Then we can check with them to see if this would suit their purposes.
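One possible reading of this proposal, sketched in Python: keep every keyword exactly as PSI has it and attach the full list to the topic classification generated from the top-level collection. The function name `flatten_collection` and the keys `topic_classification` and `keywords` are illustrative assumptions, not the actual Dataverse 4.0 field names or migration code.

```python
def flatten_collection(top_level: str, subcollections: dict[str, list[str]]) -> dict:
    """Map one top-level collection to a topic classification plus its keywords.

    Hypothetical sketch: keywords are listed as-is (not combined into a single
    subcollection term), with duplicates dropped and original order kept.
    """
    keywords: list[str] = []
    for terms in subcollections.values():
        for term in terms:
            if term not in keywords:
                keywords.append(term)
    return {"topic_classification": top_level, "keywords": keywords}


# The Population > Men example from earlier in the thread.
population = {"Men": ["Men", "MSM", "Military", "Truck drivers"]}
record = flatten_collection("Population", population)
print(record)
```

The subcollection name itself ("Men") survives only because it is also one of the keywords, so whether PSI needs the intermediate grouping preserved is the open question to check with them.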

posixeleni commented 9 years ago

Will try making these a custom metadata block first to see what PSI thinks. This will require a new schema.xml and db drop.

posixeleni commented 9 years ago

Related to #1607. @pdurbin, this is checked into a different branch (metadata-facets-beta14), and since these are new metadata blocks, I am assuming it will need a schema.xml and db drop.

pdurbin commented 9 years ago

@posixeleni This is looking good. I merged it with master. Passing to QA.

kcondon commented 9 years ago

Tested basic block behavior, works, passing to Ellen to migrate data.

posixeleni commented 9 years ago

@scolapasta @kcondon is there any way I can use vm6 or vm5 to test out saved search for the PSI Dataverse? I would like to send it to PSI so they can review whether they like this option or prefer the custom metadata block option.

cc/ @mcrosas @sbarbosadataverse

sbarbosadataverse commented 9 years ago

moved to dataverse-curation