IQSS / dataverse

Open source research data repository software
http://dataverse.org

Batch Metadata Changes for Beta 4 #754

Closed posixeleni closed 10 years ago

posixeleni commented 10 years ago

Making several changes to metadata blocks based on user feedback. Phil will need to update schema.xml once I have committed my changes to git.

Geospatial metadata block (new metadata block; updated datasetfields.sh for this!) https://github.com/IQSS/dataverse/issues/482

Updates to: Citation block

Changes to Social Science block

Minor Changes to Astrophysics Block

pdurbin commented 10 years ago

Phil will need to update schema.xml once I have committed my changes to git.

Yes. Please branch from the tip of master before you commit. I'll add a commit on top of that branch with a new schema.xml. Then I'll merge it into master so people will get both commits at once on master.

pdurbin commented 10 years ago

@posixeleni I don't see anything about "In the not-yet-released brand new Geospatial metadata block I have planned to allow for multiple entries for Geographic Coverage to prevent issues like you brought up" so this is just a reminder about it. It's for Data Deposit API backwards compatibility. Here's what I had written originally:

geographicCoverage doesn't allowmultiples

Eleni, in DVN 3.6 we treated this as a multi-valued field:

   <dcterms:coverage>United States</dcterms:coverage>
   <dcterms:coverage>Canada</dcterms:coverage>

Should we change geographicCoverage to allowmultiples? Otherwise, I get this error: multiple values encountered for non multiValued field geographicCoverage: [United States, Canada]

My temporary workaround is to comment out the line for Canada in the Atom entry XML: https://github.com/IQSS/dataverse/blob/0b65611b1ed3b5143e3524b74a824acd01d21ff1/scripts/api/data-deposit/data/atom-entry-study.xml#L32
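The error above can be reproduced in miniature. The following Python sketch is an illustration only, not Dataverse or Solr code: it parses a trimmed-down, hypothetical Atom entry, collects every dcterms:coverage value, and rejects multiple values when the target field is not flagged as multi-valued. The function name `collect_field` and the `allow_multiples` parameter are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Standard namespace URI for Dublin Core terms.
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical, trimmed-down Atom entry modelled on the snippet above.
ATOM_ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:dcterms="http://purl.org/dc/terms/">
   <dcterms:coverage>United States</dcterms:coverage>
   <dcterms:coverage>Canada</dcterms:coverage>
</entry>
"""


def collect_field(xml_text, local_name, field_name, allow_multiples):
    """Return all values of a dcterms element, enforcing the multiples flag."""
    root = ET.fromstring(xml_text)
    # ElementTree addresses namespaced tags as "{namespace-uri}localname".
    values = [el.text for el in root.iter(f"{{{DCTERMS}}}{local_name}")]
    if len(values) > 1 and not allow_multiples:
        # Message text mirrors the error quoted in this comment.
        raise ValueError(
            "multiple values encountered for non multiValued field "
            f"{field_name}: [{', '.join(values)}]"
        )
    return values
```

Calling `collect_field(ATOM_ENTRY, "coverage", "geographicCoverage", True)` returns both countries, while passing `False` raises the error, which is roughly the difference that setting the field to allow multiples is meant to make.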

posixeleni commented 10 years ago

@pdurbin thanks for reminding me to include this for QA to test. I have in fact set it to "allow for multiple" as TRUE.

pdurbin commented 10 years ago

@posixeleni great. Yes, I see it reflected above now. Thanks.

Note to self to think about removing distributorContact from schema.xml per https://github.com/IQSS/dataverse/issues/759#issuecomment-49785898

pdurbin commented 10 years ago

The TSVs come from https://docs.google.com/spreadsheet/ccc?key=0AjeLxEN77UZodHFEWGpoa19ia3pldEFyVFR0aFVGa0E&usp=sharing

pdurbin commented 10 years ago

@posixeleni I just merged some bug fixes from master into https://github.com/IQSS/dataverse/tree/754-metadata so if you pull the latest from that branch, you should be able to run vagrant up and poke around. Once you're happy, please give this ticket back to me and I'll merge the branch into master and send out a note that a new Solr schema.xml is required. @scolapasta in that note I'm also going to recommend a database drop.

pdurbin commented 10 years ago

There's another change required that affects SWORD but I may need help from @landreev to figure it out. I thought making this change to scripts/database/reference_data.sql would be enough, but I still can't populate the dcterms:coverage field:

-INSERT INTO foreignmetadatafieldmapping (id, foreignfieldxpath, metadatablockname, datasetfieldname, isattribute, parentfieldmapping_id, foreignmetadataformatmapping_id) VALUES (12, ':coverage', 'socialscience', 'geographicCoverage', FALSE, NULL, 1 );
+INSERT INTO foreignmetadatafieldmapping (id, foreignfieldxpath, metadatablockname, datasetfieldname, isattribute, parentfieldmapping_id, foreignmetadataformatmapping_id) VALUES (12, ':coverage', 'geospatial', 'geographicCoverage', FALSE, NULL, 1 );

pdurbin commented 10 years ago

@landreev to make this more concrete, from the GUI, I can create both of the fields below ("authorName" from author and "country" from "geographicCoverage") but by using the importXML method I can only create the former. "country" has a typeClass of controlledVocabulary so maybe that's why? I did notice this comment that says "A controlled vocabulary entry... not supported yet; though I expect the commented-out code below to work" at https://github.com/IQSS/dataverse/blob/2ef3f55ecc9efe5a046118a8ea1d1405f8e1aa17/src/main/java/edu/harvard/iq/dataverse/metadataimport/ForeignMetadataImportServiceBean.java#L229 . This might be a red herring though because I can't seem to get that block to execute. It may be something else.

authorName from author

{
   "value" : [
      {
         "authorName" : {
            "value" : "Peets, John",
            "typeClass" : "primitive",
            "typeName" : "authorName",
            "multiple" : false
         }
      },
      {
         "authorName" : {
            "value" : "Stumptown, Jane",
            "typeClass" : "primitive",
            "typeName" : "authorName",
            "multiple" : false
         }
      }
   ],
   "typeClass" : "compound",
   "typeName" : "author",
   "multiple" : true
}

country from geographicCoverage

{
   "value" : [
      {
         "country" : {
            "value" : "United States",
            "typeClass" : "controlledVocabulary",
            "typeName" : "country",
            "multiple" : false
         }
      },
      {
         "country" : {
            "value" : "Canada",
            "typeClass" : "controlledVocabulary",
            "typeName" : "country",
            "multiple" : false
         }
      }
   ],
   "typeClass" : "compound",
   "typeName" : "geographicCoverage",
   "multiple" : true
}

pdurbin commented 10 years ago

Ah ha! If I use this SQL instead (putting the values in "otherGeographicCoverage")...

-INSERT INTO foreignmetadatafieldmapping (id, foreignfieldxpath, metadatablockname, datasetfieldname, isattribute, parentfieldmapping_id, foreignmetadataformatmapping_id) VALUES (12, ':coverage', 'socialscience', 'geographicCoverage', FALSE, NULL, 1 );
+INSERT INTO foreignmetadatafieldmapping (id, foreignfieldxpath, metadatablockname, datasetfieldname, isattribute, parentfieldmapping_id, foreignmetadataformatmapping_id) VALUES (12, ':coverage', 'geospatial', 'otherGeographicCoverage', FALSE, NULL, 1 );

... the importXML method is able to save the field:

{
   "value" : [
      {
         "otherGeographicCoverage" : {
            "value" : "United States",
            "typeClass" : "primitive",
            "typeName" : "otherGeographicCoverage",
            "multiple" : false
         }
      },
      {
         "otherGeographicCoverage" : {
            "value" : "Canada",
            "typeClass" : "primitive",
            "typeName" : "otherGeographicCoverage",
            "multiple" : false
         }
      }
   ],
   "typeClass" : "compound",
   "typeName" : "geographicCoverage",
   "multiple" : true
}
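The JSON above follows a regular shape: a multi-valued compound parent whose "value" is a list of single-key objects, one per entry. As a rough sketch of that shape only (the helper name `compound_field` is made up for this example and is not part of Dataverse), the same structure can be generated in Python:

```python
import json


def compound_field(parent, child, values, child_type_class):
    """Build a multi-valued compound field holding one child per value."""
    return {
        "value": [
            {
                child: {
                    "value": value,
                    "typeClass": child_type_class,
                    "typeName": child,
                    "multiple": False,
                }
            }
            for value in values
        ],
        "typeClass": "compound",
        "typeName": parent,
        "multiple": True,
    }


# Rebuild the otherGeographicCoverage example shown above.
field = compound_field(
    "geographicCoverage",
    "otherGeographicCoverage",
    ["United States", "Canada"],
    "primitive",
)
print(json.dumps(field, indent=3))
```

Swapping the child name to "country" and the type class to "controlledVocabulary" yields the shape that the import could not save earlier in this thread, which makes the two cases easy to compare side by side.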

Here's how it looks in the GUI:

[Screenshot: roasting_at_home_-_top_dataverse_of_pete_dataverse_-_2014-07-25_08 56 44]

@posixeleni is this what you had in mind for dcterms:coverage? Put it under "Other"? Because how can we know if they mean country, state, city, etc.? I'm guessing dcterms:coverage (like the rest of dcterms) can mean various things.

posixeleni commented 10 years ago

@pdurbin Yes! We are using otherGeographicCoverage to be the geographic catch-all element for dcterms:coverage since it can mean various things (continent, nation, state, city, province, region, canton, etc....).

cc/ @landreev

pdurbin commented 10 years ago

Yes! We are using otherGeographicCoverage to be the geographic catch-all element for dcterms:coverage

Perfect. Thanks, @posixeleni! I just committed the fix to this branch.

Oh and http://www.diffkit.org is the tool I mentioned that might help people like me and @sekmiller reason about what changed in the TSV files (such as in 8c6df96). "DiffKit is an application, and a framework, for comparing two tables of data, field-by-field." I learned about it from @leeper at https://twitter.com/thosjleeper/status/445582381488287744
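For a sense of what a field-by-field comparison of two TSV files looks like, here is a small Python sketch using only the standard library. It is not DiffKit and does not use its API; the function name `diff_tsv`, the column names, and the sample rows are all hypothetical and chosen only to echo the changes discussed in this issue.

```python
import csv
import io


def diff_tsv(old_text, new_text, key):
    """Compare two TSV tables row by row (matched on `key`), field by field."""

    def load(text):
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        return {row[key]: row for row in reader}

    old, new = load(old_text), load(new_text)
    changes = []
    for k in sorted(old.keys() | new.keys()):
        if k not in new:
            changes.append((k, "removed", None, None))
        elif k not in old:
            changes.append((k, "added", None, None))
        else:
            # Report every column whose value differs between the two versions.
            for column in old[k]:
                if old[k][column] != new[k].get(column):
                    changes.append((k, column, old[k][column], new[k].get(column)))
    return changes


# Hypothetical before/after rows; real metadata block TSVs have more columns.
OLD = "name\tallowmultiples\tmetadatablock\ngeographicCoverage\tFALSE\tsocialscience\n"
NEW = "name\tallowmultiples\tmetadatablock\ngeographicCoverage\tTRUE\tgeospatial\n"

for change in diff_tsv(OLD, NEW, key="name"):
    print(change)
```

Each reported tuple is (row key, column, old value, new value), so a reviewer can see which fields changed without reading the whole file diff.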

Now let's get you running with Vagrant so you can poke around in the live app.

posixeleni commented 10 years ago

Assigned to @scolapasta to help me review next steps to test and implement for next Beta release.

cc/ @pdurbin

pdurbin commented 10 years ago

@posixeleni I just spotted a typo: https://github.com/IQSS/dataverse/commit/8c6df969eb3e365e454966e543c9cfd74123835e#commitcomment-7269300

pdurbin commented 10 years ago

I ran git cherry-pick of TSV commits from @posixeleni and made a single commit comprising all the fixes in this issue.

I moved this ticket to QA. Here's what I plan to email around:

Due to metadata changes made in https://github.com/IQSS/dataverse/issues/754 everyone running Dataverse 4.0 code must drop their database, update to the latest Solr schema.xml, and clear their Solr index.

After pulling the latest code, update your Solr schema.xml like this:

Clear out your Solr index like this:

Drop your database and get set up again per the dev guide:

https://github.com/IQSS/dataverse/blob/master/doc/Sphinx/source/Developers/dev-main.rst#rebuilding-your-dev-environment

kcondon commented 10 years ago

Tested prior to beta4 release, opened separate tickets for issues found.

Closing ticket