Closed jggautier closed 5 years ago
We have used some ugly tricks to achieve OpenAIRE compatibility because Dataverse does not have all the metadata that OpenAIRE needs. You can see them in the file https://github.com/Consorcio-Madrono/dataverse/blob/v4.6WithOpenAIRE/src/main/resources/templates/datacite_40.ftl .
This .ftl file must be a FreeMarker file. I see the dependency has been added to the pom.xml at https://github.com/Consorcio-Madrono/dataverse/blob/025df77e0a25a8ad9221fec61925af88ed09053a/pom.xml#L57 . Perhaps this would be better discussed at https://groups.google.com/forum/#!forum/dataverse-dev (please feel free to start a thread there if you like, @juancorr ) but I'm curious about why you've introduced FreeMarker into your branch and if there is any alternative that's already part of the Java EE standard. I'm not trying to criticize. I'm just curious. I've never used FreeMarker.
We used the sbgrid code as a base (https://github.com/sbgrid/sbgrid-dataverse/tree/feature/datacite-xml). We only patched this code and the Dataverse code, and adapted the initial sbgrid FreeMarker file to produce valid DataCite XML and comply with the OpenAIRE guidelines. It is the first time that I have used a FreeMarker file too, but it is easy to adapt to other institutions' requirements and it keeps special cases out of the Java code. This works very well with e-cienciaDatos, but we have only 12 datasets. We have not tested it in a large Dataverse installation. Sorry, I do not have enough experience with these files to discuss them.
@juancorr oh! So you weren't the one to add the FreeMarker dependency. It's from the SBGrid branch. Thanks. I understand now.
Yes, I had said it in my first comment in https://github.com/IQSS/dataverse/issues/3697 , but I should have emphasized it.
Dear all, I’m glad to announce that our proposal to enhance the interoperability of several open source platforms has been awarded by OpenAIRE, see https://www.4science.it/en/2018/02/23/4science-awarded-by-openaire/ . In our proposal, we have included the implementation of the Data Repository Guidelines in Dataverse, more specifically the support for the DataCite schema 4.1, to be ready for the new version of the guidelines that is expected soon. We have just found this thread. I’m really happy to see our assumptions about the benefit of this development confirmed by the community, and I will be happy to contribute to developing a general solution that works for all and hopefully can be included by default in a future Dataverse version.
@abollini that's great news! Can you please also start a new thread about this at https://groups.google.com/forum/#!forum/dataverse-community to spread the word? Thanks!
@abollini thanks for posting https://groups.google.com/d/msg/dataverse-community/OALTzINxkX0/v_WwJ4cvAwAJ ! Also, I mentioned your proposal in the Dataverse Community News yesterday: https://groups.google.com/d/msg/dataverse-community/AlZHT6tQM3U/0RrMUOv1AgAJ
Next it would be great to get a shared understanding of what you think the pull request will look like and what the scope of change will be. To get on the same page literally, it would be nice to have a Google doc or similar for what you have in mind. For now I'm linking to this issue in the "Dev Efforts by the Dataverse Community" spreadsheet at https://docs.google.com/spreadsheets/d/1pl9U0_CtWQ3oz6ZllvSHeyB0EG1M_vZEC_aZ7hREnhE/edit?usp=sharing but please feel free to create new issues as needed if you want to divide the work into smaller chunks. In our experience, smaller chunks move more easily across our kanban board at https://waffle.io/IQSS/dataverse
In short, please let us know if there is anything you need!
We have created a PR with the result of our development: https://github.com/IQSS/dataverse/pull/4664/ . We will be happy to receive feedback and improve it as needed.
@abollini hi! Thanks for the pull request! I just advanced it to Code Review at https://waffle.io/IQSS/dataverse and left you a review.
@juancorr are you interested in giving a review as well?
Thanks Philip, yes I am very interested. I will review it.
Juan Corrales
Great! Thanks @abollini and team for the PR, @pdurbin for the feedback, and @juancorr for taking a look!! :) I'll move this to the Inbox column on our Waffle board for now, as it's a large PR and there are already some feedback and community review offers.
@abollini any news? Are you blocked? Do you need anything? @juancorr and I have been chatting a bit in IRC if you'd like to join us some day. 😄
In 4b28306 I added "DataCite OpenAIRE" to the list of export formats. @djbrooke and I just spoke about how tests would be nice but they're tricky for external developers to write so I went ahead and moved this issue (and #3697) to QA.
I haven't begun testing yet, but during a test deployment I found that OpenAIRE was not appearing in the export list and this error is in the server log: Could not find key "dataset.exportBtn.itemLabel.dataciteOpenAIRE" in bundle file.
@kcondon good catch. Fixed in 7c11bc0. Here's how it looks:
@abollini @lap82 @francescopioscognamiglio @juancorr please take a look at the tests I added as of 5336e67. As of this writing OpenAireExportUtil.java, for example, has 51.79% code coverage, up from 0%. 😄 Here's how it looks in Netbeans:
Thanks @pdurbin, I have just started my war with code coverage tools (OK, NetBeans is a good ally); I did not know about this. I will look at the tests. What is the right way to suggest more tests?
@pdurbin @abollini @lap82 @francescopioscognamiglio I have added some new tests and have found two small bugs in the OpenAIRE code related to geolocation and the alternative title. Should I open one pull request against @abollini's code for the bugs and another pull request against the main Dataverse develop branch for the tests?
@juancorr we try to work in small chunks so multiple pull requests sounds better. Thanks!
Hi everyone,
Is there a crosswalk or any documentation I could peek at for this PR? It's really cool being able to poke at this work, but it might be helpful if there's a crosswalk or something explaining how fields are being mapped.
For now, here are other potential problems I've seen with the OpenAIRE metadata in the PR as of last week. I'm not sure how important it is to fix many of these problems for this github issue, but I would argue that at least the first should be considered and fixed:
If any file is restricted, the rights property is "closedAccess." The definition of closedAccess, https://wiki.surfnet.nl/display/standards/info-eu-repo#info-eu-repo-AccessRights, is access "by financial means," or toll gated. restrictedAccess seems more appropriate.
But maybe it's best to try following how closedAccess and restrictedAccess are used by Zenodo (which doesn't use closedAccess as "toll gated"):
-- If any of the files in the dataset are restricted and the option to request access is enabled (people are allowed to request access), the dataset is restrictedAccess
-- If any of the files in the dataset are restricted and the option to request access is disabled, the dataset is closedAccess
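The Zenodo-style rule proposed above can be sketched as a small decision function. This is only an illustration in Python, not code from the pull request: the function and parameter names are hypothetical, and returning openAccess when no file is restricted is an assumption that the comment above does not state.

```python
def access_rights(has_restricted_files: bool, request_access_enabled: bool) -> str:
    """Pick an info:eu-repo access-rights term for a dataset.

    Hypothetical helper illustrating the rule proposed in this thread.
    """
    if not has_restricted_files:
        # Assumption: a dataset without restricted files is treated as open.
        return "openAccess"
    if request_access_enabled:
        # Restricted files, but people are allowed to request access.
        return "restrictedAccess"
    # Restricted files and no way to request access.
    return "closedAccess"
```

With this rule, a dataset is only tagged closedAccess when a restricted file cannot even be requested.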
Dataverse's "language" field and OpenAIRE's "language" property should define the language of the resource (i.e. dataset). But here the Dataverse language field is being used to populate the xml language attributes of certain OpenAIRE fields:
<title xml:lang="English">Historical Climate Model Output Of Echam5-Wiso From 1871-2011 At T106 Resolution</title>
When a Dataverse depositor chooses English, she's saying that the dataset is in English. But this PR uses English to describe the language of the metadata as well, which isn't always true.
Could the xml attributes not be used for now?
It looks like the nameType attribute is always set to "personal" (as opposed to organizational) if there's a comma in the entry. But a comma doesn't guarantee that the entry is a personal name. nameType isn't mandatory. Can it be removed?
There might be an issue with the funder property and Dataverse's Grant Information field. I haven't had a chance to explore how the funder information in Dataverse's Grant Information fields is mapped to OpenAIRE.
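To make the concern concrete, here is a minimal Python sketch of a comma-based heuristic like the one described above. It is not the code from the pull request: the function name and the example names are hypothetical, and treating entries without a comma as organizational is an assumption.

```python
def guess_name_type(name: str) -> str:
    """Guess a DataCite nameType from an author entry using only the comma rule."""
    if "," in name:
        # Any comma is taken to mean "Family name, Given name".
        return "Personal"
    # Assumption: entries without a comma are treated as organizations.
    return "Organizational"


# Hypothetical author entries, for illustration only.
person = guess_name_type("Doe, Jane")
organization = guess_name_type("Example University, Department of Sociology")
```

Both calls return "Personal", so the organization in the second entry is misclassified just because its name contains a comma.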
Lastly, when OpenAIRE harvests the openaire set, are they getting the xml document that's in the metadata download pulldown or another xml document? I ask because the document in the pulldown points to DataCite's xml schema, but I would think that OpenAIRE has its own xml schema since it's doing things that DataCite's 4.0 schema would find invalid.
Update: Just noticed that OpenAIRE does say that it expects metadata encoded "in the DataCite format (prefix oai_datacite)" (https://guidelines.openaire.eu/en/latest/data/use_of_oai_pmh.html). So I guess that means it's okay to point to DataCite's xml schema. What's confusing me now is whether it's okay to point to DataCite's 4.0 schema when OpenAIRE is based on DataCite 3.1. For example, OpenAIRE uses DataCite's "funder" contributor role, which was deprecated by DataCite 4.0. Not sure if this will cause problems.
I hope this helps!
Since the purpose of this pull request is mainly to get Dataverse to export OpenAIRE compliant metadata so that OpenAIRE can harvest it, I'm adding OpenAIRE's validator page, https://www.openaire.eu/validator/welcome, which also includes a link to register your repository.
@jggautier I thought #4318 was about harvesting. This issue is about export.
@jggautier @kcondon and I just talked this out. @jggautier is going to work on figuring out what work remains before this issue about OpenAIRE goes to QA.
Thanks @jggautier,
I will try to answer some points related to OpenAIRE compatibility. I hope I can explain it in English.
The development done with #4318 allows Dataverse to be compatible with the OpenAIRE 4.0 guidelines, which are still in DRAFT version, but compatible dataverses or Dataverse installations should fill in all required OpenAIRE metadata. I think that this development is compatible with the current guidelines, but I have not checked it yet.
I agree with @jggautier about nameType. It is not shown in the guidelines http://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/latest/field_creator.html#dci-creator and is optional in the xsd file: http://schema.datacite.org/meta/kernel-4.1/metadata.xsd
The funder information section does not have a complete definition in the OpenAIRE guidelines (which sections are mandatory): "Grant Agency" is mapped to "funderName" and "Grant Number" to "awardNumber", and the output validates against the xsd.
A repository needs to be harvested to be OpenAIRE compliant.
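The funder mapping mentioned above ("Grant Agency" to funderName, "Grant Number" to awardNumber) can be illustrated with a short Python sketch that builds a DataCite fundingReferences element. This is not the exporter from the pull request: the dictionary keys and the sample values are hypothetical, and XML namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET


def funding_references(grants):
    """Build a DataCite <fundingReferences> element from grant records.

    Each grant is a dict with the hypothetical keys "grantAgency" and
    "grantNumber"; the real Dataverse field handling may differ.
    """
    root = ET.Element("fundingReferences")
    for grant in grants:
        ref = ET.SubElement(root, "fundingReference")
        # "Grant Agency" becomes funderName.
        ET.SubElement(ref, "funderName").text = grant["grantAgency"]
        # "Grant Number" becomes awardNumber, when one was entered.
        number = grant.get("grantNumber")
        if number:
            ET.SubElement(ref, "awardNumber").text = number
    return root


# Hypothetical grant record, for illustration only.
example = funding_references([{"grantAgency": "Example Funder", "grantNumber": "ABC-123"}])
xml_text = ET.tostring(example, encoding="unicode")
```

A real export would also need the DataCite namespace and the rest of the record before it could be validated against the xsd.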
@abollini @lap82 @francescopioscognamiglio please note that @juancorr has made a pull request against your pull request at https://github.com/4Science/dataverse/pull/4 to add some more tests.
Thanks @juancorr. I'm hoping we can use @abollini's Google Groups thread to get a shared understanding of the scope of this issue, which will be helpful when it comes time to test this PR. Everyone who's interested, please feel free to add your thoughts. Thanks!
I just read through the post by @jggautier above and it's a great summary of the conversation he, @kcondon and I had yesterday. @abollini @lap82 @francescopioscognamiglio @juancorr please take a look and let's talk about the scope of the pull request and how much more development needs to be done before we advance it from code review to QA. Thanks! Others are welcome to comment as well, of course!
@abollini what do you think?
I need to get up to speed on this with @jggautier early this week, post Community Meeting. :)
@abollini @lap82 @francescopioscognamiglio @juancorr we should have some feedback soon.
Thanks, @djbrooke, for discussing with me. We're looking forward to getting @abollini's input on:
Thanks @jggautier, moving back to Development until this feedback is implemented or responded to.
Hey @abollini - any news? Let us know if there's anything we can do. Thanks!
Hi all, sorry for the delay. We will try to reply to your comments by the end of next week at the latest.
@abollini hi! Any news?
Last week @jggautier indicated he's interested in trying something on a running server with the openaire branch on it. This morning I pinged @juancorr at http://irclog.iq.harvard.edu/dataverse/2018-07-16#i_70126 and he's going to set up a server for testing soon. Thanks!
While I'm writing, any news, @abollini ?
I've been out for a week. Any news on this issue? I see @jggautier left a longish comment at https://github.com/IQSS/dataverse/pull/4664#issuecomment-405722192 but that was three weeks ago.
@jggautier when you get a chance can you please summarize the status of this issue?
Moving to the inbox until there's additional work on this.
I just noticed that @fcadili resolved the merge conflicts in pull request #4664. Thanks!
Does that mean you are ready for code review? Please let us know how we can help. 😄
@jggautier I spun up the branch (openaire-103925a) at http://ec2-100-27-31-230.compute-1.amazonaws.com:8080 if you'd like to poke around. The password is "admin1".
Thanks @pdurbin! I'm trying to see the exported OpenAIRE metadata for a dataset, but when I try to export it, or export any metadata really, I get a "This site can’t be reached" page. Is it possible to export the OpenAIRE metadata?
@jggautier whoops! My fault! I hadn't configured dataverse.siteUrl. http://ec2-100-27-31-230.compute-1.amazonaws.com:8080/api/datasets/export?exporter=oai_datacite&persistentId=doi%3A10.5072/FK2/G251YB should now work, which is a link from Export Metadata at http://ec2-100-27-31-230.compute-1.amazonaws.com:8080/dataset.xhtml?persistentId=doi:10.5072/FK2/G251YB
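For anyone who wants to build the same kind of export link for another dataset, here is a minimal Python sketch. The path and the exporter and persistentId parameters are taken from the URL above; the function name and the localhost site URL in the example are assumptions.

```python
from urllib.parse import urlencode


def export_url(site_url: str, persistent_id: str, exporter: str = "oai_datacite") -> str:
    """Build a metadata export URL shaped like the one in this thread."""
    # safe="/" keeps the slashes in the DOI readable; the colon is still
    # percent-encoded as %3A, matching the link above.
    query = urlencode({"exporter": exporter, "persistentId": persistent_id}, safe="/")
    return f"{site_url.rstrip('/')}/api/datasets/export?{query}"


# Assumed local installation, for illustration only.
url = export_url("http://localhost:8080", "doi:10.5072/FK2/G251YB")
```

As noted above, the link from the Export Metadata menu only worked once dataverse.siteUrl was configured.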
Yes, I'm working on it. I'm double-checking that I have applied the feedback received, and I will comment on the PR about it soon. Thanks for reviewing it.
@fcadili great! I just invited you to join https://github.com/orgs/IQSS/teams/dataverse-readonly/members . If you're ok being assigned to this issue, I'll move it from "Inbox" to "Community Dev" at https://waffle.io/IQSS/dataverse
Thanks @fcadili for the updated PR. I'm assigning myself and @jggautier so that we can check out what was implemented from a metadata perspective. We may have some questions, but after that we'll move it along so a developer can review it. Thanks again!
Thanks @fcadili! The concerns I had about funder, language and rightsList metadata seem resolved. Looks great!
The rules being used for figuring out the creator "nametype" seem to have changed. They seem to be:
The first rule is great I think, since it seems that ORCID is intended for only researchers. But I think the second rule will result in a lot of creators being tagged as "personal" when they're not. I see a lot of datasets in Dataverse repositories (and in non-Dataverse repositories harvested by Harvard Dataverse, like ICPSR and ODESI) where the author is an organization, and the affiliation field contains another organization, like the organization's host institution.
Sending metadata that indicates that an author is a person or an organization seems to be important (e.g. https://github.com/IQSS/dataverse/issues/5029, studies being done into authorship decisions, generating citations in different styles). I just don't know how tolerant of miscategorized creators we should be. DataCite uses an algorithm that we're told is right about 90% of the time.
Moving back to Community Dev for now. @fcadili let us know your thoughts on the above!
I'm working on the creator nameType in order to apply the DataCite algorithm described in https://github.com/IQSS/dataverse/issues/2243#issuecomment-358615313. When done I will comment on the PR about it. Thanks for reviewing it.
Thanks @fcadili. I saw the latest comment in your PR (https://github.com/IQSS/dataverse/pull/4664#issuecomment-484387154) about using that algorithm. Moving this to code review.
@philippconzett (Dataverse Network Norway) wrote in https://groups.google.com/forum/#!msg/dataverse-community/lgSTeI-0zkQ/R7W8CfzvAAAJ:
@juancorr shared in another issue about adding DataCite metadata to the Export Metadata pulldown that Dataverse e-cienciaDatos
The definition of done for this issue will be a Dataverse admin being able to have OpenAIRE harvest OpenAIRE-compliant metadata from her installation.