clarinsi / clarin-dspace

LINDAT/CLARIN digital repository based on DSpace
http://lindat.cz
BSD 3-Clause "New" or "Revised" License

euFunds not equal to size(dc.relation) #7

Open cyplas opened 7 years ago

cyplas commented 7 years ago

For some items, the openaire curation task yields "euFunds != size(dc.relation).":

2017-06-21 14:34:34,961 INFO  org.dspace.curate.Curator @ Curation task: openaire performed on: 11356/1024 workflowID=W124 with status: -1. Result: 'Object [11356/1125] metadata are not synced with OpenAIRE requirements - euFunds != size(dc.relation).
cyplas commented 7 years ago

I've looked into this. The relevant class is OpenAIRE.java. If "euFunds != size(dc.relation)" is true, it means that the number of dc.relation metadata elements is not equal to the number of local.sponsor metadata elements which contain the string "euFunds". (The curation task also checks that the value of the last part of every 5-part local.sponsor matches the value of one of the dc.relations, but this part never fails for us.)
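To make the check concrete, here is a minimal Python sketch of the counting rule as described above. It is not the code in OpenAIRE.java; the function names and the idea of passing the metadata values in as plain lists are assumptions made only for illustration.

```python
from typing import List


def count_eu_funds_sponsors(sponsors: List[str]) -> int:
    """Count local.sponsor values that contain the string "euFunds"."""
    return sum(1 for value in sponsors if "euFunds" in value)


def eu_funds_mismatch(relations: List[str], sponsors: List[str]) -> bool:
    """Return True when the counts differ, i.e. when the task would report
    "euFunds != size(dc.relation)" under the rule described above.

    Hypothetical helper: the real task reads these values from the item.
    """
    return count_eu_funds_sponsors(sponsors) != len(relations)


# Values taken from this thread (item 11356/1048), used only as sample input.
relation = "info:eu-repo/grantAgreement/EC/H2020/640772"
sponsor = "EC@@640772@@DOLFINS@@euFunds@@info:eu-repo/grantAgreement/EC/H2020/640772"

print(eu_funds_mismatch([relation], [sponsor]))  # one of each: counts agree
print(eu_funds_mismatch([], [sponsor]))          # sponsor without relation: counts differ
```

Under this reading, an item only passes when every "euFunds" sponsor is accompanied by its own dc.relation element, and vice versa.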

@TomazErjavec If we want to get rid of this error, then it depends on whether we consider these checks to be valid or not. Based on the answer to that, either you can fix the metadata accordingly (for the items mentioned in the healthchecks) or I can adapt OpenAIRE.java. The latter option isn't difficult if we don't complicate things. For example, I could easily change the return code from "success" to "error" and/or change the output message.

TomazErjavec commented 7 years ago

If we want to get rid of this error, then it depends on whether we consider these checks to be valid or not.

I'm sorry to say I have absolutely no idea, as I don't know if the OpenAIRE gets sent anywhere or not. Maybe a question to the LINDAT guys? Because, if it doesn't then it doesn't matter what the checks say. If it does, then it might.

cyplas commented 7 years ago

I'm sorry to say I have absolutely no idea, as I don't know if the OpenAIRE gets sent anywhere or not. Maybe a question to the LINDAT guys? Because, if it doesn't then it doesn't matter what the checks say. If it does, then it might.

See #13.

cyplas commented 7 years ago

Fixed this together with #6.

TomazErjavec commented 3 years ago

We, again, have two (recent) items that report this error:

Object [11356/1342] metadata are not synced with OpenAIRE requirements - euFunds != size(dc.relation).
Object [11356/1281] metadata are not synced with OpenAIRE requirements - euFunds != size(dc.relation).

If I just consider the first one, i.e. http://hdl.handle.net/11356/1342, it has:

dc.relation = info:eu-repo/grantAgreement/EC/H2020/825153
local.sponsor = European Union@@EC/H2020/825153@@EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media@@euFunds@@info:eu-repo/grantAgreement/EC/H2020/825153

and compare this with one that works, e.g. from http://hdl.handle.net/11356/1048:

dc.relation = info:eu-repo/grantAgreement/EC/H2020/640772
local.sponsor = EC@@640772@@DOLFINS@@euFunds@@info:eu-repo/grantAgreement/EC/H2020/640772

I can't see any difference in the number of fields between the two, so, how can the first one be wrong and the second one correct?

I've now, just in case this has any bearing, changed the wrong one so it is as similar to the correct one as possible:

local.sponsor = EC@@825153@@EMBEDDIA@@euFunds@@info:eu-repo/grantAgreement/EC/H2020/825153

but curation reports the same error as before, so I changed it back. I would appreciate some help with this, maybe it is something obvious that I am missing?

cyplas commented 3 years ago

I would appreciate some help with this, maybe it is something obvious that I am missing?

Just so you know, I took a look at this last week and didn't find anything obvious wrong. Hopefully a closer look will be more fruitful.

cyplas commented 3 years ago

Hmm, I took another look, and I don't think we were checking based on what I had determined earlier (https://github.com/clarinsi/clarin-dspace/issues/07#issuecomment-311060139):

If "euFunds != size(dc.relation)" is true, it means that the number of dc.relation metadata elements is not equal to the number of local.sponsor metadata elements which contain the string "euFunds".

If you look at the metadata, that explains why 1048 is ok but 1281 and 1342 are not:

| Handle | # of dc.relation elements | # of local.sponsor elements with "euFunds" |
|---|---|---|
| 11356/1048 | 3 | 3 |
| 11356/1281 | 0 | 1 |
| 11356/1342 | 1 | 2 |

By the way, I should note that what I had written in brackets does not hold:

(The curation task also checks that the value of last part of every 5-part local.sponsor matches the value of one of the dc.relations, but this part never fails for us.)

In fact, this also does fail for us in those two cases, as the excess sponsors obviously have no dc.relation match for their last component. Not sure what I was thinking there; perhaps in the previous cases from a couple of years ago we had more dc.relations than euFunds local.sponsors, I don't know.
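For illustration, the second check (the last component of every 5-part local.sponsor must match one of the dc.relations) could be sketched in Python roughly as follows. This is not the logic copied from OpenAIRE.java: the function name is hypothetical, the "@@" separator is taken from the sponsor values quoted in this thread, and the second sponsor value in the sample data is a made-up placeholder, not the real metadata of any item.

```python
from typing import List

SEPARATOR = "@@"  # as seen in the local.sponsor values quoted in this thread


def unmatched_sponsors(relations: List[str], sponsors: List[str]) -> List[str]:
    """Return the 5-part local.sponsor values whose last component
    does not equal any dc.relation value (hypothetical helper)."""
    unmatched = []
    for value in sponsors:
        parts = value.split(SEPARATOR)
        # Only 5-part sponsors carry a grant identifier as their last part.
        if len(parts) == 5 and parts[-1] not in relations:
            unmatched.append(value)
    return unmatched


# One relation and one sponsor as quoted for 11356/1342 in this thread.
relations = ["info:eu-repo/grantAgreement/EC/H2020/825153"]
known_sponsor = (
    "European Union@@EC/H2020/825153@@EMBEDDIA - Cross-Lingual Embeddings for "
    "Less-Represented Languages in European News Media@@euFunds@@"
    "info:eu-repo/grantAgreement/EC/H2020/825153"
)
# PLACEHOLDER: invented second sponsor, standing in for the "excess" one.
extra_sponsor = "EC@@000000@@EXAMPLE@@euFunds@@info:eu-repo/grantAgreement/EC/H2020/000000"

for value in unmatched_sponsors(relations, [known_sponsor, extra_sponsor]):
    print("no matching dc.relation for:", value)
```

With one dc.relation and two "euFunds" sponsors, only the excess sponsor is reported, which is consistent with both checks failing together whenever there are more sponsors than relations.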