DSpace / RestContract

REST Contract for DSpace 7-8
https://wiki.lyrasis.org/display/DSDOC8x/
37 stars 48 forks source link

Rest contracts for qaevents, qatopics and qasources endpoints #181

Closed LucaGiamminonni closed 8 months ago

LucaGiamminonni commented 2 years ago

The features included in this PR are the result of the OpenAIRE Call Innovation funded project "Enrich local data via the OpenAIRE Graph” awarded by 4Science (https://www.openaire.eu/open-call-winner-phase-1-4science). It provides a closer integration between DSpace and two OpenAIRE services, the Notification Broker and the OpenAIRE REST API. Detailed documentation about the aims of the project, the implementation and the configuration options is available at https://4science.github.io/oaire-eld/#/ This PR regarding only the data correction section. The publication claim will be migrated with other PR.

abollini commented 2 years ago

hi, as anticipated in the community call we agree that these endpoints are not self-explaining. Unfortunately, renaming them looks to be more expensive than what we hope but we are willing to do that if required and we can found a general agreement around better names. To reduce the effort and risk of miss the 7.3 we suggest to proceed in the following way:

I will try to explain a bit more the purpose of these endpoints and how they are not overlapping with existing concepts or other initiatives. These endpoints are completely unrelated with COAR Notify project. The COAR Notify project is about a paradigm to provide interoperability between open system in the scholarly communication space based on Linked Data Notification.

The endpoint was named in this way as they were initially designed around the concepts of the OpenAIRE Notification Broker service but later on, as part of the EOSC supported project ELD Advance they have been generalized so that other provider than OpenAIRE could be plugged in future.

They implements the Data Correction service of the ELD project https://4science.github.io/oaire-eld/#/data-correction

nbsources are the systems that will inform the repository somehow about improvements to existing data. Out-of-box an openAIRE provider is provided but in future you could have other such sources of suggestion as one base on UnPaywall / CORE that could suggest you to grab open access copies available elsewhere to complete your repository records

nbtopics are the "kind of correction" suggestions that you can get from a specific provider. OpenAIRE is able to suggest you a missing abstract, identifiers , untracked relations among publication and project and more. Other providers could have a completely different list of topics

nbevents are the exact correction suggestions that you have got from the provider. They were named events because this is how they are called in the OpenAIRE Notfication Broker (that is based on a queue/publish/subscribe architecture)

tdonohue commented 2 years ago

@abollini : Thanks for the additional context. However, that brings up a few more questions in my mind:

My main concern here is around duplication of code / features / use cases. If these are all unique, then we need to better document / understand how they are different, and what use cases are met by each.

SIDENOTE: I noticed in your description you often use the term "correction". It makes me wonder if these are better considered "correctionSources" (nbsources), "correctionTopics" (nbtopics) and "correctionEvents" (nbevents). Not sure that's the proper terminology for each, but it's an early idea.

abollini commented 2 years ago

Thanks for your feedback @tdonohue and to encourage a broader discussion. External sources could be related with some correction sources as in the case of openaire Graph that is used by their notification broker service to suggest correction to repositories about the records that they have contributed to the graph or can be queried via API as we actually do in the second service provided by the ELD project (the publication claim, a dedicated PR will come). It is not hard to think about some scripts that would compare the data in an external source with the data in dspace and would report the differences in form of correction to apply to the dspace items. We actually have plan for that in future and we expect to be able to reuse some of the external source code (i.e. the external source classes will be probably collaborators of the OtherCorrectionSource). Nevertheless there are also lot of potential correction sources that are not related to external sources such as the previous mentioned example from unpaywall or a ML algorithm that would process the items suggesting to add I keywords or relations using entity extraction or topic classification. Last but not least these endpoints are intended to deal with archived items, external sources are instead involved in the records creation.

paulo-graca commented 2 years ago

@tdonohue in today's meeting also agreed with this change.

  • naming, I would prefer to have a qa- instead of a qa prefix, having:

    • /api/integration/qa-events
    • /api/integration/qa-sources
    • /api/integration/qa-topics
LucaGiamminonni commented 2 years ago

Hi @paulo-graca, according to endpoint naming conventions, the - character should not be used https://github.com/DSpace/RestContract#on-the-naming-of-endpoints Infact for example we have bitstreamformats, externalsources, itemrequests endpoints and not bistream-formats, external-sources and item-requests. So I think that the correct formats are qaevents, qasources and qatopics.

tdonohue commented 2 years ago

@LucaGiamminonni : Thanks for pointing that out... I had completely forgotten that policy we set a long time ago. In that case, I'm ok with keeping these as qaevents, qatopics, etc.

paulo-graca commented 2 years ago

Hi @paulo-graca, according to endpoint naming conventions, the - character should not be used https://github.com/DSpace/RestContract#on-the-naming-of-endpoints

Thank you @LucaGiamminonni for pointing it out. I don't understand (even how it's written) it as a strict rule, but, as a good practice. There are currently a bunch of existing endpoints that doesn't follow that "rule", like: claimedtasks-search, eperson-registration, resourcepolicies-search, vocabularyEntryDetails-search, submissioncclicenseUrls, submissioncclicenseUrls-search and the reason might be because they intend to be more descriptive or specify an action separately (like search). I don't recall any endpoint, beside the ones that refers "url" that also uses initials. The rest contract also points out that names must be descriptive. It's clear that, this case is an exception to the "rule". If we strictly followed the rule for names convention, they should be: qualityassuranceevents, qualityassurancesources, qualityassurancetopics. For the good of simplicity and because we are using the "quality assurance" as an umbrella, I think we should have any kind of separation for "quality assurance" from the rest.

so I'm proposing naming them to:

dropping the proposed "-" that is currently used to separate actions and the use of All Caps for the "quality assurance" initials. I think they should be adopted because they are descriptive but reasonably short.

LucaGiamminonni commented 2 years ago

Hi @paulo-graca, according to endpoint naming conventions, the name of the endpoints must be all lower case, so it's best to leave them as I specified. The values claimedtasks-search, eperson-registration, resourcepolicies-search, vocabularyEntryDetails-search, submissioncclicenseUrls, submissioncclicenseUrls-search are just the name of the endpoints which, however, do not have the - character in the path (such as /server/api/workflow/pooltasks/search), so these endpoints also respect the convention.

paulo-graca commented 1 year ago

Thank you @LucaGiamminonni for your response. Sorry for being so picky about this. My last suggestion, justified by the reasons I've explained was to use all caps for the initials QA. Not to use the dash (-). As referred I'm sure we have a special case, just like it occurs with: vocabularyEntryDetails that justifies this approach. Otherwise, to be strict with what is written and to be descriptive, naming should be more verbose and should be: qualityassuranceevents, qualityassurancesources, qualityassurancetopics I'm also ok with this. We already have others also verbose namings like: submissionaccessoptions

tdonohue commented 1 year ago

Just to add in my feedback on @paulo-graca 's proposal...

I see @paulo-graca 's point that @abollini already previously created an endpoint with a camel-cased name (vocabularyEntryDetails in #128). So, we could then potentially update these to be qaEvents, qaTopics and qaSources. However, I'm not a fan of starting an endpoint with capitalization, so I'd rather we not use QAevents, etc.

Honestly, overall, I'm not picky about the way these endpoints are named, but it does sound like this needs a discussion. It doesn't make sense for use to have a few camel-cased endpoints (vocabularyEntryDetails and submissioncclicenseUrls are two I see) while banning others. We should either consider fixing those existing camel-cased endpoints, or change the rules to say that "camel-case is OK but the first letter should be lowercase".

Overall, I think regardless of what we name these endpoints, they need much more detailed descriptions in this REST Contract PR. I agree that the names qaevents, qatopics and qasources look a bit confusing & are difficult to read. However, I'm even OK with keeping those names as long as we provide a better summary of each endpoint in this REST Contract.

For instance, if we kept the "qaevents" name, we should clarify it's meaning at the top of qaevents.md like this:

# Quality Assurance Events (qaevents)

These endpoints provide access to Quality Assurance (QA) Events. 
A QA Event is a recommended modification to an existing Item in DSpace, which is sent to DSpace by a Quality Assurance Source (link to qasources.md)"

Overall, I do agree that the REST Contract is confusing as-is. But, I don't think that necessarily requires the endpoints to be renamed. I had said something similar in my feedback from Aug 2 (which hasn't been addressed yet)...as I noted that endpoints like "qaevents" lacked any descriptions about what a "QA Event" even is or where it comes from.

I think if we added better descriptions to these endpoints, then the names may not matter quite so much. But, currently, the names are very confusing as there is no details given as to what "QA" means, and why each of these endpoints starts with "qa".

paulo-graca commented 1 year ago

Sorry all for being so picky with this. The reason It's because we are planning to extend this feature, for our service, to use a different source. Naming will be crucial for better understanding. I don't want us all to subvert naming conventions, but I think that we all agree that we have here something different. We have an hat for quality assurance and three different concepts for it. So, to be strict with the convention I think we should adopt: qualityassuranceevents, qualityassurancesources, qualityassurancetopics isn't my favorite approach but they are descriptive and don't use the camels, uppercases or dashes. Naming aspects was an issue since the start for this contribution, if we still plan to use QA or qa initials I think we should have a clear differentiation what is Quality Assurance from the rest.

paulo-graca commented 1 year ago

@tdonohue I would also to point out another aspect in this PR that needs discussion: the way we deal with different sources in DSpace. This PR presents us an approach to deal with sources that differs from what we are used to with external sources or authority.

Sincerely, I prefer this option (or approach) presented by @LucaGiamminonni /4Science. But I understand that if accept it, that will lead us to have more redundant ways to do kind of the similar things, and I think we should avoid redundancy in this case. I think we should have only one way to deal with sources and be stick with it, but I understand that might be an imposition. I've also pointed out the same with suggestions PR #192 and, to me, that will lead us to have 3 possible approaches:

  1. accept the redundancy of ways to deal with sources
  2. don't accept what is proposed
  3. change what we already have to be aligned with this approach
tdonohue commented 1 year ago

Per today's meeting, we decided that:

  1. The endpoints should be renamed qualityassuranceevents, qualityassurancesources, qualityassurancetopics
  2. I will create a PR to better document our official naming conventions, so that we can avoid having the same naming discussion in the future. This PR will enhance the current naming conventions here (but will not drastically change them) https://github.com/DSpace/RestContract/blob/main/README.md#on-the-naming-of-endpoints DONE, see #204
LucaGiamminonni commented 1 year ago

@tdonohue @paulo-graca the PR is ready to be review. thanks