cessda / cessda.cdc.versions

Issue tracker and wiki for the CESSDA Data Catalogue
https://datacatalogue.cessda.eu/
Apache License 2.0

Extract Study Url from NESSTAR records #94

cessda-bitbucket-importer closed 5 years ago

cessda-bitbucket-importer commented 5 years ago

Original report on BitBucket by John Shepherdson (GitHub: john-shepherdson).


Not in expected field. See email correspondence between Ornulf Risnes and John Shepherdson.

cessda-bitbucket-importer commented 5 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Hi John,

I believe I have been able to confuse us both in several different ways during this discussion about constructing a synthetic study URL to Nesstar held studies.

The main cause of confusion can probably be clarified by the following statement:

If landing page URLs (e.g. synthetic Nesstar WebView-urls) are part of the DDI payload already, no modifications to the OAI-PMH-Nesstar front are required.

The payload is already there.

We have now set up a preliminary OAI-PMH-front for NSD's main survey repository (http://nsddata.nsd.uib.no).

The OAI-PMH-front-service will get a domain name (likely oai-pmh.nsd.no), but for now it has an IP-address only.

http://129.177.90.182/oai-pmh/?verb=GetRecord&metadataPrefix=oai_ddi&identifier=http://nsddata.nsd.uib.no:80/obj/fStudy/NSD2200-3

NB! Right now you won't be able to access this URL due to University firewalls. I will address this.
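For reference, a GetRecord request like the one above can be assembled programmatically. This is only a minimal sketch: the class and method names are hypothetical and not taken from the CESSDA code base. It percent-encodes the identifier, which the URL quoted in the email does not do, on the assumption that the server accepts the encoded form.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the CESSDA harvester.
final class OaiPmhRequest {

    // Builds an OAI-PMH GetRecord URL; query values are percent-encoded.
    static String getRecordUrl(String baseUrl, String metadataPrefix, String identifier) {
        return baseUrl
                + "?verb=GetRecord"
                + "&metadataPrefix=" + URLEncoder.encode(metadataPrefix, StandardCharsets.UTF_8)
                + "&identifier=" + URLEncoder.encode(identifier, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Values copied from the request URL quoted in the email.
        System.out.println(getRecordUrl(
                "http://129.177.90.182/oai-pmh/",
                "oai_ddi",
                "http://nsddata.nsd.uib.no:80/obj/fStudy/NSD2200-3"));
    }
}
```

The study identifier is itself a URL, so encoding it keeps its `:` and `/` characters from being confused with the request URL's own structure.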

But attached you will find the OAI-PMH-xml response payload for that particular OAI-PMH-request. I also attach a screenshot from my browser as evidence that I'm not making this up.

Now - in the attached xml response, you will find the URI to the landing page for this particular study under:

stdyDscr/method/dataAccs/setAvail/accsPlac

...approximately on line 223 in the xml-document when viewed in a text editor:

doi:10.18712/NSD-NSD2200-3-v1

To sum up: The good:

The bad:

This means that CESSDA archives with Nesstar Servers, and that have been following our practice by using the -element for landing page info can use an unmodified version of the Nesstar OAI-PMH-server to expose this info.

It is however unclear how many archives this "convenience" will benefit.

http://nsddata.nsd.uib.no:80/obj/fStudy/NSD2200-3

This means that we are unable to present a unified view of the identifier (or what we mean by identifier) through both DDI and DC at the same time.

Note that this exact same problem will also occur if we use the CMM-suggestion: instead of the CESSDA Template element that we use:

Perhaps not a big problem if the CESSDA-catalogue-harvesters rely on DDI (and not DC) in their harvesting.

To sum up - if we ignore the DC-problem (and the fact that the lack of domain name and firewall-whitelisting prevents you from seeing the service), we should very soon have a situation where you will be able to harvest NSD's (main) Nesstar-server and extract the proper landing page URL from the oai-pmh(-ddi)-payload.

best regards,
Ørnulf

cessda-bitbucket-importer commented 5 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


There is a lot in the above description @‌jws_mo

In summary I understand it to suggest to use: <stdyDscr><method><dataAccs><setAvail><accsPlac> for the study url instead of what CMM suggested which in code currently is:

Can you please confirm? Maybe update the nesstar CMM mapping document and send me a link to it.

Thank you

cessda-bitbucket-importer commented 5 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Please try extracting studyURL for NESSTAR endpoints from Can you check which endpoints are using that field consistently (the above suggests that UKDS and GESIS don't).

cessda-bitbucket-importer commented 5 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Analysis of StudyURL current state for Nesstar records

Source: https://datacatalogue-dev.cessda.eu/es/

I have extracted and converted this to a flattened table in excel for @‌jws_mo to further analyse at will, see attached document. studyurl_checks_result.xlsx

To conclude

The correct CMM XPath is at best not used consistently and at worst not used at all:

cessda-bitbucket-importer commented 5 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


I will move on to replacing the XPath with <stdyDscr><method><dataAccs><setAvail><accsPlac> and report back the difference/impact.

cessda-bitbucket-importer commented 5 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Thanks. I'll have a look at the spreadsheet.

cessda-bitbucket-importer commented 5 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Code changes done, doing a full re-index locally to further verify impact. @‌jws_mo So far I know from manual checks that:

Progedo http://nesstar.sciences-po.fr/oai-pmh

ADP https://nesstar2.adp.fdv.uni-lj.si/oai-pmh

NSD https://oai-pmh.nsd.no/oai-pmh

Sodanet http://nesstar-server.sodanet.gr/oai-pmh

cessda-bitbucket-importer commented 5 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Checks after Code

In brief @jws_mo: all SPs will need to take action to ensure they:

ADP

NSD

Progedo

Sodanet

CSDA: http://nesstar.soc.cas.cz/oai-pmh

Unidata: http://149.132.157.156/oai-pmh

For more see attached spreadsheet “studyurl_checks_result_after_change.xlsx”

cessda-bitbucket-importer commented 5 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


@‌jws_mo PRs:

I will merge these to get them to DEV for checks: Sonar, a full index clear-out, re-harvest and re-ingest. Please feel free to give feedback if you have any.

cessda-bitbucket-importer commented 5 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Hi @‌jws_mo, the coverage issue I mentioned on slack is now a blocker.

https://sonarqube.cessda.eu/component_measures?id=eu.cessda.pasc%3Apasc-osmh-handler-nesstar&metric=new_coverage&view=list

The new code coverage of 0% is probably because no coverage is being detected project-wide for most of the Java applications.

This will now BLOCK all further builds and tickets from being verified on DEV. This would need to be resolved asap.

cessda-bitbucket-importer commented 5 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Waiting for sonar to be fixed for code coverage

cessda-bitbucket-importer commented 5 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Waiting for admin access to SonarQube.

cessda-bitbucket-importer commented 5 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


@matthew-morris-cessda Matthew, please have a look at this as a matter of urgency. We need these builds to succeed so I can evaluate the changes made by Moses and Ashley.

cessda-bitbucket-importer commented 5 years ago

Original comment by Matthew Morris (GitHub: matthew-morris-cessda).


I fixed the code coverage issue and builds are passing again

cessda-bitbucket-importer commented 5 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Nesstar now extracts studyUrl from the @URI attribute of the following element:

+  static final String STUDY_URL_XPATH = "//ddi:codeBook/stdyDscr/dataAccs/setAvail/accsPlac";

The actual path above is meant to reflect the path requested here: <stdyDscr><method><dataAccs><setAvail><accsPlac>
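To illustrate the extraction described above, here is a minimal sketch of reading the URI attribute of accsPlac with the JDK's built-in XPath support. It is not the actual handler code. The class name, method name and constant are hypothetical, and the expression assumes a DDI document without namespaces, so it drops the `ddi:` prefix used in STUDY_URL_XPATH and appends `/@URI`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the pasc-osmh-handler-nesstar implementation.
final class StudyUrlExtractor {

    // Assumption: namespace-free DDI, element path as in STUDY_URL_XPATH, plus the URI attribute.
    static final String STUDY_URL_ATTR_XPATH =
            "/codeBook/stdyDscr/dataAccs/setAvail/accsPlac/@URI";

    // Returns the URI attribute value, or an empty string when the element or attribute is absent.
    static String extractStudyUrl(String ddiXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations, since harvested XML is untrusted input.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(ddiXml)));
            return XPathFactory.newInstance().newXPath()
                    .evaluate(STUDY_URL_ATTR_XPATH, doc)
                    .trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not read study URL from DDI record", e);
        }
    }
}
```

Because an absent element yields an empty string rather than an exception, a caller can easily count the records per endpoint that lack a study URL.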

This code is now in dev and ES has had a fresh re-index.

See comment of analysis after change here: https://github.com/cessda/cessda.pasc.version2/issues/94#comment-51834673

cessda-bitbucket-importer commented 5 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Assigned to Metadata Office. See https://github.com/cessda/cessda.metadata.office/issues/9

cessda-bitbucket-importer commented 4 years ago

Original comment by Ørnulf Risnes.


Have discussed options with Anne Marie T Laundal (NSD).

Could I ask; where does the harvester currently look for DOI-info?

Is it here?:

cessda-bitbucket-importer commented 4 years ago

Original comment by Taina Jääskeläinen.


DOI info is harvested from

urn:nbn:fi:fsd:FSD3205

It is a repeatable element, so it is good to record the PID type in the agency attribute, as CESSDA allows only the four (doi, urn, Handle, ARK). The PID type can differentiate it from other, possibly in-house, identifiers whenever needed.
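As a rough illustration of how the agency attribute could separate PIDs from in-house identifiers, here is a minimal sketch. The class and method names are hypothetical, and the case-insensitive comparison is an assumption rather than a documented CESSDA rule.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a PID-type check based on the agency attribute.
final class PidAgency {

    // The four PID types named in the comment above, lower-cased for comparison.
    static final Set<String> ALLOWED = Set.of("doi", "urn", "handle", "ark");

    // True when the agency value names one of the four allowed PID types.
    static boolean isPersistentIdentifier(String agency) {
        return agency != null
                && ALLOWED.contains(agency.trim().toLowerCase(Locale.ROOT));
    }
}
```

With a check like this, an identifier whose agency is "URN" would be kept as a PID, while one carrying an in-house agency value would be skipped.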