Closed cessda-bitbucket-importer closed 5 years ago
Original comment by John Shepherdson (GitHub: john-shepherdson).
Hi John,
I believe I have managed to confuse us both in several different ways during this discussion about constructing a synthetic study URL for Nesstar-held studies.
The main cause of confusion can probably be clarified by the following statement:
If landing page URLs (e.g. synthetic Nesstar WebView-urls) are part of the DDI payload already, no modifications to the OAI-PMH-Nesstar front are required.
The payload is already there.
We have now set up a preliminary OAI-PMH-front for NSD's main survey repository (http://nsddata.nsd.uib.no).
The OAI-PMH-front-service will get a domain name (likely oai-pmh.nsd.no), but for now it has an IP-address only.
NB! Right now you won't be able to access this URL due to University firewalls. I will address this.
But attached you will find the OAI-PMH-xml response payload for that particular OAI-PMH-request. I also attach a screenshot from my browser as evidence that I'm not making this up.
Now - in the attached xml response, you will find the URI to the landing page for this particular study under:
stdyDscr/method/dataAccs/setAvail/accsPlac
...approximately on line 223 in the xml-document when viewed in a text editor:
To sum up: The good:
Using verb=GetRecord and metadataPrefix=oai_ddi will give the harvesters utilising this OAI-PMH-server the content we're after.
No modifications are needed to the Nesstar-OAI-PMH-server to achieve this.
The bad:
This means that CESSDA archives with Nesstar Servers, and that have been
following our practice by using the
It is however unclear how many archives this "convenience" will benefit.
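As a rough illustration of the request described above (verb=GetRecord with metadataPrefix=oai_ddi), the following sketch builds such a request URI. The class name, method name and the record identifiers used with it are hypothetical and not taken from the harvester; only the two query parameters come from the discussion.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of the harvester code. */
final class OaiPmhRequests {

    private OaiPmhRequests() {}

    /**
     * Builds a GetRecord request URI that asks the given OAI-PMH endpoint
     * for one record in the oai_ddi metadata format.
     */
    static URI getRecord(String baseUrl, String identifier) {
        String query = "verb=GetRecord"
                + "&metadataPrefix=oai_ddi"
                // The identifier is percent-encoded because OAI identifiers often contain ':' or '/'.
                + "&identifier=" + URLEncoder.encode(identifier, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "?" + query);
    }
}
```

Calling `OaiPmhRequests.getRecord("https://oai-pmh.nsd.no/oai-pmh", "example-study-1")` would yield `https://oai-pmh.nsd.no/oai-pmh?verb=GetRecord&metadataPrefix=oai_ddi&identifier=example-study-1`, where `example-study-1` is a made-up identifier.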
Original comment by Moses Mansaray (GitHub: doraVentures).
There is a lot in the above description @jws_mo
In summary, I understand it to suggest using:
<stdyDscr><method><dataAccs><setAvail><accsPlac>
for the study URL instead of what the CMM suggested, which in code currently is:
Can you please confirm? Maybe update the Nesstar CMM mapping document and send me a link to it.
Thank you
Original comment by John Shepherdson (GitHub: john-shepherdson).
Please try extracting studyURL for NESSTAR endpoints from
Original comment by Moses Mansaray (GitHub: doraVentures).
Source: https://datacatalogue-dev.cessda.eu/es/
StudyUrl
is: http://www.adp.fdv.uni-lj.si/opisi/StudyUrl
StudyUrl
StudyUrl
seems to be unique and takes me to the record's landing page, an instance of junk entry; the rest of the records are missing studyUrl from the current CMM expected XPath.
I have extracted and converted this to a flattened table in Excel for @jws_mo to analyse further at will; see the attached document, studyurl_checks_result.xlsx.
The correct CMM XPath is at best used inconsistently and at worst not used at all:
Original comment by Moses Mansaray (GitHub: doraVentures).
I will move on to replacing the XPath with <stdyDscr><method><dataAccs><setAvail><accsPlac>
and report back the difference/impact.
Original comment by John Shepherdson (GitHub: john-shepherdson).
Thanks. I'll have a look at the spreadsheet.
Original comment by Moses Mansaray (GitHub: doraVentures).
Code changes done; doing a full re-index locally to further verify the impact. @jws_mo So far I know from manual checks that:
Progedo http://nesstar.sciences-po.fr/oai-pmh
ADP https://nesstar2.adp.fdv.uni-lj.si/oai-pmh
NSD https://oai-pmh.nsd.no/oai-pmh
Sodanet http://nesstar-server.sodanet.gr/oai-pmh
Original comment by Moses Mansaray (GitHub: doraVentures).
Checks after Code
In brief @jws_mo, all SPs will need to take action to ensure they:
<stdyDscr><method><dataAccs><setAvail><accsPlac>
URI attribute in the above element
URI should be that for the record; a Unique Resource Identifier
ADP
<accsPlac> does not contain a unique identifier to a record, example:
NSD
<accsPlac> that do contain a unique identifier to a record
Error 404
Progedo
<accsPlac> for all Records
<accsPlac> does not contain a unique identifier to a Record. Example repeated URI:
Sodanet
<accsPlac> for the many records I tried
CSDA: http://nesstar.soc.cas.cz/oai-pmh
Unidata: http://149.132.157.156/oai-pmh
Unidata: http://149.132.157.156/oai-pmh
For more see attached spreadsheet “studyurl_checks_result_after_change.xlsx”
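The "repeated URI" problem noted above (several records sharing one URI value) could be detected automatically along these lines. This is only a sketch: the class and method names are hypothetical, and it assumes the study URLs have already been extracted into a map keyed by record identifier.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical check for illustration; not part of the harvester code. */
final class StudyUrlChecks {

    private StudyUrlChecks() {}

    /**
     * Groups record identifiers by study URL and keeps only the URLs that are
     * shared by two or more records, i.e. URLs that do not identify one record.
     */
    static Map<String, List<String>> findSharedUrls(Map<String, String> studyUrlByRecordId) {
        Map<String, List<String>> recordIdsByUrl = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : studyUrlByRecordId.entrySet()) {
            String url = entry.getValue();
            if (url == null || url.isBlank()) {
                continue; // a missing study URL is a separate problem from a repeated one
            }
            recordIdsByUrl.computeIfAbsent(url, key -> new ArrayList<>()).add(entry.getKey());
        }
        // Drop URLs that map to exactly one record; those are unique as required.
        recordIdsByUrl.values().removeIf(ids -> ids.size() < 2);
        return recordIdsByUrl;
    }
}
```

An empty result would mean every extracted study URL belongs to exactly one record; it says nothing about whether the URL resolves (the Error 404 case would need a separate HTTP check).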
Original comment by Moses Mansaray (GitHub: doraVentures).
Original comment by Moses Mansaray (GitHub: doraVentures).
@jws_mo PRs:
I will merge these to get them to DEV for checks: Sonar, a full index clear-out, re-harvest and re-ingest. Please feel free to give feedback if you have any.
Original comment by Moses Mansaray (GitHub: doraVentures).
Hi @jws_mo, the coverage issue I mentioned on Slack is now a blocker.
The new-code coverage of 0% is probably because no coverage is being detected project-wide for most of the Java applications.
This will now BLOCK all further builds and tickets from being verified on DEV. This needs to be resolved ASAP.
Original comment by Moses Mansaray (GitHub: doraVentures).
Waiting for sonar to be fixed for code coverage
Original comment by John Shepherdson (GitHub: john-shepherdson).
Waiting for admin access to SonarQube.
Original comment by John Shepherdson (GitHub: john-shepherdson).
@matthew-morris-cessda Matthew, please have a look at this as a matter of urgency. We need these builds to succeed so I can evaluate the changes made by Moses and Ashley.
Original comment by Matthew Morris (GitHub: matthew-morris-cessda).
I fixed the code coverage issue and builds are passing again.
Original comment by Moses Mansaray (GitHub: doraVentures).
Nesstar now extracts studyUrl from the @URI attribute of the following element:
+ static final String STUDY_URL_XPATH = "//ddi:codeBook/stdyDscr/dataAccs/setAvail/accsPlac";
Actual code above, to reflect the path requested here: <stdyDscr><method><dataAccs><setAvail><accsPlac>
This code is now in dev and ES has had a fresh re-index.
See comment of analysis after change here: https://github.com/cessda/cessda.pasc.version2/issues/94#comment-51834673
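For readers unfamiliar with the extraction step, reading the URI attribute of accsPlac can be sketched with the JDK's built-in XPath API as below. This is not the harvester's actual code: the class name is hypothetical, and matching elements by `local-name()` is an assumption made here so the sketch works whether or not the payload declares a DDI namespace.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical extractor for illustration; not the harvester's implementation. */
final class StudyUrlExtractor {

    // Assumption: select by local name so namespaced and non-namespaced DDI both match.
    private static final String STUDY_URL_XPATH =
            "//*[local-name()='setAvail']/*[local-name()='accsPlac']/@URI";

    private StudyUrlExtractor() {}

    /** Returns the URI attribute of the first matching accsPlac element, if there is one. */
    static Optional<String> extractStudyUrl(String ddiXml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // evaluate(...) returns an empty string when nothing matches.
            String value = xpath.evaluate(STUDY_URL_XPATH, new InputSource(new StringReader(ddiXml)));
            return value.isBlank() ? Optional.empty() : Optional.of(value.trim());
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate study URL XPath", e);
        }
    }
}
```

Returning an `Optional` keeps the "element or attribute missing" case explicit, which matters here because several endpoints in the checks above did not populate accsPlac at all.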
Original comment by Moses Mansaray (GitHub: doraVentures).
Not in expected field. See email correspondence between Ornulf Risnes and John Shepherdson.
Original comment by John Shepherdson (GitHub: john-shepherdson).
Assigned to Metadata Office. See https://github.com/cessda/cessda.metadata.office/issues/9
Original comment by John Shepherdson (GitHub: john-shepherdson).
Not in expected field. See email correspondence between Ornulf Risnes and John Shepherdson.
Original comment by Ørnulf Risnes.
Have discussed options with Anne Marie T Laundal (NSD).
Could I ask: where does the harvester currently look for DOI info?
Is it here?:
Original comment by Taina Jääskeläinen.
DOI info is harvested from
Original report on BitBucket by John Shepherdson (GitHub: john-shepherdson).
Not in expected field. See email correspondence between Ornulf Risnes and John Shepherdson.