clarin-eric / metadata-curation

collecting metadata issues

MMSH harvested records containing broken links #1

Open dietervu opened 1 year ago

dietervu commented 1 year ago

Original report:

The links in the metadata provided via your OAI-PMH endpoint are broken. For example, the record https://vlo.clarin.eu/record/oai_58_phonotheque.mmsh.huma-num.fr_58_dc8700?q=anniversaire&fqType=collection:or&fq=collection:COllections+de+COrpus+Oraux+Numeriques+%28CoCoON+ex-CRDO%29&fq=collection:OAI+de+la+MMSH links to

http://phonotheque.mmsh.huma-num.fr/dyn/portal/index.seam?page=alo&aloId=8700

which returns a 404 error page.

Answer from Véronique (MMSH):

Usually the OAI-PMH endpoint of our database doesn't change, but the software is not stable. We are currently transferring the database to Calames, where we have an OAI-PMH set:

http://www.calames.abes.fr/oai/oai2.aspx?verb=ListSets

130019801 AIX-EN-PROVENCE. Maison méditerranéenne des sciences de l'homme

All our records in our DC set:

http://www.calames.abes.fr/oai/oai2.aspx?verb=ListRecords&metadataPrefix=oai_dc&set=130019801

Do you think you could use this set?

Follow-up: CLARIN successfully harvested the new MMSH endpoint but the DC records do not contain any resolvable identifier (like a URL) and therefore are skipped during the VLO import.
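For reference, a minimal sketch of the kind of check that makes such records drop out. This is not the actual VLO import code; the helper name `resolvable_identifiers` and the two sample records are made up for illustration, and "resolvable" is assumed to mean an `http`/`https` URL in a `dc:identifier` element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

DC_NS = "http://purl.org/dc/elements/1.1/"


def resolvable_identifiers(record_xml: str) -> list:
    """Return the dc:identifier values that look like http(s) URLs.

    Hypothetical helper: it only illustrates the idea of skipping
    records without a resolvable identifier.
    """
    root = ET.fromstring(record_xml)
    found = []
    for element in root.iter(f"{{{DC_NS}}}identifier"):
        value = (element.text or "").strip()
        parsed = urlparse(value)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            found.append(value)
    return found


# Made-up sample records, not copied from the Calames endpoint.
WITHOUT_URL = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example record</dc:title>
  <dc:identifier>Calames-2020422164313484</dc:identifier>
</oai_dc:dc>"""

WITH_URL = WITHOUT_URL.replace(
    "<dc:identifier>Calames-2020422164313484</dc:identifier>",
    "<dc:identifier>http://www.calames.abes.fr/pub/#details?id="
    "Calames-2020422164313484</dc:identifier>",
)

for name, record in (("without URL", WITHOUT_URL), ("with URL", WITH_URL)):
    urls = resolvable_identifiers(record)
    print(name, "->", "keep" if urls else "skip", urls)
```

A record whose only identifier is a bare `Calames-…` string yields an empty list here and would be skipped under this assumption.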

Answer from Véronique (MMSH) on 2023-09-25:

I will ask ABES to add the URL in their DC.

Follow-up 2023-10-17:

Dear Twan, dear Dieter,

ABES tells us that you have to build the URL yourself. It is explained here: https://documentation.abes.fr/aidecalames/manuelinformaticien/index.html#SpecificitesTechniques

See the last paragraph, on building the URL of the Calames record from the identifier ("Pour reconstituer l'URL de la description des unités documentaires dans Calames à partir de identifier"). The root of the URL is www.calames.abes.fr/pub/ms/

You have to take the identifier from oai:oaicalames.abes.fr:Calames-2020422164313484

and build the URL as: http://www.calames.abes.fr/pub/#details?id=Calames-2020422164313484

I hope this is helpful.

Thanks and all the best,
Véronique
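The construction described above could be sketched as follows. This is only an illustration, not something ABES or CLARIN provide: the function name `calames_url` is hypothetical, and the sketch follows the example URL given in the message (`/pub/#details?id=`), which differs from the root `www.calames.abes.fr/pub/ms/` also mentioned there, so the pattern should be checked against the ABES documentation before use.

```python
OAI_PREFIX = "oai:oaicalames.abes.fr:"
# Assumption: URL pattern taken from the example in the message above.
DETAILS_ROOT = "http://www.calames.abes.fr/pub/#details?id="


def calames_url(oai_identifier: str) -> str:
    """Build a Calames record URL from an OAI identifier (hypothetical helper)."""
    if not oai_identifier.startswith(OAI_PREFIX):
        raise ValueError(f"not a Calames OAI identifier: {oai_identifier!r}")
    # Keep only the local part, e.g. "Calames-2020422164313484".
    local_id = oai_identifier[len(OAI_PREFIX):]
    if not local_id:
        raise ValueError("identifier has no local part")
    return DETAILS_ROOT + local_id


print(calames_url("oai:oaicalames.abes.fr:Calames-2020422164313484"))
```

Rejecting identifiers without the expected prefix is a design choice of the sketch, so that records from other repositories are not silently turned into Calames links.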

Véronique Ginouvès
AMU-CNRS UAR3125
Maison méditerranéenne des sciences de l'homme
Secteur Archives de la recherche - Médiathèque MMSH
https://phonotheque.hypotheses.org/32107
5 rue du château de l'horloge - CS 90412
13097 Aix-en-Provence Cedex 2
Tél : 00-33(0)442524113

On Wed, 27 Sept 2023 at 10:57, Twan Goosen [twan@clarin.eu](mailto:twan@clarin.eu) wrote:

Dear Véronique, all,

Our processing pipeline does not support EAD so adding a resolvable
identifier to the DC representation would be a great solution. Please
keep us posted :)

Thanks and all the best,
Twan

On 25-09-2023 09:00, Véronique Ginouvès wrote:
> Dear Dieter
> I will ask ABES to add the URL in their DC, but if you can use EAD maybe
> with this expression
>
> http://www.calames.abes.fr/oai/oai2.aspx?verb=ListRecords&metadataPrefix=oai_ead&set=130019801
>
> you will find the URL in the record.
> Is that better?
> Have a nice week
> Véronique

-- 
Twan Goosen
Software developer at CLARIN ERIC
[www.clarin.eu](http://www.clarin.eu/) | [twan@clarin.eu](mailto:twan@clarin.eu)

twagoo commented 10 months ago

Status update

dietervu commented 3 months ago

The issue of the missing URL in the identifier seems to be solved, e.g. https://vlo.clarin.eu/data/clarin/oai-pmh/Calames_OAI_Serveur_ABES_/Calames_OAI_Serveur_ABES_0000027.xml#oai:oaicalames.abes.fr:Calames-2018328143485243

@twagoo is this observation correct, or did we do something to fix this during the harvest? (If so, do you have a reference? I cannot see anything in https://github.com/clarin-eric/oai-harvest-config/blob/master/config-clarin-clarin.xml)

twagoo commented 3 months ago

> @twagoo is this observation correct, or did we do something to fix this during the harvest? (If so, do you have a reference? I cannot see anything in https://github.com/clarin-eric/oai-harvest-config/blob/master/config-clarin-clarin.xml)

To my knowledge we have not rolled out a fix, so they must have fixed it themselves.