Closed by jakoble 5 years ago
Isn't the real problem with this record that it seems to have been replaced by a newer record? http://hdl.handle.net/10032/60195d484591490edf259ec156ad7aa9
If it is really a duplicate now, it should be hidden and linked. We are happy to do it, if the Instituut voor de Nederlandse Taal sends us a request, confirming that indeed it is the same data and our old record is outdated now.
I had a mail discussion about this a couple of months ago through the CLARIN-D helpdesk (Ticket #2018032110000028 in case that helps) and would appreciate it if the LRT entry were deprecated. The license is not the only problem there: the project URL leads to a 404 and there's no pointer to the actual data.
@hannahedeland has been in contact with INL regarding their records and how they appear in the VLO. Perhaps she could follow up on this and request removal from the LRT inventory?
OK, so what should we do now? As I said, we are happy to point the record to a newer one. But I see a small problem currently:
The newer CMDI record doesn't have a human-readable version, at least not in my browser. See the handle above. There is a landing page, but the PID doesn't resolve to it, and it is only in Dutch (or Flemish). I don't mean to nit-pick, but ideally our obsolete record should be replaced by one that conforms to B-centre requirements.
@stranak valid point, I think. But is it the LRT inventory's task to enforce B-centre requirements? INL has B-centre status so we should (be able to) assume that they will fix this sooner or later. Anyway they have been quite responsive and open to suggestions regarding metadata improvements so maybe we can make this suggestion part of the follow-up.
Is a human-readable version of a CMDI (or a landing page with the PID) a B Centre requirement? I think it should be, but I noticed it's not done by all centres. I thought that some just use the VLO as their human-readable front-end, and then it might be okay that the "open in original context" link leads to the "download" page (which seems to require a non-shibbolized login).
> Is a human-readable version of a CMDI (or a landing page with the PID) a B Centre requirement?
My interpretation of the B-centre Checklist is yes. I base my answer on this section of the checklist:
CLARIN requires that URLs to which metadata PIDs point support the HTTP-accept header (“content negotiation”) with minimally the following mime types:
• text/html (web-browser, human readable)
• application/x-cmdi+xml (CMDI metadata, for machine interpretation)
So when I click the PID in a browser, I should get the human-readable version.
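For anyone who wants to check a PID against that checklist item without a browser, here is a minimal sketch in Python, not an official CLARIN tool. It sends one request per required mime type with the matching `Accept` header and compares the `Content-Type` that comes back. The names `check_pid`, `build_request` and `media_type` are made up for illustration, and the sketch assumes that an exact match on the returned media type is a fair test of compliance.

```python
from urllib.request import Request, urlopen

# The two mime types named in the checklist excerpt quoted above.
REQUIRED_TYPES = ("text/html", "application/x-cmdi+xml")


def build_request(pid_url, mime_type):
    """Build a GET request that asks for exactly one representation."""
    return Request(pid_url, headers={"Accept": mime_type})


def media_type(content_type_header):
    """Drop parameters such as '; charset=UTF-8' and normalise case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def check_pid(pid_url, fetch=urlopen):
    """Return {mime_type: bool}, True where the server honoured the Accept header.

    `fetch` defaults to urllib's urlopen (which follows the handle
    resolver's redirects); it can be swapped for a stub when testing offline.
    """
    results = {}
    for mime in REQUIRED_TYPES:
        with fetch(build_request(pid_url, mime)) as response:
            served = media_type(response.headers.get("Content-Type", ""))
        results[mime] = served == mime
    return results
```

Calling `check_pid("http://hdl.handle.net/10032/60195d484591490edf259ec156ad7aa9")` would need network access. A result with `False` for `text/html` would correspond to the situation described earlier in this thread, where the PID does not lead to a human-readable page.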
I am not on the assessment committee, so I might be missing something, but I can say that this was the intent of the rule when it was included. It wasn't there in the beginning; it was added specifically so that users could always see readable homepages when browsing.
An English version, on the other hand, is not a rule. It is getting close to becoming a recommendation, from what I see in the curation TF. But it is not a rule.
@twagoo uhm end of that story from the Helpdesk perspective was actually you taking over for the removal part ;)
(This duplicate problem also appears with the "normal" SoNaR Corpus (http://hdl.handle.net/10032/87863238c022f388d6b3b5fab0c56fe5), for which there is also additionally a CollBank record, but maybe this is not part of the Resource Families...)
@hannahedeland I suppose you're right 😬 Ok I will see if I can pick up that thread again then...
I've been in contact with Griet from INT (sorry for incorrectly referring to them as INL before) and got a status update: new CMDI records with 'full PID information' are available as of today. Once they have been harvested (by the end of the week), they will contact you (LINDAT) and send a full list of duplicate records that can be replaced by a reference. I guess any concerns regarding metadata quality/B-centre compliance can be followed up in that context. Shall we leave this issue open until the LRT record has actually been deprecated?
I'd leave the issue open for now. I'll try to remember to close it once we have updated the records and directed them to the new INT records :-)
For the record, INT has now informed me that they contacted LINDAT with a request and the required details to remove duplicates.
| LINDAT record | LINDAT PID | INT record | INT PID |
| --- | --- | --- | --- |
| AUTONOMATA POI Corpus | http://hdl.handle.net/11372/LRT-1501 | AUTONOMATA-POI-corpus | http://hdl.handle.net/10032/c659b1fcf7e27d682dca1d5df67aab83 |
| AUTONOMATA Spoken Names Corpus | http://hdl.handle.net/11372/LRT-1499 | AUTONOMATA-namencorpus | http://hdl.handle.net/10032/c246dcdd5c0205c89723093f6d98ee2f |
| AUTONOMATA-g2p-toolkit | http://hdl.handle.net/11372/LRT-1327 | AUTONOMATA-transcriptietoolset | http://hdl.handle.net/10032/d9dbcb375da934aab858256c2b3ac825 |
| BasiLex Corpus | http://hdl.handle.net/11372/LRT-1470 | Basilex-corpus | http://hdl.handle.net/10032/5a5e7fb0e379a257f632dfdae274ce4f |
| BasiLex Lexicon | http://hdl.handle.net/11372/LRT-1471 | Basilex-Lexicon | http://hdl.handle.net/10032/d5957166e9127373a50116da71b54a9e |
| Children's Oral Reading Corpus (CHOREC) | http://hdl.handle.net/11372/LRT-1497 | Children's Oral Reading Corpus (CHOREC) | http://hdl.handle.net/10032/49300ffe90311e0de3599fd1cb4a0e4c |
| CombiLex | http://hdl.handle.net/11372/LRT-1496 | CombiLex | http://hdl.handle.net/10032/5dbfa8a0e4bb6ade7636cce4a4285989 |
| COREA Coreference Corpus | http://hdl.handle.net/11372/LRT-1503 | COREA-coreferentiecorpus | http://hdl.handle.net/10032/198523b75d5f96dd89b7ef36a2805344 |
| Corpus of Pathological and Normal Speech (COPAS) | http://hdl.handle.net/11372/LRT-1486 | Corpus Pathologische en Normale Spraak (COPAS) | http://hdl.handle.net/10032/230d4ababde9f748c6b6a723650a50b2 |
| DAESO Corpus: Parallel Dutch Monolingual Treebank | http://hdl.handle.net/11372/LRT-1487 | DAESO-corpus: Parallelle Nederlandstalige monolinguale treebank | http://hdl.handle.net/10032/ba8a5c5e922943ac06638fb18daf29bb |
| D-TUNA Corpus | http://hdl.handle.net/11372/LRT-1500 | D-TUNA-corpus | http://hdl.handle.net/10032/196866543c4dd7627f8015faadfbfbed |
| DuELME | http://hdl.handle.net/11372/LRT-1494 | DuELME | http://hdl.handle.net/10032/f7a60bf762c8ff03ef2aa255a4d0a866 |
| Eindhoven Corpus | http://hdl.handle.net/11372/LRT-574 | Eindhoven-corpus | http://hdl.handle.net/10032/65b84aa72398605c8d2480ae31b5df9f |
| eLex | http://hdl.handle.net/11372/LRT-1491 | e-Lex | http://hdl.handle.net/10032/67bb70a6cfea00fb98c10b9d426a9e5e |
| Frequency lists corpora | http://hdl.handle.net/11372/LRT-1495 | Frequentielijsten Corpora | http://hdl.handle.net/10032/c61f12b28bddab0c8c6d0ba6245aa79a |
| Frequency lists of various corpora | http://hdl.handle.net/11372/LRT-578 | Frequentielijsten Corpora | http://hdl.handle.net/10032/c61f12b28bddab0c8c6d0ba6245aa79a |
| Jasmin Speech Corpus | http://hdl.handle.net/11372/LRT-1492 | JASMIN-spraakcorpus | http://hdl.handle.net/10032/3a677b6f6aed03519ab6fc6a460b2908 |
| Lassy Large Corpus | http://hdl.handle.net/11372/LRT-1506 | Lassy Groot-corpus | http://hdl.handle.net/10032/0ab316b597c9ecbbb1da545e94634ae4 |
| Lassy Small Corpus | http://hdl.handle.net/11372/LRT-1493 | Lassy Klein-corpus | http://hdl.handle.net/10032/efc201791fadf20f67858b602553874b |
| Multilingual Subtitle Data 2BDutch | http://hdl.handle.net/11372/LRT-1490 | Meertalige Ondertiteldata 2BDutch | http://hdl.handle.net/10032/7c9f0e340e7d9499418b2cd42313fd36 |
| Reference Lexicon for Belgian-Dutch | http://hdl.handle.net/11372/LRT-1504 | Referentiebestand Belgisch-Nederlands (RBBN) | http://hdl.handle.net/10032/4f94802559b6eb54c42680c23796fd8a |
| Reference Lexicon for Dutch Small (RBN-klein) | http://hdl.handle.net/11372/LRT-1505 | RBN-klein | http://hdl.handle.net/10032/a674b5e9e41804f78c8e6b5ebef49866 |
| Referentiebestand Nederlands | http://hdl.handle.net/11372/LRT-581 | Referentiebestand Nederlands (RBN) | http://hdl.handle.net/10032/d87afa489e65291fefb7f03c40981bb4 |
| SoNaR Corpus | http://hdl.handle.net/11372/LRT-1498 | SoNaR-corpus | http://hdl.handle.net/10032/78642d04df0d21dbbd3a805a20f947a7 |
| SoNaR New Media Corpus | http://hdl.handle.net/11372/LRT-1502 | SoNaR Nieuwe Media-corpus | http://hdl.handle.net/10032/157d6fee6134f5beab09b159dd7c710a |
| SumNL Summaries Corpus | http://hdl.handle.net/11372/LRT-1489 | SumNL-samenvattingencorpus | http://hdl.handle.net/10032/b4fc4e6894b961e713f2207b62d50928 |
| VMNW | http://hdl.handle.net/11372/LRT-520 | VMNW | http://hdl.handle.net/10032/d87c40046e3637a5ede04db1b88ec84f |
| Woordenboek der Nederlandse Taal | http://hdl.handle.net/11372/LRT-583 | iWNT | http://hdl.handle.net/10032/14fa7daae9014e45f44624ea680452b7 |
These records are now withdrawn
I guess the issue can be closed
The withdrawn records' tombstone pages now correctly point to the new PIDs, and those correctly resolve to human-readable pages at IVDNT.
https://hdl.handle.net/11372/LRT-1502