clarin-eric / resource-families-issues


SoNaR New Media #14

Closed jakoble closed 5 years ago

jakoble commented 6 years ago

https://hdl.handle.net/11372/LRT-1502

stranak commented 6 years ago

Isn't the real problem with this record that it seems to have been replaced by a newer one? http://hdl.handle.net/10032/60195d484591490edf259ec156ad7aa9

If it is really a duplicate now, it should be hidden and linked. We are happy to do it, if the Instituut voor de Nederlandse Taal sends us a request, confirming that indeed it is the same data and our old record is outdated now.

kreetrapper commented 6 years ago

I had a mail discussion about this a couple of months ago through the CLARIN-D helpdesk (Ticket #2018032110000028 in case that helps) and would appreciate it if the LRT entry were deprecated. It's not only the license that's a problem there: the project URL leads to a 404 and there's no pointer to the actual data.

twagoo commented 6 years ago

@hannahedeland has been in contact with INL regarding their records and how they appear in the VLO. Perhaps she could follow up on this and request removal from the LRT inventory?

stranak commented 6 years ago

OK, so what should we do now? As I said, we are happy to point the record to a newer one. But I see a small problem currently:

The newer CMDI record doesn't have a human-readable version, at least not in my browser. See the handle above. There is a landing page, but the PID doesn't resolve to it, and it is only in Dutch (or Flemish). I don't mean to nit-pick, but ideally our obsolete record should be replaced by one that conforms to B-centre requirements.

twagoo commented 6 years ago

@stranak valid point, I think. But is it the LRT inventory's task to enforce B-centre requirements? INL has B-centre status, so we should (be able to) assume that they will fix this sooner or later. Anyway, they have been quite responsive and open to suggestions regarding metadata improvements, so maybe we can make this suggestion part of the follow-up.

kreetrapper commented 6 years ago

Is a human-readable version of a CMDI (or a landing page with the PID) a B Centre requirement? I think it should be, but I noticed it's not done by all centres. I thought that some just use the VLO as their human-readable front-end, and then it might be okay that the "open in original context" link leads to the "download" page (which seems to require a non-shibbolized login).

stranak commented 6 years ago

> Is a human-readable version of a CMDI (or a landing page with the PID) a B Centre requirement?

My interpretation of the B-centre Checklist is yes. I base my answer on this section of the checklist:

> CLARIN requires that URLs to which metadata PIDs point support the HTTP-accept header (“content negotiation”) with minimally the following mime types:
> • text/html (web-browser, human readable)
> • application/x-cmdi+xml (CMDI metadata, for machine interpretation)

So when I click the PID in a browser, I should get the human-readable version.
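To make the quoted rule concrete, here is a minimal sketch of how one might probe a PID for both representations. It uses only the Python standard library; the helper names (`build_request`, `served_type`, `check_pid`) are made up for illustration and are not part of any CLARIN tool, and the actual assessment procedure may well differ.

```python
import urllib.request

# The two mime types named in the checklist passage quoted above.
REQUIRED_TYPES = ("text/html", "application/x-cmdi+xml")


def build_request(pid_url: str, mime_type: str) -> urllib.request.Request:
    """Build a GET request that asks for one specific representation."""
    return urllib.request.Request(pid_url, headers={"Accept": mime_type}, method="GET")


def served_type(content_type_header: str) -> str:
    """Strip parameters such as '; charset=utf-8' from a Content-Type value."""
    return content_type_header.split(";", 1)[0].strip().lower()


def check_pid(pid_url: str, timeout: float = 10.0) -> dict[str, bool]:
    """For each required type, report whether the PID answered with that type.

    Redirects (handle -> landing page) are followed by urlopen; any network
    or HTTP error is recorded as a failure for that type.
    """
    results = {}
    for mime_type in REQUIRED_TYPES:
        request = build_request(pid_url, mime_type)
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                got = served_type(response.headers.get("Content-Type", ""))
                results[mime_type] = got == mime_type
        except OSError:  # URLError and HTTPError are subclasses of OSError
            results[mime_type] = False
    return results
```

Calling `check_pid("http://hdl.handle.net/10032/60195d484591490edf259ec156ad7aa9")` would then return one boolean per mime type; what it returns depends on how the handle resolves at the time of the request.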

I am not on the assessment committee, so I might be missing something, but I can say that this was the intent of the rule when it was included. It wasn't there in the beginning; it was added specifically so that users could always see readable homepages when browsing.

An English version, on the other hand, is not a rule. From what I see in the curation TF, it is getting close to becoming a recommendation, but it is not a rule.

hannahedeland commented 6 years ago

@twagoo uhm end of that story from the Helpdesk perspective was actually you taking over for the removal part ;)

(This duplicate problem also appears with the "normal" SoNaR Corpus (http://hdl.handle.net/10032/87863238c022f388d6b3b5fab0c56fe5), for which there is also additionally a CollBank record, but maybe this is not part of the Resource Families...)

twagoo commented 6 years ago

@hannahedeland I suppose you're right 😬 Ok I will see if I can pick up that thread again then...

twagoo commented 6 years ago

I've been in contact with Griet from INT (sorry for incorrectly referring to them as INL before) and got a status update: new CMDI records with 'full PID information' are available as of today. Once they have been harvested (by the end of the week), they will contact you (LINDAT) and send a full list of duplicate records that can be replaced by a reference. I guess any concerns regarding metadata quality/B-centre compliance can be followed up in that context. Shall we leave this issue open until the LRT record has actually been deprecated?

stranak commented 6 years ago

I'd leave the issue open for now. I'll try to remember to close it once we have updated the records and directed them to the new INT records :-)

twagoo commented 6 years ago

For the record, INT has now informed me that they contacted LINDAT with a request and the required details to remove duplicates.

kosarko commented 5 years ago

| LINDAT record | LINDAT handle | INT record | INT handle |
| --- | --- | --- | --- |
| AUTONOMATA POI Corpus | http://hdl.handle.net/11372/LRT-1501 | AUTONOMATA-POI-corpus | http://hdl.handle.net/10032/c659b1fcf7e27d682dca1d5df67aab83 |
| AUTONOMATA Spoken Names Corpus | http://hdl.handle.net/11372/LRT-1499 | AUTONOMATA-namencorpus | http://hdl.handle.net/10032/c246dcdd5c0205c89723093f6d98ee2f |
| AUTONOMATA-g2p-toolkit | http://hdl.handle.net/11372/LRT-1327 | AUTONOMATA-transcriptietoolset | http://hdl.handle.net/10032/d9dbcb375da934aab858256c2b3ac825 |
| BasiLex Corpus | http://hdl.handle.net/11372/LRT-1470 | Basilex-corpus | http://hdl.handle.net/10032/5a5e7fb0e379a257f632dfdae274ce4f |
| BasiLex Lexicon | http://hdl.handle.net/11372/LRT-1471 | Basilex-Lexicon | http://hdl.handle.net/10032/d5957166e9127373a50116da71b54a9e |
| Children's Oral Reading Corpus (CHOREC) | http://hdl.handle.net/11372/LRT-1497 | Children's Oral Reading Corpus (CHOREC) | http://hdl.handle.net/10032/49300ffe90311e0de3599fd1cb4a0e4c |
| CombiLex | http://hdl.handle.net/11372/LRT-1496 | CombiLex | http://hdl.handle.net/10032/5dbfa8a0e4bb6ade7636cce4a4285989 |
| COREA Coreference Corpus | http://hdl.handle.net/11372/LRT-1503 | COREA-coreferentiecorpus | http://hdl.handle.net/10032/198523b75d5f96dd89b7ef36a2805344 |
| Corpus of Pathological and Normal Speech (COPAS) | http://hdl.handle.net/11372/LRT-1486 | Corpus Pathologische en Normale Spraak (COPAS) | http://hdl.handle.net/10032/230d4ababde9f748c6b6a723650a50b2 |
| DAESO Corpus: Parallel Dutch Monolingual Treebank | http://hdl.handle.net/11372/LRT-1487 | DAESO-corpus: Parallelle Nederlandstalige monolinguale treebank | http://hdl.handle.net/10032/ba8a5c5e922943ac06638fb18daf29bb |
| D-TUNA Corpus | http://hdl.handle.net/11372/LRT-1500 | D-TUNA-corpus | http://hdl.handle.net/10032/196866543c4dd7627f8015faadfbfbed |
| DuELME | http://hdl.handle.net/11372/LRT-1494 | DuELME | http://hdl.handle.net/10032/f7a60bf762c8ff03ef2aa255a4d0a866 |
| Eindhoven Corpus | http://hdl.handle.net/11372/LRT-574 | Eindhoven-corpus | http://hdl.handle.net/10032/65b84aa72398605c8d2480ae31b5df9f |
| eLex | http://hdl.handle.net/11372/LRT-1491 | e-Lex | http://hdl.handle.net/10032/67bb70a6cfea00fb98c10b9d426a9e5e |
| Frequency lists corpora | http://hdl.handle.net/11372/LRT-1495 | Frequentielijsten Corpora | http://hdl.handle.net/10032/c61f12b28bddab0c8c6d0ba6245aa79a |
| Frequency lists of various corpora | http://hdl.handle.net/11372/LRT-578 | Frequentielijsten Corpora | http://hdl.handle.net/10032/c61f12b28bddab0c8c6d0ba6245aa79a |
| Jasmin Speech Corpus | http://hdl.handle.net/11372/LRT-1492 | JASMIN-spraakcorpus | http://hdl.handle.net/10032/3a677b6f6aed03519ab6fc6a460b2908 |
| Lassy Large Corpus | http://hdl.handle.net/11372/LRT-1506 | Lassy Groot-corpus | http://hdl.handle.net/10032/0ab316b597c9ecbbb1da545e94634ae4 |
| Lassy Small Corpus | http://hdl.handle.net/11372/LRT-1493 | Lassy Klein-corpus | http://hdl.handle.net/10032/efc201791fadf20f67858b602553874b |
| Multilingual Subtitle Data 2BDutch | http://hdl.handle.net/11372/LRT-1490 | Meertalige Ondertiteldata 2BDutch | http://hdl.handle.net/10032/7c9f0e340e7d9499418b2cd42313fd36 |
| Reference Lexicon for Belgian-Dutch | http://hdl.handle.net/11372/LRT-1504 | Referentiebestand Belgisch-Nederlands (RBBN) | http://hdl.handle.net/10032/4f94802559b6eb54c42680c23796fd8a |
| Reference Lexicon for Dutch Small (RBN-klein) | http://hdl.handle.net/11372/LRT-1505 | RBN-klein | http://hdl.handle.net/10032/a674b5e9e41804f78c8e6b5ebef49866 |
| Referentiebestand Nederlands | http://hdl.handle.net/11372/LRT-581 | Referentiebestand Nederlands (RBN) | http://hdl.handle.net/10032/d87afa489e65291fefb7f03c40981bb4 |
| SoNaR Corpus | http://hdl.handle.net/11372/LRT-1498 | SoNaR-corpus | http://hdl.handle.net/10032/78642d04df0d21dbbd3a805a20f947a7 |
| SoNaR New Media Corpus | http://hdl.handle.net/11372/LRT-1502 | SoNaR Nieuwe Media-corpus | http://hdl.handle.net/10032/157d6fee6134f5beab09b159dd7c710a |
| SumNL Summaries Corpus | http://hdl.handle.net/11372/LRT-1489 | SumNL-samenvattingencorpus | http://hdl.handle.net/10032/b4fc4e6894b961e713f2207b62d50928 |
| VMNW | http://hdl.handle.net/11372/LRT-520 | VMNW | http://hdl.handle.net/10032/d87c40046e3637a5ede04db1b88ec84f |
| Woordenboek der Nederlandse Taal | http://hdl.handle.net/11372/LRT-583 | iWNT | http://hdl.handle.net/10032/14fa7daae9014e45f44624ea680452b7 |

These records are now withdrawn
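
A mapping like this is easy to sanity-check before tombstones are set up. The sketch below is only an illustration: `REPLACEMENTS` holds four pairs copied from the list above, and `shared_targets` is a hypothetical helper, not part of the LINDAT repository software. It reports any new handle that more than one withdrawn record points to.

```python
from collections import defaultdict

# (withdrawn LINDAT handle, replacement INT handle); four rows copied from the list.
REPLACEMENTS = [
    ("http://hdl.handle.net/11372/LRT-1495",
     "http://hdl.handle.net/10032/c61f12b28bddab0c8c6d0ba6245aa79a"),
    ("http://hdl.handle.net/11372/LRT-578",
     "http://hdl.handle.net/10032/c61f12b28bddab0c8c6d0ba6245aa79a"),
    ("http://hdl.handle.net/11372/LRT-1498",
     "http://hdl.handle.net/10032/78642d04df0d21dbbd3a805a20f947a7"),
    ("http://hdl.handle.net/11372/LRT-1502",
     "http://hdl.handle.net/10032/157d6fee6134f5beab09b159dd7c710a"),
]


def shared_targets(pairs):
    """Return {new_handle: [old_handles]} for new handles used more than once."""
    by_target = defaultdict(list)
    for old_handle, new_handle in pairs:
        by_target[new_handle].append(old_handle)
    return {new: olds for new, olds in by_target.items() if len(olds) > 1}


if __name__ == "__main__":
    for new_handle, old_handles in shared_targets(REPLACEMENTS).items():
        print(new_handle, "<-", ", ".join(old_handles))
```

On these four rows it flags the Frequentielijsten Corpora handle, which both LRT-1495 and LRT-578 are mapped to in the list.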

kosarko commented 5 years ago

I guess the issue can be closed

stranak commented 5 years ago

The withdrawn records' tombstone pages now correctly point to the new PIDs, and those correctly resolve to human-readable pages at IvdNT.