clarinsi / clarin-dspace

LINDAT/CLARIN digital repository based on DSpace
http://lindat.cz
BSD 3-Clause "New" or "Revised" License

Adding new services to Service tab of CLARIN.SI #30

Closed TomazErjavec closed 5 years ago

TomazErjavec commented 5 years ago

For CLARIN.SI repo I would like to add a new service, i.e. links to our installation of noSketch Engine, in the same manner we already have for KonText. Instructions on how to do it are at https://github.com/ufal/clarin-dspace/wiki/Refbox-services-integration esp. https://github.com/ufal/clarin-dspace/wiki/Refbox-services-integration#adding-new-service.

Currently, links to noSkE are under the "Demo URL" tag (e.g. http://hdl.handle.net/11356/1200), which is ugly. Once we have the option of the noSkE service, they would need to be moved from Demo URL to this service. Hopefully, this can be done automatically, or at least the items that need to be changed can be identified automatically.

Once we have this sorted, the particular procedure for adding a new service to CLARIN.SI should also be documented in our Wiki, as new services are sure to appear.

cyplas commented 5 years ago

The lindat instructions worked smoothly: https://beta.clarin.si/repository/xmlui/handle/11356/1025.

As for finding the demo urls, I'm not sure yet how to find these automatically, but I'll look into it (perhaps it's time to really try out the REST API).

cyplas commented 5 years ago

Using the API, I generated a list of relevant metadata for all items in the community. It is attached as txt, as github doesn't support csv, hmm.

demos.txt

I see that we have www.clarin.si/noske for some items and nl.ijs.si/noske for others. Are these interchangeable?

I don't see the possibility in the API to delete metadata, only to add or change, but I'll ask lindat.

By the way, can I delete the pmltq service from the registry? I don't see it used for any items, and I'm guessing it's a lindat-specific thing.

TomazErjavec commented 5 years ago

Thanks for this breakthrough, very nice!

I see that we have www.clarin.si/noske for some items and nl.ijs.si/noske for others. Are these interchangeable?

Indeed not! nl.ijs.si is obsolete, and I've been moving corpora (and links) to www.clarin.si. Most of the ones that had the obsolete URLs are old versions of resources, which are hidden in CLARIN.SI search (but not hidden in VLO and other aggregators!). For these I've now simply deleted this "Demo" URL, as I don't keep old versions of resources in the concordancers. But you also found some which are current versions of the resources but were entered before the concordancer move, and the links were never updated. These I've now corrected by hand, so there should be no link to nl.ijs.si/noske left in the repository, fingers crossed.
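For future checks of this kind, a minimal sketch of sorting Demo URLs by noSkE host could look like the following. It assumes only what is said above (www.clarin.si/noske is current, nl.ijs.si/noske is obsolete); the function name and the example URLs are hypothetical and not taken from the actual script.

```python
from urllib.parse import urlsplit

# Hosts as described in this thread; everything else is treated as "other".
CURRENT_HOST = "www.clarin.si"
OBSOLETE_HOST = "nl.ijs.si"


def classify_demo_url(url: str) -> str:
    """Return 'current', 'obsolete' or 'other' for a Demo URL (hypothetical helper)."""
    # Metadata sometimes holds URLs without a scheme; add one so urlsplit finds the host.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not parts.path.startswith("/noske"):
        return "other"
    if host == CURRENT_HOST:
        return "current"
    if host == OBSOLETE_HOST:
        return "obsolete"
    return "other"
```

Running this over the attached list would flag any remaining nl.ijs.si/noske links for manual correction.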

I don't see the possibility in the API to delete metadata, only to add or change, but I'll ask lindat.

In the worst case, I can delete - once noSke is available under services - the Demo URLs to noSke by hand.

If possible, the protocol would be:

The mapping between the two services is:

KonText: https://www.clarin.si/kontext/first_form?corpname=XXX
noSkE: https://www.clarin.si/noske/run.cgi/corp_info?corpname=XXX
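The mapping can be sketched as a small function that carries the corpname parameter over from a KonText link to the corresponding noSkE link. This is only an illustration under the URL patterns given above; the function name and the corpus name in the usage line are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

KONTEXT_HOST = "www.clarin.si"
KONTEXT_PATH = "/kontext/first_form"
NOSKE_BASE = "https://www.clarin.si/noske/run.cgi/corp_info"


def kontext_to_noske(kontext_url: str) -> str:
    """Build the noSkE corp_info URL for the corpus named in a KonText URL."""
    parts = urlsplit(kontext_url)
    if parts.hostname != KONTEXT_HOST or parts.path != KONTEXT_PATH:
        raise ValueError(f"not a KonText first_form URL: {kontext_url}")
    corpnames = parse_qs(parts.query).get("corpname")
    if not corpnames:
        raise ValueError(f"no corpname parameter in: {kontext_url}")
    # Only the corpus name is carried over; other query parameters are dropped.
    return NOSKE_BASE + "?" + urlencode({"corpname": corpnames[0]})


# Usage with a made-up corpus name:
print(kontext_to_noske("https://www.clarin.si/kontext/first_form?corpname=mycorpus"))
```

Raising on anything that does not match the KonText pattern keeps unrelated Demo URLs from being converted by accident.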

Btw, it is sad that noske dies with 500 if you ask it about a non-existent corpus, but that is another issue for another project...

cyplas commented 5 years ago

I adapted the script according to the protocol and ran it. I put the script under /project/lindat-dspace/tmp/ (just for reference, in case we need something similar elsewhere).

But the deletion of demo didn't work: when I tried putting None, it just cleared the value, without deleting the field. So these will have to be fixed manually (I did the first one: 11356/1170). I'm attaching the script log, where the relevant handles are listed.

demo_to_service.txt

As for local documentation, I don't think it's needed, as the clarin-dspace instructions were sufficient.

cyplas commented 5 years ago

I should add: the script ran successfully. :) And all the URLs passed basic checks: only HTTP 200 for noske urls and no kontext responses containing "ValueError: Missing configuration data for ".
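For reference, the two checks described above could be sketched roughly as follows. This is not the script that was run; the function names are hypothetical, and deciding which check applies by looking for "/noske/" or "/kontext/" in the URL is an assumption. The fetch function is passed in so the logic can be tried without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple

# Error text quoted in this thread for KonText links to unconfigured corpora.
KONTEXT_ERROR = "ValueError: Missing configuration data for "


def http_fetch(url: str) -> Tuple[int, str]:
    """Return (status code, body text) for a URL, including HTTP error responses."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


def failed_checks(
    urls: Iterable[str],
    fetch: Callable[[str], Tuple[int, str]] = http_fetch,
) -> List[str]:
    """List URLs that fail: noSkE must return 200, KonText must not show the error."""
    failures = []
    for url in urls:
        status, body = fetch(url)
        if "/noske/" in url and status != 200:
            failures.append(url)
        elif "/kontext/" in url and KONTEXT_ERROR in body:
            failures.append(url)
    return failures
```

An empty list from failed_checks would correspond to the result reported above.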

TomazErjavec commented 5 years ago

So these will have to be fixed manually

My fingers hurt, but it's done!

One detail: "noSketchEngine" is a very long string, is it possible to shorten it everywhere to "noSketch"? If it is difficult, let's forget it till needed (when we will have many many more services:), and just close.

And, to set a first:

I should add: the script ran successfully. :) And all the URLs passed basic checks: only HTTP 200 for noske urls and no kontext responses containing "ValueError: Missing configuration data for ".

I figured as much, thanks!

cyplas commented 5 years ago

One detail: "noSketchEngine" is a very long string, is it possible to shorten it everywhere to "noSketch"?

Done (easy, as it's just a variable setting in lr.cfg).