clarin-eric / LRSwitchboard

DEPRECATED - Please see https://github.com/clarin-eric/switchboard for latest version - Code Repository for the Language Resources Switchboard of CLARIN

Pass on the resolved handle to receiving tools #20

Closed dietervu closed 5 years ago

dietervu commented 6 years ago

If a handle is used as input (e.g. https://hdl.handle.net/1839/00-0000-0000-0021-5236-1), the receiving applications seem to parse the redirect page from hdl.handle.net rather than the resource behind it.

claus-zinn commented 6 years ago

The behaviour of the receiving applications varies. Some applications can handle handles, others cannot. The switchboard could do the following: it resolves the handle itself, downloads the resource (optionally identifying its media type and language), and then uploads the resource to the switchboard's fileserver. The receiving applications will be able to access the resource from there (similar to the switchboard standalone version). I will implement this scenario and make it available on the switchboard development version asap.

twagoo commented 6 years ago

Sounds like an OK workaround. But if we know that an application can deal with handles (which would have to be represented in the metadata), or with any other kind of PID, it would be best to just pass the URI, right?

claus-zinn commented 6 years ago

Good point. WebLicht, for instance, seems to be able to handle handles. And for efficiency reasons, it might make sense to apply the aforementioned download/upload procedure only in cases where the receiving applications are known to have problems with handles. Extending the tool metadata in this respect makes sense!
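The idea of recording handle support in the tool metadata could be sketched roughly as follows. This is only an illustration: the `Tool` class, the `accepts_handles` flag, and the function names are hypothetical and not part of the switchboard's actual metadata schema.

```python
from dataclasses import dataclass

# Prefixes treated as handle URIs in this sketch (an assumption, not an exhaustive list).
HANDLE_PREFIXES = ("https://hdl.handle.net/", "http://hdl.handle.net/", "hdl:")


@dataclass
class Tool:
    name: str
    accepts_handles: bool = False  # hypothetical metadata flag


def is_handle(url: str) -> bool:
    """Return True if the URL looks like a handle."""
    return url.startswith(HANDLE_PREFIXES)


def url_for_tool(tool: Tool, original_url: str, mirrored_url: str) -> str:
    """Pick the URL to hand to a tool.

    Tools that cannot deal with handles get the mirrored copy;
    everything else gets the original URL unchanged.
    """
    if is_handle(original_url) and not tool.accepts_handles:
        return mirrored_url
    return original_url
```

With such a flag, the download/upload step would only be needed for tools whose metadata does not declare handle support.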

claus-zinn commented 6 years ago

However, thinking about it, the solution works against the switchboard design: first, a resource is given to the switchboard (with possible identification of MIME type and language), then the applicable tools are identified. Once the applicable tools have been identified and a tool has been selected, I need to check whether that tool has handle support; if not, I have to touch the resource again (in a sense, going back to stage 1). I will think about it some more.

claus-zinn commented 6 years ago

It seems that the Python package Requests (http://docs.python-requests.org/en/master/) gives me access to the redirect history, and that I can set allow_redirects=False, so that response = requests.get(url, allow_redirects=False) yields a <Response [303]> object, with response.headers['Location'] giving me the redirect location. So I have to revise my earlier statements a little, as the process does not necessarily involve a download/upload action.
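A minimal sketch of that single-hop lookup, assuming the Requests package is installed. The function name `redirect_target` and the injectable `get` parameter are illustrative additions (the latter only makes the function easy to try without network access); the status codes checked are the standard HTTP redirect codes.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def redirect_target(url, get=None, timeout=10):
    """Return the location a URL redirects to, or None if it does not redirect.

    Only the first hop is inspected; the resource itself is not downloaded.
    """
    if get is None:
        import requests  # third-party package, imported lazily
        get = requests.get
    response = get(url, allow_redirects=False, timeout=timeout)
    if response.status_code in REDIRECT_CODES:
        location = response.headers.get("Location")
        if location:
            # A Location header may be relative; resolve it against the request URL.
            return urljoin(url, location)
    return None
```

Note that a handle may redirect more than once before reaching the resource, so a caller might need to repeat the lookup on the returned URL.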

claus-zinn commented 5 years ago

For the time being, the following process is implemented and live in the switchboard: the switchboard downloads the resource behind ANY URL (including handles). It then uploads the resource to the Nextcloud-based file storage server and creates a shared link for the resource. This shared link is given to the tools to process the resource. That is, they won't see the original URLs. I am closing the issue.
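The download, upload, and share steps described above could be outlined like this. It is a sketch under assumptions, not the switchboard's code: `upload` and `create_share_link` are hypothetical callables standing in for whatever the file storage server offers, and the standard library is used for the download.

```python
import urllib.request
from typing import Callable


def download(url: str, timeout: float = 30.0) -> bytes:
    """Fetch the resource behind a URL.

    urlopen follows HTTP redirects by default, so a handle URL
    yields the resource behind it rather than the redirect response.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def mirror_and_share(
    url: str,
    upload: Callable[[bytes], str],           # hypothetical: stores data, returns a storage path
    create_share_link: Callable[[str], str],  # hypothetical: returns a public link for a path
    fetch: Callable[[str], bytes] = download,
) -> str:
    """Download a resource, store a copy, and return the shared link given to tools."""
    data = fetch(url)
    stored_path = upload(data)
    return create_share_link(stored_path)
```

Because tools only ever receive the value returned by `mirror_and_share`, the original URL is not visible to them in this flow.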

twagoo commented 5 years ago

I still think that there could be tools that benefit from getting the handle, for example to retrieve handle metadata or simply to identify a resource that was processed before. Or there may be a locally cached version of the resource that could be processed instead, but the handle would be needed to identify it.

However, I know of no concrete cases, so the ticket can stay closed, but it is something to keep in mind.

twagoo commented 5 years ago

Also, for provenance records, having the original handle might be highly desirable. Perhaps it could be passed in a 'side channel' (for example, a dedicated query parameter that could optionally be registered for tools).
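Such a side channel might look roughly like the sketch below. The function name and the parameter names (`input_param`, `origin_param`) are hypothetical; the assumption is that a tool's metadata could optionally register the name of a query parameter for the original URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_tool_url(tool_base, input_param, resource_url,
                   origin_param=None, original_url=None):
    """Build the URL used to invoke a tool.

    The resource link always goes into `input_param`. If the tool has
    registered an `origin_param`, the original URL (e.g. the handle)
    is passed alongside it for provenance purposes.
    """
    parts = urlsplit(tool_base)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((input_param, resource_url))
    if origin_param and original_url:
        query.append((origin_param, original_url))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Tools that have not registered the extra parameter would be invoked exactly as before, so the side channel stays optional.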