clarin-eric / LRSwitchboard

DEPRECATED - Please see https://github.com/clarin-eric/switchboard for latest version - Code Repository for the Language Resources Switchboard of CLARIN

File download feature does not seem to work for PDF files. #28

Closed andmor- closed 6 years ago

andmor- commented 6 years ago

If I ask the LRS to download a PDF file via its "Paste link from Dropbox or B2DROP..." input box and then open the downloaded file, I see an empty PDF, even though the file has the expected non-zero size.

To reproduce this, request the following URL on your local deployment with curl or a browser:

http://localhost:8089/clrs-dev/download?input=https%3A%2F%2Foffice.clarin.eu%2Fpp%2FD8S-2.2.pdf

The download works fine for TXT files, e.g.:

http://localhost:8089/clrs-dev/download?input=https%3A%2F%2Fraw.githubusercontent.com%2Fclarin-eric%2FVLO%2Fvlo-4.3.4%2FUPGRADE.txt

The problem seems to be related to the way the response is built at main.py#L37
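One common way to get exactly this symptom (TXT files fine, PDFs blank but non-empty) is a handler that treats every downloaded body as text. Whether that is what happens at main.py#L37 is not confirmed here; the sketch below only illustrates the failure mode. The function names and the sample payload are hypothetical and do not come from the switchboard code.

```python
# Hypothetical illustration; not the actual code in main.py.

# A tiny stand-in for a PDF: an ASCII header followed by non-UTF-8 bytes,
# similar to the compressed streams found in real PDF files.
PDF_SAMPLE = b"%PDF-1.4\n\x89\xe2\xe3\xcf\xd3\nstream\n\xff\xfe\x00\x01\nendstream\n"


def lossy_text_roundtrip(payload: bytes) -> bytes:
    """Mimic a handler that decodes every download as UTF-8 text.

    Bytes that are not valid UTF-8 are replaced, so binary data is altered.
    """
    return payload.decode("utf-8", errors="replace").encode("utf-8")


def build_binary_response(payload: bytes,
                          content_type: str = "application/octet-stream"):
    """Return (headers, body) with the payload passed through untouched."""
    headers = {
        "Content-Type": content_type,
        "Content-Length": str(len(payload)),
    }
    return headers, payload


corrupted = lossy_text_roundtrip(PDF_SAMPLE)
headers, body = build_binary_response(PDF_SAMPLE, "application/pdf")

print(corrupted == PDF_SAMPLE)  # binary content does not survive the text path
print(body == PDF_SAMPLE)       # the bytes path leaves the content intact
```

Plain ASCII text passes through the lossy path unchanged, which would explain why the TXT download looks fine while a PDF keeps its header but loses its page content.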

claus-zinn commented 6 years ago

The bug should now be fixed, and arbitrary URLs should work. Repairing it required changes in several parts of the switchboard code, and the Python code was indeed partly to blame. Note that the switchboard now downloads the resource from the shared link and then uploads it to the switchboard's own storage. This avoids CORS-related issues for tools connected to the switchboard.
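The download-then-upload flow described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the switchboard's implementation: `InMemoryStorage`, `relay_resource`, `fetch_bytes`, and the storage URL are all hypothetical stand-ins, since the thread does not describe the real storage API.

```python
import posixpath
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Download a resource as raw bytes (no text decoding)."""
    with urlopen(url) as response:
        return response.read()


class InMemoryStorage:
    """Hypothetical stand-in for the switchboard's own file storage."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")
        self.files = {}

    def put(self, name: str, payload: bytes) -> str:
        """Store the payload and return the URL under which it is served."""
        self.files[name] = payload
        return f"{self.base_url}/{name}"


def relay_resource(shared_url: str, storage, fetch=fetch_bytes) -> str:
    """Fetch a shared link server-side and re-host it in our own storage.

    Connected tools then load the file from the storage URL, so they do not
    need to make a cross-origin request to the original host.
    """
    payload = fetch(shared_url)
    # Derive a file name from the URL path; fall back to a generic name.
    name = posixpath.basename(unquote(urlparse(shared_url).path)) or "resource"
    return storage.put(name, payload)
```

A usage example, with a fake `fetch` so that nothing touches the network:

```python
storage = InMemoryStorage("https://storage.example/files/")
url = relay_resource(
    "https://office.clarin.eu/pp/D8S-2.2.pdf",
    storage,
    fetch=lambda _url: b"%PDF-1.4\n\xff",
)
# url now points at the copy held in storage, not at the original host
```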