EUDAT-B2STAGE / http-api

RESTful HTTP-API for the B2STAGE service inside the EUDAT project
https://eudat-b2stage.github.io/http-api/
MIT License

Download by PID does not work #86

Closed: chStaiger closed this issue 7 years ago

chStaiger commented 7 years ago

I can download a file via the registered endpoint, but not via the pids endpoint:

curl -o test.pdf -H "Authorization: Bearer $TOKEN" \
http://$SERVER:$PORT/api/registered/tempZone/home/guest/b2safe/file.pdf?download=true
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 24826    0 24826    0     0   330k      0 --:--:-- --:--:-- --:--:--  332k

--> works fine

The same file carries a PID:

curl -H "Authorization: Bearer $TOKEN"  \
http://$SERVER:$PORT/api/registered/tempZone/home/guest/b2safe/file.pdf
{
  "Meta": {
    ...
    *snip*
     ...
  },
  "Response": {
    "data": [
      {
        "file.pdf": {
          "dataobject": "file.pdf",
          "link": "http://localhost/api/registered/tempZone/home/guest/b2safe/file.pdf",
          "location": "irods://rodserver.dockerized.io/tempZone/home/guest/b2safe",
          "metadata": {
            ...
            "PID": "21.T12995/223fbf24-752f-11e7-84c7-0242ac010003",
            "checksum": "sha2:dA4+fgTh7nLE/x+LUbBGKDqBKjp4mJiVI1Jz6l6MsaM=",
            ....
          },
          "path": "/tempZone/home/guest/b2safe"
        }
      }
    ],
    "errors": null
  }
}

The download with the -o option creates a new file, but without the expected content:

curl -o newFile.pdf -H "Authorization: Bearer $TOKEN" \
 http://$SERVER:$PORT/api/pids/21.T12995/223fbf24-752f-11e7-84c7-0242ac010003?download
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   329  100   329    0     0    553      0 --:--:-- --:--:-- --:--:--   552

Without the -o option, the response is printed instead:

curl -H "Authorization: Bearer $TOKEN" \
http://$SERVER:$PORT/api/pids/21.T12995/223fbf24-752f-11e7-84c7-0242ac010003?download=true
{
  "Meta": {
    "data_type": "<class 'dict'>",
    "elements": 1,
    "errors": 1,
    "status": 300
  },
  "Response": {
    "data": {
      "URL": "irods://145.100.58.6:1247/tempZone/home/guest/b2safe/file.pdf"
    },
    "errors": [
      "Data-object can't be downloaded by current HTTP-API server 'localhost'"
    ]
  }
}
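
Because the API answers with a JSON envelope instead of the file, `curl -o` stores that envelope under the target file name. Below is a minimal sketch of a check a script could run after the download. It assumes the error envelope always carries a `"Response"` key and an `"errors"` list, as in the output above. The function name `is_api_error_envelope` is hypothetical, and the grep-based test is a heuristic, not a JSON parser.

```shell
# Hypothetical helper: succeed (exit 0) if the file looks like the JSON
# error envelope shown above rather than the requested data object.
is_api_error_envelope() {
  file=$1
  # Heuristic: the envelope has a "Response" key and an "errors" key
  # that opens a list ("errors": [ ... ]) instead of being null.
  grep -q '"Response"' "$file" && grep -q '"errors": *\[' "$file"
}

# Usage after a download such as `curl -o newFile.pdf ...`:
# if is_api_error_envelope newFile.pdf; then
#   echo "got an API error, not the file" >&2
# fi
```

A successful metadata listing has `"errors": null`, so it is not flagged by this check.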

chStaiger commented 7 years ago

It might have something to do with how the URL is represented in the PIDs and what the B2STAGE HTTP API expects. There might also be some confusion about what the FQDN of a dockerised iRODS instance is.

B2SAFE needs to know the correct FQDN in its setup. I set it to the IP address of the machine Docker runs on, so my PIDs look like this: http://hdl.handle.net/21.T12995/ee4b7cca-7a7f-11e7-9f3d-0242ac160003?noredirect

URL: irods://145.100.59.220:1247/tempZone/home/guest/b2safe/music.pdf

I will try with the docker hostname.
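
As an alternative to the `?noredirect` view, the Handle.Net proxy also serves handle records as JSON under `/api/handles/<handle>`, which makes the stored URL easier to inspect from a script. Below is a small sketch that only builds that address. The function name `handle_record_url` is hypothetical, and the PID is the one quoted above. Whether this test prefix resolves through the global proxy depends on the setup.

```shell
# Hypothetical helper: print the Handle.Net proxy REST address for a PID.
handle_record_url() {
  pid=$1
  printf 'https://hdl.handle.net/api/handles/%s\n' "$pid"
}

handle_record_url "21.T12995/ee4b7cca-7a7f-11e7-9f3d-0242ac160003"

# To fetch the record itself (needs network access):
# curl -s "$(handle_record_url "21.T12995/ee4b7cca-7a7f-11e7-9f3d-0242ac160003")"
```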

chStaiger commented 7 years ago

Change serverID in /opt/eudat/b2safe/rulebase/local.re from

*serverID="irods://<ip>:1247";

to

*serverID="irods://rodserver.dockerized.io:1247";

and restart the iRODS server.

The new PID URL looks like this:

URL: irods://rodserver.dockerized.io:1247/tempZone/home/guest/b2replication/music_reallynew.pdf

Let's see if we can resolve and fetch the data:

curl -o newFile_PID.pdf -H "Authorization: Bearer $TOKEN"  http://$SERVER:$PORT/api/pids/21.T12995/17141cc2-7a83-11e7-b1de-0242ac160003?download=true
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   363  100   363    0     0    465      0 --:--:-- --:--:-- --:--:--   465

Same problem. So the serverID in B2SAFE is not the culprit here.

muccix commented 7 years ago

The error message "Data-object can't be downloaded by current HTTP-API server 'localhost'" makes me think that the environment variable PROJECT_DOMAIN in the backend container is configured as localhost instead of the IP of the machine on which the HTTP server is running.

You could try the following:

chStaiger commented 7 years ago

Now I get the metadata neatly downloaded into a file ... not exactly what I would expect ;)

Here is what I have done:

rapydo --hostname <machine IP> --mode debug start
rapydo shell backend
echo $PROJECT_DOMAIN
    localhost
export PROJECT_DOMAIN=<machine IP>

Then I tried to download a file:

curl -o donwload.txt -H "Authorization: Bearer $TOKEN" \
http://$SERVER:$PORT/api/pids/21.T12995/b8b14a32-90bd-11e7-9e5c-0242ac120003?download=true
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   348  100   348    0     0    288      0  0:00:01  0:00:01 --:--:--   288

And behold:

cat donwload.txt
{
  "Meta": {
    "data_type": "<class 'dict'>",
    "elements": 1,
    "errors": 1,
    "status": 300
  },
  "Response": {
    "data": {
      "URL": "irods://145.100.59.220:1247/tempZone/home/guest/b2replication/test.txt"
    },
    "errors": [
      "Data-object can't be downloaded by current HTTP-API server '145.100.59.220'"
    ]
  }
}

while the real file test.txt should look like this:

cat test.txt
1233453 test

muccix commented 7 years ago

What is the value for $SERVER:$PORT? I think that is what you should set as PROJECT_DOMAIN.
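
One way to compare the values in question side by side is sketched below. It extracts the host from the `irods://` URL stored in the PID and prints it next to `PROJECT_DOMAIN`. The function name `irods_url_host` is hypothetical, and the URL is copied from the output above. How the HTTP-API itself compares these values is not shown in this thread, so this only helps to spot a mismatch by eye.

```shell
# Hypothetical helper: print the host part of an irods:// URL.
irods_url_host() {
  rest=${1#irods://}               # drop the scheme
  hostport=${rest%%/*}             # keep everything before the first slash
  printf '%s\n' "${hostport%%:*}"  # drop the optional :port
}

pid_url="irods://145.100.59.220:1247/tempZone/home/guest/b2replication/test.txt"
echo "PID URL host:   $(irods_url_host "$pid_url")"
echo "PROJECT_DOMAIN: ${PROJECT_DOMAIN:-<unset>}"
```

Run it inside the backend container (`rapydo shell backend`) so that `PROJECT_DOMAIN` reflects the value the server sees.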

pdonorio commented 7 years ago

Hi @chStaiger, any update here? :)

pdonorio commented 7 years ago

Closing due to inactivity; please feel free to reopen if the problem persists.