IQSS / dataverse

Open source research data repository software
http://dataverse.org

Add authentication and authorization to the Access (download) API. #1228

Closed. landreev closed this 9 years ago.

landreev commented 9 years ago

There was no ticket for this, and it's fairly important.

akio-sone commented 9 years ago

Leonid, does "token authentication" mean JSON Web Tokens (JWT)? If so, which Java library are you going to use?


Akio Sone, Odum Institute, UNC at Chapel Hill

pdurbin commented 9 years ago

In #1226 I described the problem but not the solution. Heads up to @raprasad that this upcoming change will likely affect geoconnect, which downloads files. Since TwoRavens is client-side, I'm not sure if @vjdorazio and @tercer need to know about the upcoming change.

landreev commented 9 years ago

@akio-sone: This is the same token authentication already used elsewhere in the Dataverse 4.0 APIs (src/main/java/edu/harvard/iq/dataverse/authorization/users/ApiToken.java). I don't remember off the top of my head what standard it implements or is based on, but Michael, who added this framework to the app, should be able to answer that.

landreev commented 9 years ago

Authentication and authorization should now be enforced by all download APIs (everything under /api/access/*). This covers all supported forms of file download: straight file downloads; format conversions, saved originals, and "preprocessed data" (for tabular data files); image thumbnails (for images); download "bundles" (for tabular files: zip archives containing the tabular file, the saved original, data-level DDI, and citations); and multiple-file downloads as zipped archives.

How it works:

Important! The API supports BOTH session-based and API-token-based authentication. It first checks the session to see whether access to the file is authorized for the user associated with the session (or for the guest user, if there is no session user). If not, it tries to obtain authorization based on the API token, if one is supplied. (The token is supplied the same way as in all the other API calls that support token auth: as the key=... URL parameter.)
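The lookup order above can be sketched as follows. This is a minimal, hypothetical illustration, not the actual Dataverse code: `User`, `AccessAuthorizer`, `canDownload`, and `isAuthorized` are made-up names, and the in-memory maps stand in for the real permission checks and the ApiToken lookup.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for an authenticated user; GUEST models "no session user".
record User(String name) {
    static final User GUEST = new User("guest");
}

// Hypothetical sketch of the session-first, token-second check (not Dataverse's classes).
class AccessAuthorizer {
    // user name -> file IDs that user may download (stand-in for real permission checks)
    private final Map<String, Set<Long>> grants;
    // API token -> user (stand-in for the ApiToken lookup)
    private final Map<String, User> tokens;

    AccessAuthorizer(Map<String, Set<Long>> grants, Map<String, User> tokens) {
        this.grants = grants;
        this.tokens = tokens;
    }

    boolean canDownload(User user, long fileId) {
        return grants.getOrDefault(user.name(), Set.of()).contains(fileId);
    }

    /**
     * First try the session user (or the guest user if there is none);
     * only if that fails, try the user behind the key=... token, if one was supplied.
     */
    boolean isAuthorized(Optional<User> sessionUser, Optional<String> apiKey, long fileId) {
        User first = sessionUser.orElse(User.GUEST);
        if (canDownload(first, fileId)) {
            return true;
        }
        return apiKey.map(tokens::get)          // unknown token -> empty Optional
                     .map(u -> canDownload(u, fileId))
                     .orElse(false);
    }
}
```

Because the token is consulted only after the session check fails, a logged-in browser user never needs the key parameter, while a script with no session can still get in with it.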

In practical terms: suppose you have an unreleased and/or restricted dataset with some files in it. If you're not logged in, trying to access one of the files without an API key, using a URL like http://localhost:8080/api/access/datafile/FILEID, will result in "access denied". Once you are logged in, you'll be able to download the file from the dataset page. You will also be able to paste its download URL (still without an API key) into another window OF THE SAME browser and still get the file. This is because the API authenticates you based on your current browsing session, in which you're still logged in as a user who is authorized to download the file. If you paste the same URL into a different browser, you will get "access denied" again. But you should be able to download the file if you add your API key (obtained from the ApiTokenPage) to the URL, like this: http://localhost:8080/api/access/datafile/FILEID?key=YOURAPIKEY
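For scripted access outside a browser session, a client only needs to append the key parameter to the datafile URL. The sketch below assumes Java 11+ and uses the standard java.net.http client; the class and method names (`AccessApiClient`, `datafileUri`, `download`) are hypothetical, and since this thread doesn't specify which HTTP status the API returns for "access denied", the caller should simply treat any non-2xx code as a failure.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper for the /api/access/datafile/FILEID endpoint.
class AccessApiClient {
    /** Builds BASE/api/access/datafile/FILEID, appending ?key=... only when a token is given. */
    static URI datafileUri(String baseUrl, long fileId, String apiKey) {
        String url = baseUrl + "/api/access/datafile/" + fileId;
        if (apiKey != null && !apiKey.isEmpty()) {
            url += "?key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
        }
        return URI.create(url);
    }

    /**
     * Streams the response body to target and returns the HTTP status code.
     * Note: the body is written even for an error response, so check the code.
     */
    static int download(URI uri, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<Path> response = client.send(request, HttpResponse.BodyHandlers.ofFile(target));
        return response.statusCode();
    }
}
```

Calling `datafileUri("http://localhost:8080", fileId, null)` reproduces the anonymous URL from the example above, which only works within an authorized browser session; passing the API key produces the `?key=YOURAPIKEY` form that works from anywhere.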

How it works with TwoRavens:

When the DatasetPage generates the URL for the TwoRavens app, it now passes along the API token of the session user. The TwoRavens app has also been modified to use that token when downloading tabular and preprocessed data from Dataverse (in both its JavaScript and its R code).
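The token handoff can be pictured roughly like this on the receiving side. This is a hypothetical Java sketch, not TwoRavens code (which is JavaScript and R): it assumes the launch URL carries the token in a query parameter named `key`, which may not match the parameter TwoRavens actually uses, and it then appends the token to each Access API URL the app fetches. `TokenHandoff`, `tokenFromLaunchUrl`, and `withToken` are made-up names.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical sketch of an external tool reusing a token passed in its launch URL.
class TokenHandoff {
    /** Extracts the (assumed) "key" query parameter from the launch URL, if present. */
    static Optional<String> tokenFromLaunchUrl(URI launchUrl) {
        String query = launchUrl.getRawQuery();
        if (query == null) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("key")) {
                return Optional.of(URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }

    /** Appends key=TOKEN to an Access API URL, keeping any existing query string. */
    static String withToken(String accessUrl, String token) {
        String sep = accessUrl.contains("?") ? "&" : "?";
        return accessUrl + sep + "key=" + URLEncoder.encode(token, StandardCharsets.UTF_8);
    }
}
```

The design point is that the external app never handles the user's password or browser session; it only forwards a token that the Access API already understands.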

I have upgraded TwoRavens (TR) on both dvn-build and dataverse-demo.

How it works with multiple-file downloads:

If the user isn't authorized (by session or token) to download ANY of the requested files, "access denied" is returned. If access is denied for only SOME of the files, the resulting zip file will contain the files for which access has been granted (subject to the size limit), and the download manifest will have "you do not have permission to download blablah.foo" entries for the rest.
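The all-or-some rule can be sketched like this. Assumptions, none of which come from the actual implementation: file contents are held in memory, the per-file session/token check is passed in as a predicate, the manifest is written as a zip entry named MANIFEST.TXT (a made-up name) listing only the denied files, and the size limit is left out entirely. `MultiFileZip` and `AccessDeniedException` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of a multi-file download with partial access (size limit omitted).
class MultiFileZip {
    /** Thrown when none of the requested files may be downloaded. */
    static class AccessDeniedException extends Exception {
        AccessDeniedException() { super("access denied"); }
    }

    /** files: name -> contents; allowed: per-file authorization check (session or token). */
    static byte[] zip(Map<String, byte[]> files, Predicate<String> allowed)
            throws IOException, AccessDeniedException {
        List<String> granted = new ArrayList<>();
        List<String> manifest = new ArrayList<>();
        for (String name : files.keySet()) {
            if (allowed.test(name)) {
                granted.add(name);
            } else {
                manifest.add("you do not have permission to download " + name);
            }
        }
        if (granted.isEmpty()) {
            throw new AccessDeniedException();   // denied for ALL requested files
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (String name : granted) {         // only the files the user may see
                zip.putNextEntry(new ZipEntry(name));
                zip.write(files.get(name));
                zip.closeEntry();
            }
            zip.putNextEntry(new ZipEntry("MANIFEST.TXT"));
            zip.write(String.join("\n", manifest).getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }
}
```

Failing the whole request only when every file is denied means a partially authorized user still gets a useful archive, and the manifest explains what is missing instead of silently dropping it.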

mercecrosas commented 9 years ago

Great description @landreev

Does this handle terms of use? Or is that part of another github issue (not yet implemented)?


mheppler commented 9 years ago

(Sorry for the erroneous ref in my last commit.)

kcondon commented 9 years ago

This was tested for beta 10; closing.