NCEAS / metacatui

MetacatUI: A client-side web interface for DataONE data repositories
https://nceas.github.io/metacatui
Apache License 2.0

Unauthorized Error when Loading Large Datasets in Metadata Editor #2547

Open vchendrix opened 1 week ago

vchendrix commented 1 week ago

When loading a dataset with over 100 data files in the Metadata Editor, users may encounter an error message "You are not authorized to edit this data set." However, this error message is not always displayed, and the page may appear to load normally. Upon inspecting the JavaScript console, a 401 Unauthorized error is visible.

Steps to Reproduce:

  1. Load a dataset with over 100 data files in the Metadata Editor.
  2. Observe the error message "You are not authorized to edit this data set." (if displayed).
  3. Check the JavaScript console for a 401 Unauthorized error.

Expected Behavior:

The Metadata Editor should load the dataset without errors, regardless of the number of data files.

Actual Behavior:

The Metadata Editor displays an error message or fails to load the dataset, with a 401 Unauthorized error visible in the JavaScript console.

Additional Context:

Screenshot of error: Screenshot 2024-10-09 at 4 39 24 PM

Screencast of Issue https://drive.google.com/file/d/1gMWnKKXP0esWlOtjtaNQHS67xHUJ3YQ4/view?usp=sharing

Example Error Information:

mburrus commented 1 week ago

I'm seeing this error on another private dataset with over 100 files: https://data.ess-dive.lbl.gov/view/ess-dive-161d0c0f88a0849-20240815T185407332338

Responses I'm getting: a mix of 200, 401, 501, and (failed) net::ERR_HTTP2_SERVER_REFUSED_STREAM across the data file PIDs.

Screenshot: Screenshot 2024-10-11 at 3 12 16 PM

More Context

mbjones commented 5 days ago

@vchendrix @mburrus Can you describe the access policies on all of the objects in this package?

@robyngit @rushirajnenuji the HTTP/2 error may be a new side-effect of enabling HTTP/2 as a more efficient protocol on servers lately. Maybe @vchendrix can let us know if they support client requests for only HTTP/1.1 or also HTTP/2. Some of our test servers at NCEAS use HTTP/2, but AFAIK none of our production servers have it enabled yet.
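
If the refused streams turn out to come from too many simultaneous requests (an unconfirmed hypothesis in this thread; HTTP/2 servers can cap concurrent streams per connection), one generic mitigation is to bound how many per-file requests are in flight at once. The sketch below is not MetacatUI code: it is written in Python for brevity, and `fetch_all_bounded`, `demo`, the fake fetcher, the PID names, and the limit of 6 are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fetch_all_bounded(
    pids: List[str],
    fetch_one: Callable[[str], Awaitable[int]],
    max_in_flight: int = 6,  # assumed limit, not a value taken from any server
) -> Dict[str, int]:
    """Call fetch_one for every PID, never running more than max_in_flight at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(pid: str):
        # Each task waits here until one of the max_in_flight slots is free.
        async with semaphore:
            return pid, await fetch_one(pid)

    pairs = await asyncio.gather(*(guarded(pid) for pid in pids))
    return dict(pairs)


async def demo(file_count: int = 120, max_in_flight: int = 6):
    """Run against a fake fetcher and report the peak number of simultaneous calls."""
    in_flight = 0
    peak = 0

    async def fake_fetch(pid: str) -> int:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0)  # yield to the event loop, as a real request would
        in_flight -= 1
        return 200  # stand-in HTTP status, no network involved

    pids = [f"hypothetical-pid-{i}" for i in range(file_count)]
    statuses = await fetch_all_bounded(pids, fake_fetch, max_in_flight)
    return statuses, peak


if __name__ == "__main__":
    statuses, peak = asyncio.run(demo())
    print(len(statuses), "requests completed; peak in flight:", peak)
```

The same pattern applies in browser JavaScript with a small promise queue; whether it would help here depends on the actual cause, which is still open.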

vchendrix commented 5 days ago

@vchendrix @mburrus Can you describe the access policies on all of the objects in this package?

  • Does your logged in ORCID have write access to all of the objects in this package? Yes.

  • Are all of the objects set with identical access policies? Yes.
    files_in_dataset.csv

    • if yes, then does the logged in ORCID have write access to all objects (metadata, resource map, data)? Yes.

      curl -H "Authorization: Bearer $ESS_DIVE_AUTH_TOKEN" "https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/query/solr?q=id:ess-dive-3a48ab5f69ecf8d-20240108T174327967&fl=id,writePermission&wt=json" 
      {
        "responseHeader":{
          "status":0,
          "QTime":0,
          "params":{
            "q":"id:ess-dive-3a48ab5f69ecf8d-20240108T174327967",
            "fl":"id,writePermission",
            "wt":"javabin",
            "version":"2"}},
        "response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
            {
              "id":"ess-dive-3a48ab5f69ecf8d-20240108T174327967",
              "writePermission":["CN=ess-dive-admins,DC=dataone,DC=org",
                "CN=watershed-function-sfa-admin,DC=dataone,DC=org",
                "CN=urn:node:ESS_DIVE,DC=dataone,DC=org"]}]
        }}
    • If no, then:

    • are the metadata file and resource map file writable by the logged in ORCID?

    • are all of the data files the same policy, and what permissions does your ORCID have?
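
Note that the `writePermission` list in the Solr response above contains group and node subjects rather than an ORCID, so write access for a person would have to come through group membership. A minimal sketch of checking that from a saved response follows; `subjects_with_write` is a hypothetical helper, and the ORCID and group membership in `MY_SUBJECTS` are made-up placeholders, not values from this dataset's users.

```python
from __future__ import annotations

import json


def subjects_with_write(solr_json: str, my_subjects: set[str]) -> dict[str, list[str]]:
    """Map each returned id to those of my_subjects listed in its writePermission field."""
    docs = json.loads(solr_json)["response"]["docs"]
    return {
        doc["id"]: sorted(my_subjects & set(doc.get("writePermission", [])))
        for doc in docs
    }


# Trimmed copy of the response shown in the thread.
SAMPLE = """
{"response": {"numFound": 1, "start": 0, "docs": [
  {"id": "ess-dive-3a48ab5f69ecf8d-20240108T174327967",
   "writePermission": ["CN=ess-dive-admins,DC=dataone,DC=org",
                       "CN=watershed-function-sfa-admin,DC=dataone,DC=org",
                       "CN=urn:node:ESS_DIVE,DC=dataone,DC=org"]}]}}
"""

# Hypothetical: a placeholder ORCID plus one group the user is assumed to belong to.
MY_SUBJECTS = {
    "http://orcid.org/0000-0000-0000-0000",
    "CN=watershed-function-sfa-admin,DC=dataone,DC=org",
}

if __name__ == "__main__":
    for pid, matched in subjects_with_write(SAMPLE, MY_SUBJECTS).items():
        print(pid, "->", matched or "no matching subject")
```

An empty list for any object would point at a policy difference; a non-empty list for every object, as reported here, suggests the 401s are not explained by the access policies alone.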

@robyngit @rushirajnenuji the HTTP/2 error may be a new side-effect of enabling HTTP/2 as a more efficient protocol on servers lately. Maybe @vchendrix can let us know if they support client requests for only HTTP/1.1 or also HTTP/2. Some of our test servers at NCEAS use HTTP/2, but AFAIK none of our production servers have it enabled yet.

Our services support HTTP/2 and HTTP/1.1

% curl -s -I -X HEAD https://data.ess-dive.lbl.gov
HTTP/2 200 
date: Tue, 15 Oct 2024 18:21:02 GMT
content-type: text/html
content-length: 10352
set-cookie: INGRESSCOOKIE=19cdfff91257311df6e1f2f92cc10ee1|47d24e7c0dbc2412b1cf3a747b30e59a; Path=/; Secure; HttpOnly
x-frame-options: SAMEORIGIN
last-modified: Fri, 23 Aug 2024 21:21:03 GMT
etag: "2870-620605a3a8dc0"
accept-ranges: bytes
access-control-allow-origin: 
access-control-allow-headers: Authorization, Content-Type, Origin, Cache-Control
access-control-allow-methods: GET, POST, PUT, OPTIONS
access-control-allow-credentials: true
strict-transport-security: max-age=15724800; includeSubDomains

% curl -s -I -X HEAD --http1.1 https://data.ess-dive.lbl.gov
HTTP/1.1 200 OK
Date: Tue, 15 Oct 2024 18:21:53 GMT
Content-Type: text/html
Content-Length: 10352
Connection: keep-alive
Set-Cookie: INGRESSCOOKIE=29a1440a13a68111b1a3b64412631550|47d24e7c0dbc2412b1cf3a747b30e59a; Path=/; Secure; HttpOnly
X-Frame-Options: SAMEORIGIN
Last-Modified: Fri, 23 Aug 2024 21:21:03 GMT
ETag: "2870-620605a3a8dc0"
Accept-Ranges: bytes
Access-Control-Allow-Origin: 
Access-Control-Allow-Headers: Authorization, Content-Type, Origin, Cache-Control
Access-Control-Allow-Methods: GET, POST, PUT, OPTIONS
Access-Control-Allow-Credentials: true
Strict-Transport-Security: max-age=15724800; includeSubDomains

mburrus commented 2 days ago

Hi @mbjones, I have a follow-up on the dataset that Val provided details for.

Because the unauthorized error appears only intermittently and the user can sometimes edit the dataset, I told the user to keep reloading the edit session until it worked. They were eventually able to load the edit session, but when they tried to submit changes they got an unexpected error message and their dataset was corrupted.

Here are the steps they took:

  1. Reload the submit URL until it works
  2. Change some metadata fields
  3. Upload 6 new files. 4/6 files uploaded successfully with a check mark. 2/6 files failed to upload and had a red exclamation mark.
  4. Click "Submit Dataset"
  5. Submission failed. See error message on top of webpage that says: The requested identifier <PID> is already used by another data object and therefore can not be used for this object...
  6. Go back to view landing page and see that files are no longer listed in the file table. I confirmed that the resource map is now missing.

Here's the quote from the user:

While it seemed that refreshing a few times worked to load the dataset, upon submission of my edits, I received an error for two of the new files I was attempting to upload (see screenshots). It now appears that some of the edits were saved (e.g., abstract, title), however, I am no longer able to see the files at the top of my dataset under "Files in this dataset" (I see these listed at the bottom of my dataset).

Screenshots: Screenshot 2024-10-16 at 5 07 48 PM Screenshot 2024-10-16 at 5 07 53 PM

vchendrix commented 1 day ago

@mbjones I looked a little more into this issue, which occurs when loading the data table in the editor. The /meta calls that return 401 errors don't seem to be authenticating the token correctly: the token is present and valid, but the following Metacat error is logged.

Error for https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/meta/ess-dive-5c5a631453d321e-20231130T213147717572

2024-10-18T21:41:24.459477371Z metacat 20241018-21:41:24: [ERROR]: D1ResourceHandler: Serializing exception with code 401: READ not allowed on ess-dive-0c7f0edc620a810-20231130T213140216564 for subject[s]: public; authenticatedUser; http://orcid.org/0000-0001-9061-8952;  [edu.ucsb.nceas.metacat.restservice.D1ResourceHandler:serializeException:591]
org.dataone.service.exceptions.NotAuthorized: READ not allowed on ess-dive-0c7f0edc620a810-20231130T213140216564 for subject[s]: public; authenticatedUser; http://orcid.org/0000-0001-9061-8952; 
2024-10-18T21:41:24.459482481Z  at edu.ucsb.nceas.metacat.dataone.D1AuthHelper.prepareAndThrowNotAuthorized(D1AuthHelper.java:461) ~[metacat.jar:?]
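
When reading many of these log entries, it can help to pull out the identifier and subject list from each denial and compare them with the request URL. In the excerpt above, the identifier in the log message differs from the one in the URL; the thread does not establish whether that is significant or just a neighbouring request. The sketch below is a hypothetical helper: the regular expression is inferred only from the single log line shown here and may not match other Metacat message formats.

```python
import re
from urllib.parse import unquote, urlsplit

# Pattern inferred from the one log line in this thread (an assumption).
DENIAL = re.compile(
    r"code (?P<code>\d+): (?P<permission>\w+) not allowed on (?P<pid>\S+) "
    r"for subject\[s\]: (?P<subjects>[^\[]*)"
)


def parse_denial(line: str):
    """Return code, permission, pid and subjects from a NotAuthorized log line, or None."""
    match = DENIAL.search(line)
    if match is None:
        return None
    subjects = [s.strip() for s in match["subjects"].split(";") if s.strip()]
    return {
        "code": int(match["code"]),
        "permission": match["permission"],
        "pid": match["pid"],
        "subjects": subjects,
    }


def pid_from_meta_url(url: str) -> str:
    """Extract the identifier that follows /meta/ in a request URL."""
    return unquote(urlsplit(url).path.rsplit("/meta/", 1)[-1])


LOG_LINE = (
    "2024-10-18T21:41:24.459477371Z metacat 20241018-21:41:24: [ERROR]: "
    "D1ResourceHandler: Serializing exception with code 401: READ not allowed on "
    "ess-dive-0c7f0edc620a810-20231130T213140216564 for subject[s]: public; "
    "authenticatedUser; http://orcid.org/0000-0001-9061-8952;  "
    "[edu.ucsb.nceas.metacat.restservice.D1ResourceHandler:serializeException:591]"
)
REQUEST_URL = (
    "https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/meta/"
    "ess-dive-5c5a631453d321e-20231130T213147717572"
)

if __name__ == "__main__":
    denial = parse_denial(LOG_LINE)
    requested = pid_from_meta_url(REQUEST_URL)
    print("denied pid:   ", denial["pid"])
    print("requested pid:", requested)
    print("subjects:     ", denial["subjects"])
    print("same pid:     ", denial["pid"] == requested)
```

The parsed subject list also shows that the token was resolved to an ORCID before the READ check failed.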