DataONEorg / rdataone

R package for reading and writing data at DataONE data repositories
http://doi.org/10.5063/F1M61H5X
36 stars 19 forks source link

Error during `createObject()` - "Request Entity Too Large" #210

Closed gothub closed 2 years ago

gothub commented 6 years ago

Reported by @maier-m - this occurs intermittently during createUpdate() for large or small objects, as small as 303KB:

Error in .local(x, ...) : …  Error creating urn:uuid:b1728cb2-915a-43b6-8d26-be4227bcc669: Client error: (413) Request Entity Too Large

Downgrading curl to 2.7.1 can fix the problem

Determine what the underlying problem with the curl package/library is.

amoeba commented 6 years ago

There's a long history with this regarding the HTTP Expect header. I've mentioned this a few times and @csjx and I have spent a lot of time looking into this. I'll go find those issues and link them here.

amoeba commented 6 years ago

https://redmine.dataone.org/issues/7822 https://redmine.dataone.org/issues/2693

I don't consider this an rdataone issue.

jeanetteclark commented 6 years ago

@amoeba we saw this error again. I'm wondering if we can file an issue in the curl library about this? I don't know enough about it to understand but it sounds like it may be a problem on their end rather than ours?

In any case, I'm very sick of this error

mbjones commented 6 years ago

We basically need to get rid of client side certificates. Fixing this means changing the way authentication works in DataONE overall. @csjx put together a proposal that is being evaluated now. But its not going to happen fast as it impacts all 50+ member nodes.

That said, if we change the R client to only use JWT tokens and drop support for certificates in R, then we might modify the Arctic Data Center and KNB to no longer use client side certs except for the CN to MN and MN to MN communication. This is something the devteam should discuss, and if it seems like it could work, then we could propose a fix. I think it would involve configuring apache to only verify client certs when the client is in a known list of CNs and MNs. @gothub do you see any reason the R client would need to continue to support certificates, or could we remove that fully?

gothub commented 6 years ago

How to we determine that no R user is creating and using certificates? If no users are using certs, then cert handling could be removed from the R client.

mbjones commented 6 years ago

Even if users are using certs, they can change to using tokens without any code changes to their programs. So, I think it would be ok either way.

amoeba commented 6 years ago

Thanks for bringing this back up @jeanetteclark. I'd like to see this fixed too. I've talked with the creator of the crul package about making a change for our use case and he preferred to follow the upstream library (libcurl) instead (which makes total sense to me).

@mbjones, @gothub I just did a quick test of Matt's idea and it appears like we can make a minor tweak to our Apache configs to get this to work.

I logged into test.arcticdata.io to see what our Apache config looks like,

SSLVerifyClient none
<If " ! ( %{HTTP_USER_AGENT} =~ /(windows|chrome|mozilla|safari|webkit)/i )">
  SSLVerifyClient optional
</If>
SSLVerifyDepth  10

which matched my memory of it. I decided to use httr to test the idea that, if we could set SSLVerifyClient to none for the R client that we'd see this error go away. I decided to just spoof the user agent to trigger Apache to not SSLClientVerify by setting my UA to safari (see second test).

httr::POST("https://test.arcticdata.io/metacat/d1/mn/v2/object", 
           body = list(pid = pid,
                       sysmeta = upload_file(sm_path),
                       object = upload_file(file_path)),
           add_headers(Authorization = paste("Bearer", getOption("dataone_test_token"))),
           user_agent("safari"))

A bit more testing / discussing might be appropriate to get a path scoped out but I think this shows promise. The R dataone package already sets a custom UA so we could just add that string to the SSLClientVerify whitelist we already have:

https://github.com/DataONEorg/rdataone/blob/9947be00e9eb310d1d3d65e1dcc6b0e174aed4b2/R/auth_request.R#L264-L267

Does this sound like a doable short-term fix until we change how our DataONE infrastructure handles client certs?

mbjones commented 6 years ago

That sounds like a great suggestion to me, as it limits impact to the R client and NCEAS-managed data repositories, and gives us time to work out the larger issues with the DataONE authentication process. So, I'd say let's test exclusion of the R client user agent, and then deploy on all of our managed MNs (KNB, ADC, ...).

amoeba commented 6 years ago

Had a great chat on our dev call yesterday. Notes here. TLDR: We can fix this fast.

I sent an email to dev@nceas:

Hey all, most of us were on the dev call yesterday but Chris made a good point that emailing the dev list with a summary plus action items would be good for visibility. Please take a look through to see if we're missing anything.

What's the R & HTTP 413 problem?

See https://github.com/DataONEorg/rdataone/issues/210 for more detail. A short summary is that createObject/updateObject calls began failing a while ago for users of the R dataone package because of a change in the underlying HTTP library it uses (deprecation of the Expect HTTP header) which was incompatible with our support of client certificates in the Apache configs we put in front of Metacat (to support client certs).

What are we doing now?

We're version-pinning our R users to an old version of the curl package. This is super painful and likely insecure.

What are we going to do right now to fix this for our users?

  1. Add the string 'httr' to our SSLClientVerify whitelist in our Apache configs on all Metacat installations we manage. The dataone R package sends a UA that includes the strings 'dataone', 'httr', and 'R' (plus version strings). We decided on adding 'httr' to the whitelist so (1) both users of rdataone and httr (a common http lib for R) get the fix and (2) we reduce the potential for breakage in other libraries that use 'dataone' in their UA string (Python, Java)
  2. Update the Metacat documentation (probably the DataONE section) with this information
  3. Update our boilerplate Apache config we ship with Metacat
  4. Coordinate with DataONE on this work

I'll be filing Issues in relevant repos shortly but please look for those. Thanks!

rogerdahl commented 6 years ago

Have you considered basing the SSLClientVerify setting on the presence of the Authorization header? That is, set SSLClientVerify none if the client passes a JWT.

amoeba commented 6 years ago

That seems like a really good idea @rogerdahl. It's a bit more direct and would help the R client and any other non-browser clients that send JWTs w/o the Expect header. I think we'd still need the existing Apache config portion so browsers, like Safari, don't unnecessarily ask the user for a client cert.

Maybe something like this?

<If "%{HTTP:Authorization} =~ /\ABearer .+">
    SSLClientVerify none
</If>
jeanetteclark commented 2 years ago

I haven't seen this error in a long time. Did changes to other underlying libraries (ours or others) render this not a problem anymore? @amoeba if you have any idea, let me know, otherwise I'm going to close and will reopen if we see it again

amoeba commented 2 years ago

I haven't seen this in a while either and I think underlying libraries must have changed. Our web server configs on sites like the KNB or ADC don't even have the workarounds we put in to deal with this any longer. I'm in favor of closing the issue.