expath / expath-cg

Repository for the W3C EXPath Community Group.
15 stars 6 forks source link

HTTP requests: options #108

Open ChristianGruen opened 6 years ago

ChristianGruen commented 6 years ago

Here is some discusson the current options set for HTTP requests.

@adamretter: If you agree with all my proposals, I’ll be glad to create a pull request.

A. content-encoding

  1. If I understand this correctly, none has a special meaning, because it is no supported value for the Content-Encoding header field. I am wondering…
    • If we should keep it, or (my preference) drop it and use an empty sequence as default?
    • If we keep it, we should probably clarify what happens with the Content-Encoding header field. Will it be ignored, removed, or overwritten with an empty string if none is supplied?

B. transfer-encoding

  1. none: Same here…
  2. chunked is not supported by HTTP/2 anymore, so I’ll include it in the error check.
  3. I would tend to raise http:invalid-option instead of http:version (the attached error message should be explanatory enough, and it basically falls into the same category).

C. follow-redirect

  1. In the initial version of the spec, the default was true. I liked this default, maybe we can keep it.
  2. We might need to clarify what kinds of redirections an implementation is expected or required to support. The Java default implementation follows redirections automatically as long as the protocol is the same. Redirects from http to https won’t be resolved. I don’t know what the Apache HTTP Client does, or if/how redirects were handled in the Java implementation of the EXPath HTTP Client?

D. timeout

  1. Java’s default is 0, which stands for “no timeout”. Maybe it’s best to adopt this default? And an error could be raised if a negative value is supplied.

E. parse-response-entity-body

  1. I would definitely love to have a shorter name for this option, as I imagine it might be used pretty often. I see that parse-response is ambiguous; on the other hand, I assume that people who use this option will know what it does anyway. A possible alternative: parse-bodies: Only responses are “parsed” (requests are serialized), and the plural form indicates that we may have multiple bodies in a response. What do you think?
ChristianGruen commented 5 years ago

Hi @adamretter, do you want me to create a PR, which you can then comment on?

adamretter commented 5 years ago

A - https://github.com/expath/expath-cg/pull/123 B - https://github.com/expath/expath-cg/pull/124 C - https://github.com/expath/expath-cg/pull/120 D - https://github.com/expath/expath-cg/pull/122

Hmm I am not sure what the best thing to do about (E) is. I think as the default is to parse, in actual fact I think it will be rare that people set this to false... so maybe it will not be used too much and won't be a problem?

ChristianGruen commented 5 years ago

Thanks for the pull requests. – According E., I have some use cases in mind in which the option will be useful. Would you mind changing it to parse-bodies, or do you think that the wording might be ambiguous?

adamretter commented 5 years ago

@ChristianGruen regards (E). I want to actually make a suggestion for making the parsing of the request body a bit more flexible.

Instead of the status-only and parse-response-entity-body options, I would like to replace those with a single enumeration of options, lets for the moment call it parse-response. The idea is to allow greater granularity of parsing choice. parse-response would allow the values:

Having the multipart-raw option might be going too far. But otherwise I think this gives us better control and doesn't allow the error case http:invalid-option when status-only == true AND parse-response-entity-body == true.

Instead of using an xs:string type, we could actually define it as an xs:short or something and define module level variables to represent the enumeration values, e.g.

http:get('http://expath.org/xyz', map {
  'parse-response': $http:parse-status,
  'headers': map { 'User-Agent': 'EXPath/1.0' }
})
ChristianGruen commented 5 years ago

I like the suggestion!

  • raw. We don't have an equivalent option at the moment, but the idea is that the raw response from the server is returned. i.e. no parsing occurs, no status, no headers. This has applications for debugging and also for logging responses.

Maybe this could simply be the response body as binary item (possibly lazy/streamable), comparable to the response you get when using http:send-request and override-media-type:application/octet-stream?

  • status. This would be equivalent to status-only: true().

+1

  • headers. This would be the equivalent to parse-response-entity-body : false()
  • multipart-raw. This would extract the headers of the response, and locate the multipart bodies, however this would present each multipart in a raw manner, i.e. no multipart headers would be parsed.

If I get it right, the idea behind my initial parse-response option was identical to your multipart-raw option. I am not sure how a headers result would look like?

  • full. This would be the default, and basically the same as the current parse-response-entity-body : true()

+1

Instead of using an xs:string type, we could actually define it as an xs:short or something and define module level variables to represent the enumeration values, e.g.

xs:string sounds good to me. We already have various enum-like options in the XQFO 3.1 spec (e.g. for fn:serialize.

adamretter commented 5 years ago

TODO

    • [ ] Implement more complex parse-response option as mentioned above
    • [ ] I should add a comment about application/xhtml+xml in the Implicit Parsing Conversions table to say that XHTML is XML with regards parsing
    • [ ] I should add an option for manually specifying the parser for the response parts, e.g. parser="json" akin to the 1.0 spec's override-media-type option, but better named and with finer control for multipart parts.
    • [ ] Add a parse option for HTML to XML akin to the tidied facility of 1.0 spec, should be disabled by default.