Qard / http-request-to-url

Get url of an http client request
0 stars 1 forks source link

Proxy support #1

Open watson opened 4 years ago

watson commented 4 years ago

I found a "bug" where, if the HTTP request is being proxied, the URL returned by this module looks something like this:

http://my-proxy.com:8080http://example.com/foo

The problem is that the first line the HTTP header might contain a full URL, not only the path part of the URL, e.g:

GET http://example.com/foo HTTP/1.1

I've made a test that can be dropped into the test.js file that shows this problem:

tap.test('proxy', async t => {
  const { port } = await makeServer()

  const opts = {
    host: '127.0.0.1',
    port,
    method: 'CONNECT',
    path: 'http://example.com/foo'
  }
  const client = http.request(opts, res => res.resume())
  client.end()

  const location = await httpRequestToUrl(client)
  t.equal(location, 'http://example.com/foo')
  t.end()
})

However, I wasn't really sure what the correct solution to this problem was.

There is two ways to deal with this:

  1. Either the module returns the proxy host/port
  2. Or the module returns the final destination (in this case shown in the path)

What do you think?

ebuildy commented 3 years ago

do you know a way to guess if a socket use a proxy or not?

Qard commented 3 years ago

Not sure how I missed this issue until now. 🤔

Anyway, what do you think of returning the final destination by default and adding an option to prefer the proxy information?

astorm commented 3 years ago

Anyway, what do you think of returning the final destination by default and adding an option to prefer the proxy information?

Datapoint of "one interested downstream consumer of this module" -- "returning the final destination by default and adding an option to prefer the proxy information" sounds like a reasonable solution to me.

Qard commented 3 years ago

@astorm PRs are welcome, and if you like, I could give you push/publish access for this package. Otherwise, I don't know when I would get around to adding this functionality. Do you have the bandwidth to do something about it if I give you access?

astorm commented 3 years ago

Thank you @Qard -- let's hold off on giving me access to this repo. That (for me) comes with a lot of extra implicit responsibility I'm not in a position to take on.

I should be able to handle a PR though -- I've got some time on Fridays carved out for non-roadmap work.

To keep this conversation going async before then -- would something that looks like this pseudo code be acceptable to you?

    const {getFinalUrl} = require('./lib.js')
    module.exports = async function httpRequestToUrl (request, opts) {
      /* ... */

      return getFinalUrl(proto, localtion, request.path)
    }

    function getFinalUrl(proto, location, request.path, opts) {
        switch(opts.strategy){
            case 'auto': // the default 
                // logic to examine `location` and `require.path` and
                // determine the right thing to do based on whether one
                // includes a leading protocol or not (preferring
                // `request.path` if it's a full URL
                return url
            case 'preferDestination':
                // logic for 
                return request.path
            case 'preferFirsthop':
                return `${proto}//${location}
            default:
                return `${proto}//${location}${request.path}`
        }
    }

Open to any feedback, but I'm specifically looking for a yay/nay on

  1. The names.

  2. Any objections to breaking this functionality out into a library so it can be unit tested?

  3. Any strong feelings on option values that are strings vs. integer constants vs. symbols vs. some-other-thing that was in vouge sometime in the past 10 years?

Qard commented 3 years ago

Not sure we need so many options. As far as I can tell, the auto branch does everything you need. I'd just do a boolean flag called followProxies or something like that. Not sure what the utility is of the other two options. Do you actually need that? Seems like adding functionality no one asked for. 🤷

astorm commented 3 years ago

Do you actually need that? Seems like adding functionality no one asked for.

nod -- that makes sense and why I asked. Over the years I've found some folks have a "nail down other things a user might ask for while we're in here" philosophy while others have a "solve the current issue with as little code as possible" philosophy.

It sounds like

I'd approach this with a I'd just do a boolean flag called followProxies or something like that

is the right approach for your repo. :)

If no one else has grabbed this by Friday, that's the approach I'll take in my PR.

Qard commented 3 years ago

Covering more cases means more code to maintain though. I prefer small modules that ideally will never (or at least nearly never) need to change.

watson commented 3 years ago

Just my 5 cents: I like the idea of having auto being the default. In most cases this is what the user want - and I assume this is at least what the user would normally expect as the default behaviour. This could be implemented as a bug-fix without worry of braking anything, as the current behaviour should be considered a bug. Having a boolean option to turn it on/off sounds like a good idea as well.