creationix / http-parser-js

A pure JS HTTP parser for node.

HPE_INVALID_CONSTANT error #73

Open lawrence-peng opened 3 years ago

lawrence-peng commented 3 years ago

Hello. My application is a spider. When it requests the target website, the response is <html>……</html>\r\n0\r\n\r\n3dd\r\n<a><script>……</script> and an HPE_INVALID_CONSTANT error is thrown. I know the site is non-standard, but I can't control that. Can you help us? Thanks.

Jimbly commented 3 years ago

HPE_INVALID_CONSTANT is from Node's HTTP parser, not this module, so using this module may help. Various newer Node versions might have issues with monkey-patching, but Node v12 should work fine if you follow the instructions in the README.

lawrence-peng commented 3 years ago

This module is already in use. The HPE_INVALID_CONSTANT error is thrown at https://github.com/creationix/http-parser-js/blob/8a81c92201e49fda4668386263fdce284789ed73/http-parser.js#L232 because the response <html>……</html>\r\n0\r\n\r\n3dd\r\n<a><script>……</script> doesn't match https://github.com/creationix/http-parser-js/blob/8a81c92201e49fda4668386263fdce284789ed73/http-parser.js#L224

Jimbly commented 3 years ago

Ah, my mistake! I'm assuming you're actually getting the error from line 253 (response parsing, not request)?

Looks like the response is missing all header information and rather malformed. Adding something like this into RESPONSE_LINE might cause it to be returned as the raw data, though it might get hung up somewhere else due to lack of headers...

  if (match === null) {
    if (line.match(/^<html/)) {
      this.state = 'BODY_RAW';
      return;
    }
    throw parseErrorCode('HPE_INVALID_CONSTANT');
  }

You may need to fill info.statusCode/statusMessage with some dummy values and possibly call this.userCall()(this[kOnHeadersComplete](....

Looks like it would probably skip the first line, so if you actually want that line in the body, you would probably need to undo consumeLine(); maybe this.offset = 0 is all that's needed to do that...

Also possibly just calling this.nextRequest() might skip it and get an error to trickle up if that's what you want.
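The fallback suggested above can be sketched as a standalone function, outside the parser. This is only an illustration: the status-line regex is an approximation rather than the module's exact pattern, `classifyFirstLine` and `rewindTo` are hypothetical names, and the dummy status values (200, 'OK') are placeholders.

```javascript
// Approximate status-line pattern (an assumption, not the module's exact regex).
const statusLineExp = /^HTTP\/(\d)\.(\d) (\d{3}) ?(.*)$/;

// Hypothetical standalone version of the suggested fallback. It does not touch
// any parser state; it only decides what to do with the first line received.
function classifyFirstLine(line) {
  const match = statusLineExp.exec(line);
  if (match !== null) {
    return {
      kind: 'STATUS_LINE',
      versionMajor: Number(match[1]),
      versionMinor: Number(match[2]),
      statusCode: Number(match[3]),
      statusMessage: match[4],
    };
  }
  if (/^<html/.test(line)) {
    // No headers were sent: treat everything as raw body, fill in dummy
    // status values, and rewind to offset 0 so this line is not dropped.
    return { kind: 'BODY_RAW', statusCode: 200, statusMessage: 'OK', rewindTo: 0 };
  }
  // Anything else is still reported as a parse error.
  const err = new Error('Parse Error');
  err.code = 'HPE_INVALID_CONSTANT';
  throw err;
}

console.log(classifyFirstLine('HTTP/1.1 404 Not Found').statusCode); // 404
console.log(classifyFirstLine('<html><body>').kind);                 // BODY_RAW
```

A line such as `3dd` matches neither branch, so it would still raise HPE_INVALID_CONSTANT under this sketch.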

lawrence-peng commented 3 years ago

Yep, that's response parsing. My mistake.

My request receives two response chunks. The first response chunk (first line) is standard HTTP. The last response chunk (second line) is \r\n0\r\n\r\n3dd\r\n<a><script>……</script>,

so when the second line is parsed, the code

  if (match === null) {
    if (line.match(/^<html/)) {
      this.state = 'BODY_RAW';
      return;
    }
    throw parseErrorCode('HPE_INVALID_CONSTANT');
  }

can't solve the issue.

By the way, to support this case, are you going to update this module?

Jimbly commented 3 years ago

Ah, that sounds like there's probably more data before what you posted then (the headers to the first chunk, perhaps specifying chunked encoding?). Then maybe it's a chunked encoding going wrong (perhaps duplicate headers, or headers that don't match what's actually sent?). If you find a solution and add a test case (the test cases are pretty simple, just raw responses), I'm happy to merge a PR.
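One possible reading of the chunked-encoding theory above, sketched with a minimal decoder. This is a guess at the failure mode, not a confirmed trace of the module: `decodeChunked` is a hypothetical helper, the payload is made up to resemble the one in this issue, and the sketch assumes ASCII data with no chunk extensions or trailers.

```javascript
// Minimal chunked-body decoder (a sketch; assumes ASCII data and ignores
// chunk extensions and trailers). Returns the decoded body plus whatever
// is left over after the terminating zero-size chunk.
function decodeChunked(raw) {
  let offset = 0;
  let body = '';
  while (true) {
    const lineEnd = raw.indexOf('\r\n', offset);
    if (lineEnd === -1) throw new Error('incomplete chunk header');
    const size = parseInt(raw.slice(offset, lineEnd), 16);
    if (Number.isNaN(size)) throw new Error('invalid chunk size');
    offset = lineEnd + 2;
    if (size === 0) {
      // A zero-size chunk ends the body; skip the final CRLF if present.
      if (raw.startsWith('\r\n', offset)) offset += 2;
      return { body, leftover: raw.slice(offset) };
    }
    body += raw.slice(offset, offset + size);
    offset += size + 2; // skip the chunk data and its trailing CRLF
  }
}

// Made-up payload shaped like the one reported in this issue.
const raw = '6\r\n<html>\r\n0\r\n\r\n3dd\r\n<a>';
const { body, leftover } = decodeChunked(raw);
console.log(JSON.stringify(body));     // "<html>"
console.log(JSON.stringify(leftover)); // "3dd\r\n<a>"
```

If the server sends `0\r\n\r\n` before it has finished, the body ends there, and the remaining bytes would presumably be read as the start of a new response. The first leftover line, `3dd`, is not a status line, which would be consistent with the HPE_INVALID_CONSTANT error. A raw response of this shape might also serve as the basis for a test case.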

lawrence-peng commented 3 years ago

Got it, thank you for taking the time to answer my question!