Closed Relkci closed 5 months ago
How can JavaScript be sent to the browser without headers?
For the browser to get the response with the JavaScript in it there must be headers with that response.
What am I missing?
On Fri, 7 Jun 2024, 19:47 Kent Ickler wrote:
When a website uses aggressive bot-checking, Selenium may capture a screenshot properly, but the urllib request used to capture headers/source may stall (probably on something like a JavaScript challenge). The changes below force urllib to time out according to the timeout argument. If urllib is not forced to time out, Python can sit indefinitely when there is no fallback env-var timeout for it to use.
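For readers following along, here is a minimal sketch of what forcing a timeout on urllib can look like. It is not the code from this PR: the helper name `fetch_headers_and_source`, its return shape, and the default User-Agent string are assumptions made for illustration. The only library behaviour relied on is the `timeout` parameter of `urllib.request.urlopen`.

```python
import socket
import urllib.error
import urllib.request


def fetch_headers_and_source(url, timeout_seconds, user_agent="Mozilla/5.0"):
    """Return (headers, body, error). Hypothetical helper, not EyeWitness code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        # Without timeout=, a blocking socket operation may wait forever.
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            headers = dict(response.headers.items())
            body = response.read()
            return headers, body, None
    except (socket.timeout, urllib.error.URLError) as exc:
        # Connect-phase timeouts usually arrive wrapped in URLError;
        # timeouts while waiting for the response are raised directly.
        return None, None, exc
```

A caller would pass the command-line timeout value as `timeout_seconds` and treat a non-`None` error as "no headers or source captured".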
closes #667 https://github.com/RedSiege/EyeWitness/issues/667
The report (objects.py) is also updated to reflect the possibility that a screenshot may have been captured even if urllib was forced into a timeout condition. If the image exists, it is included in the Error section of the report. No screenshot is shown if Selenium also timed out. Because no source is captured when urllib is forced into a timeout, it's not possible to sort the host into any other category. There might be some scenario where we want to pull these forced timeouts into a different report section, but for now Error seems appropriate.
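The reporting rule described above can be sketched roughly as follows. This is an illustration only, not the contents of objects.py: the `CaptureResult` class, its field names, the "Uncategorized" placeholder, and the HTML fragment are all hypothetical.

```python
import html
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaptureResult:
    """Hypothetical record for one host; field names are illustrative."""
    url: str
    screenshot_path: Optional[str]  # None when Selenium also timed out
    source: Optional[str]           # None when urllib was forced to time out


def report_section(result: CaptureResult) -> str:
    # With no source there is nothing to categorize on, so the host stays in Error.
    return "Error" if result.source is None else "Uncategorized"


def render_error_entry(result: CaptureResult) -> str:
    """Build an Error-section fragment, adding the screenshot only if one exists."""
    parts = [f"<p>{html.escape(result.url)}: timed out fetching headers/source</p>"]
    if result.screenshot_path is not None:
        parts.append(f'<img src="{html.escape(result.screenshot_path, quote=True)}">')
    return "\n".join(parts)
```

The point of the sketch is that the screenshot and the source are independent outcomes: a host can land in Error and still carry an image.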
The example "hotels.com" is shown below. hotels.com will load in a browser, but will hang urllib and curl indefinitely if no env-var is present to force a timeout. These code changes pass the command-line timeout argument to urllib in case an env var doesn't exist.
running changed code:
image.png (view on web) https://github.com/RedSiege/EyeWitness/assets/29710634/29935f18-124c-4774-b902-a78f1c595e60
example of screenshot where urllib timed out but Selenium did not:
image.png (view on web) https://github.com/RedSiege/EyeWitness/assets/29710634/60cabd34-cd29-4240-b821-6cefd0be7e11
You can view, comment on, or merge this pull request online at: https://github.com/RedSiege/EyeWitness/pull/668
Commit Summary
- 8919cc6 https://github.com/RedSiege/EyeWitness/pull/668/commits/8919cc61d944231d89384f89ce8e1fc50bd08807 timeout handling on urllib
- 4fc1963 https://github.com/RedSiege/EyeWitness/pull/668/commits/4fc1963e86b34d8241227beacbd92f1b1f70e524 handle when timeout occured during urllib rather than selinum. show screenshot if it was produced, else don't
File Changes
(2 files https://github.com/RedSiege/EyeWitness/pull/668/files)
- M Python/modules/objects.py https://github.com/RedSiege/EyeWitness/pull/668/files#diff-0fb66f8a5c4778517dca35072023bd5203c820c9ac79bf4c0243213b7a871a2a (10)
- M Python/modules/selenium_module.py https://github.com/RedSiege/EyeWitness/pull/668/files#diff-66c2df5764afeffe44908c1fcd791740acd61f5723a489a0c1003378ab1b2b7d (9)
Patch Links:
- https://github.com/RedSiege/EyeWitness/pull/668.patch
- https://github.com/RedSiege/EyeWitness/pull/668.diff
Touché!
I'm not certain what causes urllib to get stuck, but the example I found was related to bot detection. I theorized a JavaScript challenge, but I'm not certain that is the exact cause. When I curl'd it, it became clear that the remote server accepts the connection and then simply never responds. I'm guessing Selenium looks more human than urllib, even though we pass a User-Agent, etc. to urllib.
image.png (view on web) https://github.com/RedSiege/EyeWitness/assets/29710634/dc7a8342-fdfa-491e-8f28-d346fcd27595
Anyway:
- Selenium (Firefox) takes the screenshot.
- Headers and source are not captured by Selenium; they are captured with urllib.
- This MR specifically forces urllib to time out if its connection doesn't close in a timely manner (even if Selenium did get a screenshot).
I wonder if it sends the first couple of bytes of a response and then stops. That way the app receives something, which keeps it from bailing out early, but because nothing else is ever sent, the connection just stays tied up.
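Both behaviours discussed in this thread (a server that accepts the connection and says nothing, and one that sends a few bytes and then stalls) can be reproduced locally without touching a real site. The sketch below is a test aid under assumptions: `start_stalling_server` and `times_out` are hypothetical names, and the server is a bare socket, not a model of any real bot-detection product.

```python
import socket
import threading
import urllib.error
import urllib.request


def start_stalling_server(prefix=b""):
    """Accept one connection, optionally send `prefix`, then go silent.

    Returns (port, stop_event). Hypothetical helper for local experiments.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    stop = threading.Event()

    def serve():
        conn, _ = listener.accept()
        with conn, listener:
            if prefix:
                conn.sendall(prefix)
            stop.wait()  # hold the connection open without sending anything else

    threading.Thread(target=serve, daemon=True).start()
    return port, stop


def times_out(url, timeout_seconds):
    """Return True if urllib gives up on `url` within the socket timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            response.read()
    except (socket.timeout, urllib.error.URLError):
        return True
    return False
```

Calling `times_out` against a server started with no prefix, or with a prefix such as `b"HTTP/1.1 200 OK\r\n"`, should return `True` in both cases once `timeout_seconds` elapses. One caveat worth keeping in mind: the urllib timeout applies to each blocking socket operation, not to the request as a whole, so a server that keeps trickling bytes faster than the timeout could still hold the connection open for much longer.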