RedSiege / EyeWitness

EyeWitness is designed to take screenshots of websites, provide some server header info, and identify default credentials if possible.
https://www.christophertruncer.com/eyewitness-usage-guide/
GNU General Public License v3.0

handle timeouts from urllib (headers/source) #668

Closed Relkci closed 5 months ago

Relkci commented 5 months ago

When a website uses aggressive bot-checking, Selenium may capture a screenshot successfully, but the urllib request used to capture headers/source may stall (probably on something like a JavaScript challenge). The changes below force urllib to time out according to the timeout argument. If urllib is not forced to time out, Python can sit indefinitely when there is no fallback env-var timeout for it to use.
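For anyone following along, here is a minimal sketch of the idea, not the actual code from this PR. The function name `fetch_headers_and_source` and its parameters are hypothetical; the only library behavior relied on is the `timeout` argument of `urllib.request.urlopen`, which applies to blocking socket operations such as connecting and reading.

```python
import socket
import urllib.error
import urllib.request


def fetch_headers_and_source(url, timeout, user_agent="Mozilla/5.0"):
    """Return (headers, source), or (None, None) if the request fails or times out.

    Hypothetical helper for illustration only; not EyeWitness code.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        # Without timeout=, urlopen falls back to the global socket default,
        # which is "no timeout" unless something has changed it.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            headers = dict(response.headers.items())
            source = response.read()
    except (urllib.error.URLError, socket.timeout, TimeoutError):
        # A connect timeout is wrapped in URLError; a timeout while waiting
        # for the response is raised as socket.timeout / TimeoutError.
        return None, None
    return headers, source
```

The caller can treat `(None, None)` as "no headers or source captured" and still keep any screenshot that Selenium produced.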

closes #667

The report (objects.py) is also updated to reflect the possibility that a screenshot may have been captured even if urllib was forced into a timeout condition. The image will be included in the Error section of the report, if it exists. No screenshot is shown if Selenium also timed out. Because no source is captured when urllib is forced into a timeout, it's not possible to categorize the host into another category. There might be some scenario where we want to pull these forced timeouts into a different report section, but for now Error seems appropriate.
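The report behavior described above could look roughly like the sketch below. This is an illustration only and is not taken from objects.py; the function name `error_section_entry`, its parameters, and the HTML it emits are all assumptions.

```python
import html
import os


def error_section_entry(url, screenshot_path, urllib_timed_out):
    """Build a hypothetical HTML fragment for one host in the Error section."""
    parts = [f"<b>{html.escape(url)}</b>"]
    if urllib_timed_out:
        parts.append("Headers and source were not captured (urllib timed out).")
    # Show the screenshot only if Selenium actually produced one.
    if screenshot_path and os.path.isfile(screenshot_path):
        parts.append(f'<img src="{html.escape(screenshot_path, quote=True)}">')
    return "<br>".join(parts)
```

A host whose Selenium run also timed out has no screenshot file, so its entry contains only the URL and the timeout note.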

The example "hotels.com" is shown below. hotels.com will load in a browser, but will hang urllib and curl indefinitely if no env-var is present to force a timeout. These code changes pass the command-line timeout argument to urllib in case an env-var doesn't exist.

running changed code

image

example of screenshot where urllib timed out but selenium did not

image

digininja commented 5 months ago

How can JavaScript be sent to the browser without headers?

For the browser to get the response with the JavaScript in it there must be headers with that response.

What am I missing?


Relkci commented 5 months ago

Commit Summary
- 8919cc6 timeout handling on urllib
- 4fc1963 handle when timeout occured during urllib rather than selinum. show screenshot if it was produced, else don't

File Changes (2 files https://github.com/RedSiege/EyeWitness/pull/668/files)
- M Python/modules/objects.py (10)
- M Python/modules/selenium_module.py (9)

Patch Links:
- https://github.com/RedSiege/EyeWitness/pull/668.patch
- https://github.com/RedSiege/EyeWitness/pull/668.diff

Touche!

I'm not certain what causes urllib to get stuck, but the example I found was related to bot detection. I theorized a JavaScript challenge, but I'm not certain that is the exact cause. When I ran curl against it, it became clear that the remote server accepts the connection and then never responds. I'm guessing Selenium looks more human than urllib, even though we pass a UA, etc. to urllib.

image

Anyway:

Selenium (Firefox) takes the screenshot.

Headers and source are not captured by Selenium; they are captured with urllib.

This MR specifically forces urllib to time out if its connection doesn't close in a timely manner (even if Selenium did get a screenshot).

digininja commented 5 months ago

I wonder if it sends the first couple of bytes of a response and then stops, so the app gets something to prevent early bailing out, but by sending nothing else it just ties up the connection.
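That hypothesis can be tried locally without touching the real site. The sketch below is an assumption-laden reproduction, not a capture of what hotels.com actually does: `start_stalling_server` and `request_times_out` are hypothetical helpers, and the server simply sends a status line and then goes silent. Passing `first_bytes=b""` models the other case mentioned in this thread, where the server accepts the connection and sends nothing at all.

```python
import socket
import threading
import urllib.error
import urllib.request


def start_stalling_server(first_bytes=b"HTTP/1.1 200 OK\r\n"):
    """Listen on a free local port, send a few bytes to one client, then go silent."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    done = threading.Event()

    def serve():
        conn, _ = listener.accept()
        conn.sendall(first_bytes)
        done.wait()  # hold the connection open without sending anything else
        conn.close()
        listener.close()

    threading.Thread(target=serve, daemon=True).start()
    return port, done


def request_times_out(port, timeout):
    """Return True if urllib gives up on the stalled response within the timeout."""
    try:
        urllib.request.urlopen(f"http://127.0.0.1:{port}/", timeout=timeout).read()
    except (urllib.error.URLError, socket.timeout, TimeoutError):
        return True
    return False
```

With an explicit timeout the client gives up once a read blocks for longer than the limit. Note that the timeout applies per blocking socket operation, so a server that keeps dripping bytes more often than the timeout interval could still hold the connection for longer than the timeout value.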
