danieldotnl / ha-multiscrape

Home Assistant custom component for scraping (html, xml or json) multiple values (from a single HTTP request) with a separate sensor/attribute for each value. Support for (login) form-submit functionality.
MIT License

Support for handling cookies between requests #368

Open · danieldotnl opened this pull request 1 month ago

stormshaker commented 1 month ago

I have a website that sets a session ID cookie in the response headers of the form submission. If you create a pre-release of this branch, I could test it. Is anything special needed in the config, or will it send all cookies back with the page request, like a browser does?

danieldotnl commented 1 month ago

I will release this soon. No config is required; it will indeed send all cookies back.
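To illustrate what "send all cookies back, like a browser" means, here is a minimal sketch. It makes no claim about multiscrape's internals: it uses Python's standard-library `http.cookiejar` only to keep the example self-contained, and the helper names (`FakeResponse`, `cookie_header_for_next_request`), the placeholder token `abc123` and the `/dashboard` page URL are all hypothetical. The cookies set by the form-submit response are stored in a jar, and the jar then builds the `Cookie` header for the follow-up page request.

```python
# Sketch only: stdlib cookie handling, not multiscrape's actual implementation.
import http.cookiejar
import urllib.request
from email.message import Message


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; CookieJar only calls .info()."""

    def __init__(self, set_cookie_headers):
        self._headers = Message()
        for value in set_cookie_headers:
            # Message keeps duplicate header names, like a real response would.
            self._headers["Set-Cookie"] = value

    def info(self):
        return self._headers


def cookie_header_for_next_request(login_url, set_cookie_headers, page_url):
    """Return the Cookie header a browser-like client would send to page_url."""
    jar = http.cookiejar.CookieJar()
    # Store every cookie that the form-submit (login) response set.
    login_request = urllib.request.Request(login_url, method="POST")
    jar.extract_cookies(FakeResponse(set_cookie_headers), login_request)
    # Attach all cookies whose domain, path and Secure flag match the next request.
    page_request = urllib.request.Request(page_url)
    jar.add_cookie_header(page_request)
    return page_request.get_header("Cookie")  # None if no cookie matches


header = cookie_header_for_next_request(
    "https://app.boxdivvy.com.au/login",
    ["__Host-secureSessionID=abc123; Path=/; Secure; HttpOnly", "remember=n; Path=/"],
    "https://app.boxdivvy.com.au/dashboard",
)
print(header)  # __Host-secureSessionID=abc123; remember=n
```

A jar is used instead of copying raw `Set-Cookie` values because it applies the same scoping rules a browser does: for example, the `Secure` session cookie would not be sent to a plain `http://` URL.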

stormshaker commented 5 days ago

I tried testing this PR. The cookies look strange in the log_response files. Is that just how the library formats the log, or is it how the cookies are extracted?

From the form response I expect a single cookie, __Host-secureSessionID, which is the auth token set after a successful login. This is what form_submit_response_cookies.txt contains (token redacted):

<Cookies[<Cookie __Host-secureSessionID=ad44[REDACTED]aba for app.boxdivvy.com.au />, <Cookie remember=n for app.boxdivvy.com.au />]>

Does that look right? Is the 'for app.boxdivvy.com.au' part supposed to be there? The page_request_cookies.txt file is identical. The page_response_body.txt I get back is just the login page again.

danieldotnl commented 5 days ago

Thanks for testing this PR! It's not clear to me, though, whether everything is working and you're only worried about the format of the cookies, or whether there is an actual problem. I'm just writing the cookies object to a file, and this is what you end up with. I'm not sure why you expect only one cookie: if you check the browser, you'll see two cookies, just as in the multiscrape log file.
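The 'for app.boxdivvy.com.au' suffix is the domain each cookie is scoped to. When a `Set-Cookie` header carries no `Domain` attribute (and `__Host-` cookies must not carry one), the cookie is host-only and is bound to the host that set it. The logged line appears to match the repr of httpx's `Cookies` object; assuming that is what gets written to the file, the minimal sketch below reproduces the same shape with the standard library's `http.cookiejar`. The `describe_jar` and `FakeResponse` helpers and the `abc123` token are hypothetical.

```python
# Sketch only: mimics the logged format with stdlib objects (assumption, not multiscrape code).
import http.cookiejar
import urllib.request
from email.message import Message


class FakeResponse:
    """Hypothetical stand-in for a response; CookieJar only calls .info()."""

    def __init__(self, set_cookie_headers):
        self._headers = Message()
        for value in set_cookie_headers:
            self._headers["Set-Cookie"] = value

    def info(self):
        return self._headers


def describe_jar(jar):
    """Format a jar as '<Cookies[<Cookie name=value for domain />, ...]>'."""
    parts = [f"<Cookie {c.name}={c.value} for {c.domain} />" for c in jar]
    return f"<Cookies[{', '.join(parts)}]>"


jar = http.cookiejar.CookieJar()
login = urllib.request.Request("https://app.boxdivvy.com.au/login", method="POST")
# Neither header sets Domain, so both cookies take the request host as their domain.
jar.extract_cookies(
    FakeResponse(["__Host-secureSessionID=abc123; Path=/; Secure", "remember=n; Path=/"]),
    login,
)
print(describe_jar(jar))
# <Cookies[<Cookie __Host-secureSessionID=abc123 for app.boxdivvy.com.au />, <Cookie remember=n for app.boxdivvy.com.au />]>
```

So the 'for ...' text is part of how each cookie is displayed, not an extra value sent to the server; only `name=value` pairs go into the `Cookie` request header.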

stormshaker commented 5 days ago

No, I'm not worried that there are two. I was just asking whether the format is as expected; the 'for web.site.name' part looked strange, but that could just be how cookies are displayed. It's not working yet; I'll keep trying.