dgtlmoon / changedetection.io

The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Restock Monitor, change detection. Designed for simplicity - Simply monitor which websites had a text change for free. Free Open source web page change detection, Website defacement monitoring, Price change notification
https://changedetection.io
Apache License 2.0
16.94k stars, 947 forks

[feature] Store response cookies and use them on following checks (for example, log in just once) #1836

Open. ignaciolg opened this issue 11 months ago

ignaciolg commented 11 months ago

Version and OS: docker

Is your feature request related to a problem? Please describe.
I've set up a check for an Amazon page that requires being logged in. I supplied a working cookie and the check works, but eventually the cookie expires and changedetection can no longer reach the data I want. It looks like changedetection does not update the cookie, or the cookie field takes priority over the stored cookie (if any).

Describe the solution you'd like
A checkbox under the Request tab that enables storing cookies from responses and reusing them on later checks, so that login mechanisms that depend on expiring cookies keep working.

This option should override the default cookie header set on the watch, which would still be used on the first check to access the content.

Describe the use-case and give concrete real-world examples
For example, scraping Amazon product lists that are only shown to registered or subscribed users.

Additional context
Current configuration:

image

dgtlmoon commented 11 months ago

So I guess it's more like this: you want the cookie: header (in changedetection) to be updated whenever the site you're visiting updates it

image

This also applies to browsersteps, I think, so you could log in manually via browsersteps and then remove all the login steps

image

Those two should be the same setting

The fetcher should always report back the cookie state along with the scraped visual-selector data, then re-use those cookies on the next request (if the checkbox is on)
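As a rough illustration of that idea (not changedetection's actual code), the stored cookie: header could be merged with whatever Set-Cookie values came back from the last fetch. This is a minimal sketch using only Python's standard library; the function name `merge_cookie_header` and the way the headers are passed in are assumptions made for the example.

```python
from http.cookies import SimpleCookie


def merge_cookie_header(stored_header: str, set_cookie_values: list[str]) -> str:
    """Return an updated Cookie header after applying Set-Cookie values.

    Hypothetical helper: `stored_header` is the watch's current cookie
    header value, `set_cookie_values` are the raw Set-Cookie header
    values seen on the most recent response.
    """
    jar: dict[str, str] = {}

    # Parse the existing "name=value; name2=value2" header.
    existing = SimpleCookie()
    existing.load(stored_header)
    for name, morsel in existing.items():
        jar[name] = morsel.value

    for raw in set_cookie_values:
        parsed = SimpleCookie()
        parsed.load(raw)
        for name, morsel in parsed.items():
            max_age = morsel["max-age"]
            if max_age and int(max_age) <= 0:
                # Max-Age of zero or less asks the client to drop the cookie.
                jar.pop(name, None)
            else:
                # New or refreshed cookie replaces the stored value.
                jar[name] = morsel.value

    return "; ".join(f"{name}={value}" for name, value in jar.items())
```

With the checkbox on, the result would be saved and sent as the cookie: header on the next check. The sketch ignores Expires, Domain and Path matching, which a real implementation would likely need to handle.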

Maybe browsersteps could have two modes: Login Mode and Step Mode

ignaciolg commented 11 months ago

This behavior should keep changedetection logged in during these scenarios!

I was also thinking about the steps involved in using a browser and I had the same idea. When setting up a new check, it would be helpful if we could simply go through the login process and end up on the desired page. Then, we could accept the setup and periodically check the last step or the desired URL without having to take any further action on each iteration.

Thank you, @dgtlmoon!

dgtlmoon commented 11 months ago

I just found something interesting: some sites will let you load the page once and then block you again (Fine Wine & Good Spirits, for example). However, when I set a cookie from an existing good session, it seems to work fine

They are also setting "Local Storage" and "Session Storage" values in the browser

So actually this is more about getting puppeteer/playwright to store and restore the complete browser storage state (and allow it to be updated) on a per-URL basis... hmm

This needs following up: is userDataDir really functioning properly, or does something deeper need to happen, like https://newbedev.com/puppeteer-how-to-store-a-session-including-cookies-page-state-local-storage-etc-and-continue-later
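For reference, one possible shape of the "store and restore per URL" idea with Playwright's Python API is sketched below. It relies on Playwright's `storage_state` support, which saves cookies and local storage; as far as I know session storage is not included and would need separate handling. The `browser-state` directory, the file naming and both function names are assumptions for illustration only, not how changedetection does it.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical location for the per-URL state files.
STATE_DIR = Path("browser-state")


def state_path_for(url: str) -> Path:
    """Map a watched URL to its own storage-state file (one file per URL)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    host = urlsplit(url).hostname or "unknown"
    return STATE_DIR / f"{host}-{digest}.json"


def fetch_with_saved_state(url: str) -> str:
    """Load the page with any saved state, then write the updated state back."""
    # Imported lazily so the path helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    state_file = state_path_for(url)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        # Reuse the previous cookies and local storage if a state file exists.
        context = browser.new_context(
            storage_state=str(state_file) if state_file.exists() else None
        )
        page = context.new_page()
        page.goto(url)
        html = page.content()
        # Persist whatever the site changed during this visit.
        state_file.parent.mkdir(parents=True, exist_ok=True)
        context.storage_state(path=str(state_file))
        browser.close()
    return html
```

Calling `fetch_with_saved_state(url)` on every check would keep the state file current, so a session that was established once (for example through a manual login) could carry over to later checks for as long as the site keeps it valid.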

luckman212 commented 3 months ago

I tried to add an Amazon page to monitor for an in-stock alert today, and I think I am hitting this. Is there any working solution or workaround?

cdio v0.45.23

image

Also, it seems this completely hangs the container. Once this happens it spins forever and I can no longer access my cdio instance. I have to completely stop the docker container and restart it.

top output...

image

This spinner just spins forever...

image

Edit: it looks like this might be the same as https://github.com/dgtlmoon/changedetection.io/discussions/2396