dgtlmoon / changedetection.io

The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Designed for simplicity: simply monitor which websites had a text change, for free. Also covers website defacement monitoring and price change notification.
https://changedetection.io
Apache License 2.0
17.08k stars, 958 forks

regex issues after recheck / output gone #2448

Closed: th12289 closed this 2 months ago

th12289 commented 3 months ago

Describe the bug

regex issues after recheck / output gone

Version: v0.45.24

To Reproduce

Steps to reproduce the behavior:

I am tracking, for example, the following URL: https://www.otto.de/mode/sportmode/sportschuhe/laufschuhe/stabilitaetsschuhe/?marke=asics,asics-sportstyle&schuhgroesse-eu=46&zielgruppe=herren&sortiertnach=preis-aufsteigend

Filters and triggers, CSS/JSONPath/JQ/XPath Filters:

//*[@id="reptile-tilelist"]

Nothing else is set, just the following regex:

/.*laufschuh.*|.*€.*|.*extra.*/
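To make the expected behaviour concrete, here is a minimal Python sketch (Python is the language changedetection.io is written in) of a line-matching regex like the one above applied to extracted text. It assumes the slash-enclosed pattern is matched case-insensitively with `re.findall`, which is why lowercase `laufschuh` and `extra` still catch "Laufschuh" and "Extra". The sample text and the `extract_lines` helper are hypothetical and are not the project's own code.

```python
import re

# Hypothetical sample of text left after the XPath filter. Most lines are
# copied from the report; "Versandkostenfrei" is a made-up line that should be dropped.
extracted = """Asics GT-1000 13 Laufschuh für mehr Stabilität
  129,99 € 11,87 € mtl. in 12 Raten
      -20% Extra
Versandkostenfrei
Asics GT-2000 12 Laufschuh für mehr Stabilität"""

# Assumption: the /.../ pattern is applied case-insensitively.
pattern = re.compile(r".*laufschuh.*|.*€.*|.*extra.*", re.IGNORECASE)


def extract_lines(text: str, regex: re.Pattern) -> list[str]:
    # Hypothetical helper. Without re.DOTALL, "." never crosses a newline,
    # so every match covers at most one line of the input.
    return regex.findall(text)


matches = extract_lines(extracted, pattern)
for line in matches:
    print(line)
```

Because each alternative can only match within a single line, a line is kept once if any of the three alternatives hits it, and lines with none of the keywords are dropped. Under these assumptions the same input should always give the same output, which is why an empty result after a recheck points at the fetched text rather than the regex.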

This is my output after I clear the history, and it is the correct one:

    Asics GT-1000 13 Laufschuh für mehr Stabilität
      129,99 € 11,87 € mtl. in 12 Raten
          -20% Extra
      129,99 € 11,87 € mtl. in 12 Raten
          -20% Extra
    Asics GT-2000 12 Laufschuh für mehr Stabilität
      159,99 € 14,61 € mtl. in 12 Raten
          -20% Extra
      159,99 € 14,61 € mtl. in 12 Raten
          -20% Extra
    Asics GEL-KAYANO 31 Laufschuh für mehr Stabilität
      199,99 € 18,27 € mtl. in 12 Raten
      199,99 € 18,27 € mtl. in 12 Raten

After I press "Recheck" without changing anything, the output is empty. This happens on many of my configured sites.

Expected behavior: the same output as before (or the changes, if the website actually changed).

Screenshots

Output after clearing history: [screenshot]

Output after recheck: empty


Additional context: if there is more than one regex entry, lines 2, 3, 4, ... sometimes work fine; the problem is only with the first one.

Example:

        /.*laufschuh.*|.*€.*|.*extra.*/
        /.*€.*/

Also, if I duplicate the first regex so that the same entry appears twice, it sometimes works, but the output is doubled.

Output after duplicating the entry: [screenshot] [screenshot]
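The doubled output is what one would expect if every entry in "Extract text" is applied independently to the whole text and the matches are concatenated in rule order. The sketch below assumes exactly that; `apply_extract_rules` is a hypothetical helper for illustration, not changedetection.io's implementation.

```python
import re

# Two lines taken from the report's correct output.
text = "Asics GEL-KAYANO 31 Laufschuh für mehr Stabilität\n  199,99 € 18,27 € mtl. in 12 Raten"


def apply_extract_rules(text: str, rules: list[str]) -> list[str]:
    # Hypothetical helper (assumed behaviour): run each rule over the full
    # text, case-insensitively, and append its matches in rule order.
    output: list[str] = []
    for rule in rules:
        output.extend(re.findall(rule, text, re.IGNORECASE))
    return output


rule = r".*laufschuh.*|.*€.*|.*extra.*"
single = apply_extract_rules(text, [rule])
doubled = apply_extract_rules(text, [rule, rule])
print(len(single), len(doubled))  # 2 4
```

Under this model, listing the same pattern twice simply repeats every matched line, so duplicates in the output are a side effect of the duplicated rule, not a second bug.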

dgtlmoon commented 3 months ago

Additional context: if there is more than one regex entry, lines 2, 3, 4, ... sometimes work fine; the problem is only with the first one.

Example:

    /.*laufschuh.*|.*€.*|.*extra.*/
    /.*€.*/

What field exactly are you putting this into?

th12289 commented 3 months ago

Hello @dgtlmoon,

First, thank you for answering. I have collected some more information to clarify.

I use "Extract text", and I have changed "laufschuh" to "gel".

[screenshot]

Before going to bed, I reset all rules with "clear/reset history", and the output was correct at first.

[screenshot]

These were the changes overnight (the last column contains side notes):

| URL | CSS/JSONPath/JQ/XPath Filters | PDF print of edit page | Changes overnight, from the first correct output | Side note |
|---|---|---|---|---|
| OWL Dampfer | //*[@id="product-list"] | owldampfer.pdf | owldampfer_changes.pdf | Just 4 lines left |
| Asics Outlet Neutral | /html/body/div[2]/div[4]/div[4]/div[2]/div[5]/ul/li[*]/a | asicsoutletneutral.pdf | asicsoutletneutral_changes.pdf | Removed everything |
| Otto | //*[@id="reptile-tilelist"] | otto.pdf | otto_changes.pdf | Removed everything |
| Zalando | none | zalando.pdf | zalando_changes.pdf | Collects 1 of 3, but after resetting the history it is/was correct |
| Size Official | none | sizeofficial.pdf | sizeofficial_changes.pdf | Removed everything |
| Asics Outlet Stable | /html/body/div[2]/div[4]/div[4]/div[2]/div[5]/ul/li[*]/a | asicsoutletstable.pdf | asicsoutletstable_changes.pdf | Removed everything |

Example screenshots for URLs 1 to 3: [screenshot] [screenshot] [screenshot]

Example configurations with no changes overnight and correct output:

Picksport Asics: URL, picksport_asics_edit_page

[screenshot]

Picksport Ghost: URL, picksport_ghost_editpage.pdf

[screenshot]

th12289 commented 3 months ago

I don't know if this helps, but the output seems to shrink over time. I changed nothing; only my hourly checks ran.

Example: https://www.owl-dampfer.de/OWL-SALT?af=100

[screenshot] [screenshot]

First output: [screenshot]

Output 2 hours later: [screenshot]

Output 8 hours later: [screenshot]

[screenshot]

th12289 commented 3 months ago

Some more information:

I hope this helps a bit :)

Thank you so much for your work.

dgtlmoon commented 3 months ago

If you change from XPath to CSS, does it still happen?

So from //*[@id="reptile-tilelist"] to #reptile-tilelist

th12289 commented 2 months ago

Hello @dgtlmoon

thank you for answering.

Short answer: yes, it still happens.

I changed it to:

[screenshot]


First monitored page

https://www.otto.de/mode/sportmode/sportschuhe/laufschuhe/stabilitaetsschuhe/?marke=asics,asics-sportstyle&schuhgroesse-eu=46&zielgruppe=herren&sortiertnach=preis-aufsteigend

[screenshot]

After resetting the history, the first output in the preview was OK.

Changes:

Next: removed entries [screenshot]

Next: added entries [screenshot]


Second monitored page

https://www.otto.de/mode/sportmode/sportschuhe/laufschuhe/neutralschuhe/?marke=asics,asics-sportstyle&preis-in-eur~bis=100&reduziert&schuhgroesse-eu=46&zielgruppe=herren&sortiertnach=preis-aufsteigend

After resetting the history, the first output in the preview was OK.

Changes: removed entries

[screenshot]

Next: nothing

[screenshot]

Next: nothing

[screenshot]

dgtlmoon commented 2 months ago

Can you click the 'share' link and paste that link here? It will be easier than trying to copy, paste and understand your exact settings from what you wrote.

th12289 commented 2 months ago

Yes, I've made my instance public so I can share it. Sorry, and thank you again.


Otto Neutral:

https://changedetection.tim-hallinger.de/diff/de6b2d07-91a0-4117-bad9-f515a4918951#text

Settings: [screenshot] [screenshot] [screenshot]

Otto Stable:

https://changedetection.tim-hallinger.de/diff/52dee9ae-2778-4d28-93b0-b5aa452013b3#text

Settings: [screenshot] [screenshot] [screenshot]

dgtlmoon commented 2 months ago

Err, that's not what I meant... at all.

I mean this: [screenshot]

I will move this to 'Discussions'; the regexes work and I cannot reproduce it.