dgtlmoon / changedetection.io

The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Restock Monitor, change detection. Designed for simplicity - Simply monitor which websites had a text change for free. Free Open source web page change detection, Website defacement monitoring, Price change notification
https://changedetection.io
Apache License 2.0
19.53k stars 1.05k forks

[feature] populate the restock & price column with xpath filtered result #2707

Open starfishbzdf opened 1 month ago

starfishbzdf commented 1 month ago

Version and OS

v0.46.04 on docker

Is your feature request related to a problem? Please describe.

when re-stock & price auto detection fails, i fall back to an xpath filter that points straight to the price. while this works as expected, the price does not show up in the restock & price column on the home page.

Describe the solution you'd like

maybe a checkmark next to the xpath that declares that this is the price of a single product, so it can be displayed in the column (even without stock information, that's fine). alternatively, when auto-detection fails, have somewhere to manually enter the xpath for the price.

Describe the use-case and give concrete real-world examples

alright, here's a single-product page of a Fortigate 40F firewall that's not being picked up by the automatic re-stock & price detection feature: https://baypo.co.il/product/FG-40F/

the error says:

Unable to extract restock data for this page unfortunately. (Got code 200 from server), no embedded stock information was found and nothing interesting in the text, try using this watch with Chrome.

which probably falls on that site implementing their store badly. happens. so i highlight the price and right click to inspect, copy the xpath:

/html/body/div[3]/section[1]/div/div[2]/div/div[5]/div/span

put it in the CSS/JSONPath/JQ/XPath Filters text box and voila, simply follows the changes in price and nothing else. (you know how great your project is but it doesn't hurt to praise it)

now it would be nice if it could show up in the column like the rest of the products i follow.

[image]

mechanarchy commented 3 weeks ago

Agree, this feature would be an amazing addition. I have quite a few entries that aren't correctly parsed by the restock/price detector, but a manual XPath rule can pull it out and format it however I like.

Having textboxes to enter "XPath for in-stock" and "XPath for price" would be ideal, but at minimum, if we could optionally run the stock/price parsing after standard change detection and filters, that would solve the issue.

Example

Page url: https://www.bunnings.com.au/glitz-5l-citrus-dishwashing-liquid_p4465539

My XPath filters:

xpath:concat(//meta[@property="og:title"]/@content, "<br>")
xpath:concat(//p[@data-locator="product-price"], "<br>")
xpath://p[@data-locator="product-price-comparison"]/concat(., " ", ../*[2])

Rendered result:

Glitz 5L Citrus Dishwashing Liquid
$13.12
$2.62 per litre

Doesn't show the stock level, obviously, but the pricing can certainly be pulled out easily. So a tick-box to run the price check on this output would solve the issue.

denilsonsa commented 1 week ago

By looking at processors/restock_diff/processor.py, we can see it detects prices by trying to read data from one of the formats supported by extruct.

Thus, we can work around this limitation by injecting our own JavaScript, which runs before the processor and adds the price to the page in one of those formats.

  1. Edit the item that cannot yet detect the correct price.
  2. At the "General" tab, choose "Re-stock & Price detection for single product pages".
  3. At the "Request" tab, choose "Playwright Chromium/JavaScript".
  4. At the "Request" tab, click on the "Show advanced options" button.
  5. At the "Execute JavaScript before change detection" box, add your custom script.

For instance, this one works for a major online site:

// Inject a JSON-LD <script> element so the restock processor can find the price.
var s = document.createElement('script');
s.type = 'application/ld+json';
s.textContent = JSON.stringify({
  // One entry per price element matched by the site-specific selector.
  "matches": Array.from(document.querySelectorAll('#apex_desktop .a-price.priceToPay')).map(el => {
    let curr = el.querySelector('.a-price-symbol');
    let whole = el.querySelector('.a-price-whole');
    let frac = el.querySelector('.a-price-fraction');
    return {
      "currency": curr?.textContent.trim(),
      // Keep only the digits of the whole part, then append the fraction (default "00").
      "price": (whole?.textContent ?? '').replace(/[^0-9]/g, '') + '.' + (frac?.textContent.trim() || '00'),
    };
  }),
});
document.body.appendChild(s);

I'm pretty sure someone can come up with simpler code.
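One possible simplification is to separate building the JSON payload from locating the price in the page. This is only a sketch: it assumes the processor accepts the same `matches` / `currency` / `price` shape as the script above, and that the price text uses `.` as its decimal separator. The helpers `buildPriceJsonLd` and `injectPrice`, the `.product-price` selector, and the `USD` currency are all hypothetical placeholders to adapt per site.

```javascript
// Hypothetical helper: serialises a price into the same JSON shape
// that the script above injects.
function buildPriceJsonLd(price, currency) {
  return JSON.stringify({ matches: [{ currency: currency, price: String(price) }] });
}

// Hypothetical helper: finds the first element matching `selector` and
// injects its price as a JSON-LD <script>. Only works in a browser.
function injectPrice(selector, currency) {
  const el = document.querySelector(selector);
  if (!el) return false; // nothing matched, leave the page untouched
  const s = document.createElement('script');
  s.type = 'application/ld+json';
  // Keep digits and "." only; assumes "." is the decimal separator.
  s.textContent = buildPriceJsonLd(el.textContent.replace(/[^0-9.]/g, ''), currency);
  document.body.appendChild(s);
  return true;
}

// Example call for the "Execute JavaScript before change detection" box.
// The selector and currency are placeholders, not values from a real site.
if (typeof document !== 'undefined') {
  injectPrice('.product-price', 'USD');
}
```

Passing the currency explicitly avoids having to guess it from a symbol, at the cost of hard-coding it per watch.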


Still, it would be much easier if we could nudge the processor into looking at the right elements, for example by telling it to read only the text content of elements matching a CSS selector or XPath. That, however, means the processor would need to understand multiple locales: extracting the currency from the text and parsing the number correctly regardless of the decimal separator.
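To illustrate the parsing problem, here is a minimal sketch of locale-tolerant price parsing. It is not how the processor works today; `parsePrice` is a hypothetical helper. Its heuristic is an assumption: the last `.` or `,` counts as the decimal separator only when one or two digits follow it, so it will misread some formats, such as currencies with three decimal places.

```javascript
// Hypothetical helper: pulls a currency marker and a numeric price out of
// free text such as "$13.12", "1.234,56 €" or "USD 1,299.00".
function parsePrice(text) {
  // Currency: a common symbol or a three-letter uppercase code (e.g. "USD").
  const currencyMatch = text.match(/[$€£¥₪]|\b[A-Z]{3}\b/);
  // Number: a digit followed by digits, separators or grouping spaces.
  const numberMatch = text.match(/\d[\d.,\s]*/);
  if (!numberMatch) return null;

  // Drop whitespace and any trailing separators ("13.12." -> "13.12").
  const raw = numberMatch[0].replace(/\s/g, '').replace(/[.,]+$/, '');

  // Heuristic: the last separator is decimal if 1 or 2 digits follow it.
  const lastSep = Math.max(raw.lastIndexOf('.'), raw.lastIndexOf(','));
  const decimals = lastSep === -1 ? 0 : raw.length - lastSep - 1;

  let normalized;
  if (lastSep !== -1 && decimals >= 1 && decimals <= 2) {
    // Strip grouping separators from the integer part, keep the fraction.
    normalized = raw.slice(0, lastSep).replace(/[.,]/g, '') + '.' + raw.slice(lastSep + 1);
  } else {
    // No decimal part detected: every separator is a grouping separator.
    normalized = raw.replace(/[.,]/g, '');
  }

  return {
    currency: currencyMatch ? currencyMatch[0] : null,
    price: Number(normalized),
  };
}
```

A real implementation would more likely rely on a dedicated price-parsing library and on the page's declared language than on a regular-expression heuristic like this one.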