dgtlmoon / changedetection.io

The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Restock Monitor, change detection. Designed for simplicity - Simply monitor which websites had a text change for free. Free Open source web page change detection, Website defacement monitoring, Price change notification
https://changedetection.io
Apache License 2.0
20.05k stars, 1.09k forks

[feature] Extract html attribute and make it available as a token for notification #2254

Open remz1337 opened 8 months ago

remz1337 commented 8 months ago

Version and OS: 0.45.16 on Debian 12 (LXC in Proxmox)

Is your feature request related to a problem? No

Describe the solution you'd like: When a change is detected, retrieve a specific attribute value given the XPath of the HTML element. Make this value accessible as a token that can be included in the body of a notification.

Describe the use-case and give concrete real-world examples: I want to know when a lottery opens, which I detect from the "See products" span element; that part already works. The feature I would like is to extract the href value (/en/new-products/march-2024) of the parent a element and embed it in the notification. This would save me a few clicks, because I could open the link directly from the notification.

<div data-content-type="buttons" data-appearance="inline" data-same-width="false" data-element="main" data-pb-style="DKTBLDL">
  <div data-content-type="button-item" data-appearance="default" data-element="main" data-pb-style="E4OTI38">
    <a class="pagebuilder-button-primary" href="/en/new-products/march-2024" target="" data-link-type="default" data-element="link" data-pb-style="C6GIA0S">
      <span data-element="link_text">See products</span>
    </a>
  </div>
</div>

taken from here
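
For illustration, here is a minimal Python sketch of the extraction being requested: find the anchor that wraps the "See products" span and read its href attribute. It uses only the standard library and assumes the markup is well-formed enough to parse as XML. The function name `href_for_link_text` and the trimmed-down `SNIPPET` are hypothetical, not part of changedetection.io.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the markup from the issue (most data-* attributes dropped).
SNIPPET = """
<div data-content-type="buttons">
  <div data-content-type="button-item">
    <a class="pagebuilder-button-primary" href="/en/new-products/march-2024" target="">
      <span data-element="link_text">See products</span>
    </a>
  </div>
</div>
"""


def href_for_link_text(markup, link_text):
    """Return the href of the first <a> whose child <span> has the given text."""
    root = ET.fromstring(markup)
    # ElementTree's limited XPath: an <a> that has a <span> child with this exact text.
    anchor = root.find(f".//a[span='{link_text}']")
    return anchor.get("href") if anchor is not None else None


link = href_for_link_text(SNIPPET, "See products")
print(link)
```

Real pages are often not valid XML, so a production version would more likely need a lenient HTML parser; this sketch only shows the shape of the lookup.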

dgtlmoon commented 8 months ago

Yes - I was thinking that you should be able to query the HTML in the "notification body" since it can also use Jinja2 templating

for example - the template could be

<h1>Page changed!</h1>
{{ query("//a/@href") }}
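
One way the idea above could be prototyped is by passing a helper function into the Jinja2 render context. This is only a sketch under stated assumptions: `make_query` and `query` are hypothetical names and not an existing changedetection.io API, the page is assumed to parse as XML, and only XPaths ending in `/@attribute` (such as `//a/@href`) are handled, because the standard-library ElementTree cannot select attributes directly.

```python
import xml.etree.ElementTree as ET

from jinja2 import Environment

# Stand-in for the HTML saved from the last fetch (assumed well-formed).
PAGE = """<div>
  <a class="pagebuilder-button-primary" href="/en/new-products/march-2024">
    <span>See products</span>
  </a>
</div>"""


def make_query(markup):
    """Build a hypothetical query() helper bound to one fetched page."""
    root = ET.fromstring(markup)

    def query(xpath):
        # ElementTree cannot select attributes, so split off a trailing "/@attr".
        path, _, attr = xpath.rpartition("/@")
        if not path:
            raise ValueError("expected an XPath ending in /@attribute")
        # ElementTree paths are relative to the root, hence the leading ".".
        elements = root.findall("." + path)
        values = [el.get(attr) for el in elements if el.get(attr) is not None]
        return "\n".join(values)

    return query


TEMPLATE = '<h1>Page changed!</h1>\n{{ query("//a/@href") }}'
rendered = Environment().from_string(TEMPLATE).render(query=make_query(PAGE))
print(rendered)
```

Binding the helper to a single page keeps the template syntax as short as in the proposal, at the cost of building a new helper for each notification.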

dgtlmoon commented 8 months ago

It should save the HTML on each fetch too

kanjieater commented 7 months ago

I saw this, as well as a Reddit post, but I'm still new to how you would use Jinja templates with a crawled element, or with multiple elements, to display them in a notification. Could we have a wiki article documenting how this would work?

MoralCode commented 6 months ago

I'd potentially be interested enough in this to contribute the fix, if it's something within the realm of what a new contributor could add.

dgtlmoon: if you have time, could you let me know whether you think this is doable for a contributor who is new to this codebase but has fairly strong general knowledge of Python (and programming in general)? If you think it's doable, a general roadmap of what might need changing, and in which parts of the codebase, would help me get started.

dgtlmoon commented 6 months ago

@remz1337

The feature I would like is to extract the href value

You can turn on the "Render anchor tag content" option which has been there since the start

or use an XPath such as //a/@href in your filters

remz1337 commented 6 months ago

@dgtlmoon Yes, I was able to achieve what I wanted with that: trigger the notification on an href change, and extract only the href value so it is passed as current_snapshot in the notification body. Thanks!

However, extracting the href value is a bit tricky. It worked in my case because I know the last character is always going to be a digit, so I used this regex in my filter to extract the value: /\/.*\d/
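
To show what that regex does, here is a small Python sketch. The snapshot text in `sample` is an assumption for illustration (the exact text produced by the filters may differ), and `extract_href` is a hypothetical helper name.

```python
import re

# Same pattern as the filter: first "/" up to the last digit on the line (greedy).
HREF_PATTERN = re.compile(r"/.*\d")


def extract_href(snapshot_text):
    """Return the matched path, or None when nothing matches."""
    match = HREF_PATTERN.search(snapshot_text)
    return match.group(0) if match else None


sample = "See products /en/new-products/march-2024"  # assumed snapshot text
print(extract_href(sample))
print(extract_href("See products /en/about"))  # no trailing digit, so no match
```

Because the pattern is greedy and depends on the link ending in a digit, it would return nothing for a path like /en/about, which is why it is tricky as a general solution.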

dgtlmoon commented 6 months ago

Related to #2373 (save HTML snapshots)

dgtlmoon commented 4 months ago

#2373 is merged, so this can continue :)