dgtlmoon / changedetection.io

The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Restock Monitor, change detection. Designed for simplicity - Simply monitor which websites had a text change for free. Free Open source web page change detection, Website defacement monitoring, Price change notification
https://changedetection.io
Apache License 2.0
17.3k stars 965 forks

module 'lxml.etree' has no attribute '_ElementStringResult' error since v0.45.18 #2312

Closed searchjaunt closed 6 months ago

searchjaunt commented 6 months ago

DO NOT USE THIS FORM TO REPORT THAT A PARTICULAR WEBSITE IS NOT SCRAPING/WATCHING AS EXPECTED

This form is only for direct bugs and feature requests to do directly with the software.

Please report watched websites (full URL and any settings) that do not work with changedetection.io as expected IN THE DISCUSSION FORUMS or your report will be deleted

CONSIDER TAKING OUT A SUBSCRIPTION FOR A SMALL PRICE PER MONTH, YOU GET THE BENEFIT OF USING OUR PAID PROXIES AND FURTHERING THE DEVELOPMENT OF CHANGEDETECTION.IO

THANK YOU

Describe the bug: A large number of checks return module 'lxml.etree' has no attribute '_ElementStringResult'. Not all of them do, but the common factor seems to be that the failing watches all have an xpath filter. I'm not 100% sure, though.

Version v0.45.18

To Reproduce

Steps to reproduce the behavior: Just do a check of a website with an xpath filter

! ALWAYS INCLUDE AN EXAMPLE URL WHERE IT IS POSSIBLE TO RE-CREATE THE ISSUE - USE THE 'SHARE WATCH' FEATURE AND PASTE IN THE SHARE-LINK!

Expected behavior No errors and showing the difference with the last check

Screenshots image

Desktop (please complete the following information): not applicable

Smartphone (please complete the following information): not applicable

Additional context Seems to be reported in https://forum.cloudron.io/topic/11456/module-lxml-etree-has-no-attribute-_elementstringresult too

dgtlmoon commented 6 months ago

Please run pip3 list and tell me what version of lxml you have

Also you didn't say if this install is a docker container or what it is...
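For anyone who cannot easily run pip3 list, the same information can be read from inside Python. This is only a minimal sketch using the standard library's importlib.metadata; the helper name installed_version is made up for illustration and is not part of changedetection.io.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration only.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or None when lxml is not installed.
    print("lxml:", installed_version("lxml"))
```

Inside a container this would be run with the container's own Python interpreter, so that it reports the lxml that changedetection.io actually imports.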

searchjaunt commented 6 months ago

Thanks for the quick response. Sorry for not mentioning it: it does indeed run in a Docker container. Running docker exec -it XXX pip3 list shows lxml 5.2.1

dgtlmoon commented 6 months ago

Ok I can reproduce it, it is limited to xpath1 queries only

xpath1:/html/head/title

dgtlmoon commented 6 months ago

https://github.com/dgtlmoon/changedetection.io/blob/d4dac23ba19c2d52c9d38f4ee6030cd95753e1f5/changedetectionio/html_tools.py#L175

In 5.1.1 lxml removed _ElementStringResult(). This was used to get the ->text() of a result; see https://github.com/dgtlmoon/changedetection.io/pull/778 and https://github.com/dgtlmoon/changedetection.io/pull/751
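One way to avoid depending on a private class such as _ElementStringResult is to check for plain str instead, since lxml's string results from text() or attribute queries are str subclasses. The sketch below is an assumption about how such a check could look, not the code that went into the fix. It uses the standard library's xml.etree.ElementTree as a stand-in so it runs without lxml, and result_to_text is a hypothetical name.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in; lxml.etree.tostring accepts the same encoding argument


def result_to_text(result) -> str:
    """Convert one XPath result to text without touching private classes.

    Hypothetical helper for illustration only.
    """
    if isinstance(result, str):
        # String results (e.g. from text() or @attr) are already text;
        # str() drops any str subclass wrapper.
        return str(result)
    # Otherwise assume an element and serialise it as text rather than bytes.
    return ET.tostring(result, encoding="unicode")


title = ET.fromstring("<title>Hello</title>")
print(result_to_text(title))
print(result_to_text("Hello"))
```

The isinstance check does not care which lxml version is installed, which is the point: it keeps working whether or not the private class exists.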

searchjaunt commented 6 months ago

Thx for the investigation. Do you still need some information from my side? What is the next step?

dgtlmoon commented 6 months ago

@searchjaunt please paste me the exact selector you are using, visual-selector never generates text() type selectors afaik

searchjaunt commented 6 months ago

Some random examples:

xpath1://article[@class='page sticky grid gt-large'][1]
xpath1://table[@id='wegenwerkendata'][1]
xpath1://div[1]/div[2]/div[2]
xpath1://div[3]/div[2]/div[1]/div[1]/div[1]
xpath1://div[1]/section[1]/div[1]

I've never used the visual selector.

I have > 300 sites failing now.

dgtlmoon commented 6 months ago

Could you try 0.45.19 just released?

dgtlmoon commented 6 months ago

xpath1://table[@id='wegenwerkendata'][1]
xpath1://div[1]/div[2]/div[2]
xpath1://div[3]/div[2]/div[1]/div[1]/div[1]
xpath1://div[1]/section[1]/div[1]

Note: this will only trigger if those elements are present on the page; the error won't show otherwise.

searchjaunt commented 6 months ago

I tried a couple of them and now I'm getting the error: can only concatenate str (not "bytes") to str

PS: I'm not sure I understand your latest note.
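The message "can only concatenate str (not "bytes") to str" is what Python raises when bytes are appended to a str. A plausible cause here, though the thread does not confirm it, is that a serialised element came back as bytes: tostring() returns bytes by default in both the standard library and lxml. The sketch below reproduces the error with the stdlib and shows one way to normalise the value first; as_text is a hypothetical helper.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree


def as_text(fragment) -> str:
    """Decode bytes to str; pass str through unchanged (hypothetical helper)."""
    return fragment.decode("utf-8") if isinstance(fragment, bytes) else fragment


element = ET.fromstring("<p>hi</p>")
serialized = ET.tostring(element)  # bytes, because no text encoding was requested

error_message = None
try:
    html_block = "" + serialized  # str + bytes is not allowed
except TypeError as exc:
    error_message = str(exc)

# Normalising to str before concatenating avoids the error.
html_block = "" + as_text(serialized)
print(error_message)
print(html_block)
```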

dgtlmoon commented 6 months ago

if i had your exact selectors when you reported the bug, then i would not have released a new version without testing your selectors :( none-the-less, thanks... i'll keep working at it

dgtlmoon commented 6 months ago

@searchjaunt any chance you can grace me with what URL you are watching that causes that? really need the most info possible

searchjaunt commented 6 months ago

Sure, here are two of them (the ones I tried that returned the last error):

https://stratenplan.gistel.be/gipod/wegeniswerken
xpath1://table[@id='wegenwerkendata'][1]

https://www.kortrijk.be/nieuws?f%5B0%5D=%3A&f%5B1%5D=categorie%3Amobiliteit
xpath1://div[3]/div[2]/div[1]/div[1]/div[1]

No other options or filtering, using the Basic fast Plaintext/HTTP Client (for the record, it does occur with WebDriver Chrome/Javascript; no Playwright/Chrome installed).

dgtlmoon commented 6 months ago

does changing it to //table[@id='wegenwerkendata'][1] work?

xconverge commented 6 months ago

here are 3 where I see these issues

https://www.amd.com/en/support/chipsets/amd-socket-am4/x570
xpath1:/html/body/div[1]/main/div/div/div/div/div[1]/div[1]/div/div[2]/details[1]/div/div[1]/div/span/div/div[2]

https://www.boss.info/global/support/by_product/katana-50_mk2/updates_drivers/4d633c80-f506-440e-94ce-055aaba48df3/
xpath1:/html/body/form/div[4]/div[1]/article/div[2]/div[2]

https://www.arturia.com/products/audio/minifuse/resources
xpath1:/html/body/div/div[1]/main/section[9]/div/div[4]/div[2]/div[2]/div[1]/table/tbody

xconverge commented 6 months ago

removing xpath1: from each has them working again I think

searchjaunt commented 6 months ago

@dgtlmoon that works indeed. I hope the conclusion won't be that I need to remove xpath1 for > 300 sites. I didn't add the prefixes myself; they started appearing from a certain version of changedetection (I can't recall which). Why did it work before 0.45.18?

dgtlmoon commented 6 months ago

@searchjaunt "Why did it work before 0.45.18?" Because, as you said, it's a container and the container was built differently; that's how containers work

dgtlmoon commented 6 months ago

@dgtlmoon that works indeed. I hope that the the conclusion won't be that I need to remove xpath1 for > 300 sites. I didn't add them myself but started appearing from a certain version of changedetection (can't recall which version). Why did it work before 0.45.18?

if you gave me better examples to test with from the very start then this wouldnt have happened, it was only because i was missing exact information, usually i never start working on a bug until i have the exact data someone is using, but this time i did and it bit me

dgtlmoon commented 6 months ago

I re-tested all situations mentioned above (all URLs and filters) and in the newest 0.45.20 they all pass

please try that version (0.45.20)

searchjaunt commented 6 months ago

Just installed 0.45.20 and I still get 'str' object has no attribute 'name' for https://www.depinte.be/werken with the filter //div[1]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/div[1]

I explicitly removed xpath1:

Other settings: (screenshots), nothing else.

Some other things: I got some more false positives (screenshots). Apart from the spacing (I don't know where it comes from, since the site wasn't changed) there is no difference.

Despite being up to date, I get the message that a new version is available (screenshot).

Constantin1489 commented 6 months ago

Hi, I made a mistake when I implemented XPath 3.1. When I made the "xpath:" prefix use the elementpath library (XPath 3.1), I forgot to duplicate the original xpath1 tests for the new "xpath1:" prefix. I'm currently investigating this xpath1 problem. I'm sorry.

EDIT: remove '//' in prefix
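To make the prefix behaviour discussed in this thread concrete, here is a rough sketch of how a filter line could be routed by its prefix. It is not the project's actual code. It assumes that "xpath1:" selects the lxml XPath 1.0 path, that "xpath:" selects the elementpath XPath 3.1 path, and that a bare filter starting with "/" is treated like the default "xpath:"; that last point is an assumption based on the reports above that removing the prefix worked. All names are hypothetical.

```python
from typing import Tuple

# Hypothetical labels, for illustration only.
XPATH_DEFAULT = "xpath"   # assumed: XPath 3.1 via elementpath
XPATH_1 = "xpath1"        # assumed: XPath 1.0 via lxml
OTHER = "other"           # anything not recognised as XPath (e.g. a CSS selector)


def classify_filter(rule: str) -> Tuple[str, str]:
    """Return (engine label, expression with any prefix stripped)."""
    rule = rule.strip()
    # Check the longer prefix first so the two prefixes are never confused.
    if rule.startswith("xpath1:"):
        return XPATH_1, rule[len("xpath1:"):]
    if rule.startswith("xpath:"):
        return XPATH_DEFAULT, rule[len("xpath:"):]
    if rule.startswith("/"):
        # Assumption: a bare XPath goes to the default engine.
        return XPATH_DEFAULT, rule
    return OTHER, rule


for example in ("xpath1://table[@id='wegenwerkendata'][1]",
                "//table[@id='wegenwerkendata'][1]",
                "div.price"):
    print(classify_filter(example))
```

Under these assumptions, the same expression with and without "xpath1:" ends up in two different engines, which would explain why only the prefixed watches failed.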

Constantin1489 commented 6 months ago

@searchjaunt I can't reproduce the 'str' object has no attribute 'name' with v0.45.20 image

Add other test result. image

searchjaunt commented 6 months ago

Still getting it though: image

dgtlmoon commented 6 months ago

Once again you

searchjaunt commented 6 months ago

@dgtlmoon see https://github.com/dgtlmoon/changedetection.io/issues/2312#issuecomment-2063649841 Just tried deleting and creating it again, but with the same result

Constantin1489 commented 6 months ago

@searchjaunt Could you run this command?

docker run -it -e LOGGER_LEVEL=CRITICAL --rm YOURCONTAINER_IMAGE bash -c 'pip3 list'

You can find YOURCONTAINER_IMAGE for your running container with sudo docker ps (in the example screenshot, the image is mikebrady/shairport-sync:latest).

Constantin1489 commented 6 months ago

@searchjaunt Hi, I tried to reproduce the same thing with versions 0.45.18, 0.45.19 and 0.45.20, but I couldn't reproduce 'str' object has no attribute 'name'

searchjaunt commented 6 months ago

@Constantin1489 did you try the URL https://www.depinte.be/werken with the xpath //div[1]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/div[1] and the other settings as mentioned in https://github.com/dgtlmoon/changedetection.io/issues/2312#issuecomment-2063649841

Constantin1489 commented 6 months ago

Yes!

Constantin1489 commented 6 months ago

@xconverge The same goes for the default xpath (XPath 3.1). That is why I didn't remove xpath1 and kept the previous XPath syntax available under 'xpath1:'.

XPath 3.1 support is important because when a user tries syntax from XPath 2.0 through 3.1 found on Stack Overflow, it will fail in most cases, because lxml uses XPath 1.0. Also, Python's native xml XPath support doesn't cover all of the XPath 1.0 syntax, and it differs a little from the W3C XPath 1.0 spec (especially in namespace notation).

I will soon publish a report repo on this subject (within two weeks? I'm cleaning up the code now). Spoiler alert: the number of tests is huge, and it shows why XPath 3.1 is possible in Python without problems (when the configuration is correct).

EDIT: So, basically, the XML and XPath parsers in Python each have pros and cons. But the experience provided by the elementpath library is great because you can use XPath as written in the spec without problems.

searchjaunt commented 6 months ago

@Constantin1489 strange. So what can I do in order to debug/make it work? I find it rather strange that in the header is said that a new version is available whilst 20 is installed (see earlier screenshot).

Constantin1489 commented 6 months ago

@searchjaunt could you provide the command, script, Dockerfile or docker-compose.yml you use to run changedetectionio? Before posting it here, please test that the command you provide actually works.

Also, does the problem happen in all the watches?

navels commented 6 months ago

FYI I am also on 20 and am getting the "new version is available" banner. Installation is via this proxmox script: https://github.com/tteck/Proxmox/blob/main/ct/changedetection.sh

Constantin1489 commented 6 months ago

Ah sorry. I thought you were saying the syntax is not working. As for the new version banner, that will disappear. @navels does your xpath1 syntax work?

dgtlmoon commented 6 months ago

Able to reproduce it with this shared watch https://changedetection.io/share/QtZ-94DW41sa on .20 , the error is actually now a different error 'str' object has no attribute '__name__'

When i use an earlier lxml version the error still exists so @searchjaunt this issue is unrelated, i will open a new one

dgtlmoon commented 6 months ago

Ok, this unrelated issue is now over at https://github.com/dgtlmoon/changedetection.io/issues/2318 thanks @Constantin1489

dgtlmoon commented 5 months ago

tldr - fixed :)