Closed searchjaunt closed 7 months ago
Please run pip3 list and tell me which version of lxml you have.
Also, you didn't say whether this install is a Docker container or something else...
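As an alternative to pip3 list, here is a minimal Python sketch that reports the installed lxml version from the interpreter itself. It also checks whether the private class named in the error still exists. The helper name lxml_report is hypothetical. It uses only importlib.metadata and hasattr, and it should be run with the same Python interpreter the application uses (for example, inside the container).

```python
# Hypothetical helper: report the installed lxml version and whether the
# private lxml.etree._ElementStringResult class still exists.
from importlib.metadata import PackageNotFoundError, version


def lxml_report() -> dict:
    try:
        lxml_version = version("lxml")
    except PackageNotFoundError:
        # lxml is not installed for this interpreter
        return {"lxml": None, "has_ElementStringResult": None}
    from lxml import etree  # imported only when lxml is present
    return {
        "lxml": lxml_version,
        "has_ElementStringResult": hasattr(etree, "_ElementStringResult"),
    }


if __name__ == "__main__":
    print(lxml_report())
```

A False value for has_ElementStringResult means any code that references that class directly will fail with the AttributeError reported here.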
On 17 April 2024 12:59:38 UTC, searchjaunt wrote:
DO NOT USE THIS FORM TO REPORT THAT A PARTICULAR WEBSITE IS NOT SCRAPING/WATCHING AS EXPECTED
This form is only for direct bugs and feature requests todo directly with the software.
Please report watched websites (full URL and any settings) that do not work with changedetection.io as expected IN THE DISCUSSION FORUMS or your report will be deleted
CONSIDER TAKING OUT A SUBSCRIPTION FOR A SMALL PRICE PER MONTH, YOU GET THE BENEFIT OF USING OUR PAID PROXIES AND FURTHERING THE DEVELOPMENT OF CHANGEDETECTION.IO
THANK YOU
Describe the bug: A large number of checks return module 'lxml.etree' has no attribute '_ElementStringResult'. Not all of them fail, but the common factor among the failing watches seems to be that they all have an XPath filter. I'm not 100% sure, though.
Version: v0.45.18
To Reproduce
Steps to reproduce the behavior: Run a check on a website with an XPath filter.
! ALWAYS INCLUDE AN EXAMPLE URL WHERE IT IS POSSIBLE TO RE-CREATE THE ISSUE - USE THE 'SHARE WATCH' FEATURE AND PASTE IN THE SHARE-LINK!
Expected behavior: No errors, and the difference from the last check is shown.
Screenshots
Desktop (please complete the following information): not applicable
Smartphone (please complete the following information): not applicable
Additional context: This also seems to be reported at https://forum.cloudron.io/topic/11456/module-lxml-etree-has-no-attribute-_elementstringresult
-- Reply to this email directly or view it on GitHub: https://github.com/dgtlmoon/changedetection.io/issues/2312 You are receiving this because you were assigned.
Thanks for the quick response. Sorry for not mentioning it, but it does run in a Docker container. docker exec -it XXX pip3 list reports lxml 5.2.1.
OK, I can reproduce it. It is limited to xpath1 queries only, for example:
xpath1:/html/head/title
In lxml 5.1.1, _ElementStringResult() was removed. It was used to get the text() of a result, see https://github.com/dgtlmoon/changedetection.io/pull/778 and https://github.com/dgtlmoon/changedetection.io/pull/751
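One version-independent way to handle XPath 1.0 results without naming lxml's private result classes is to check for str. In lxml, text() and @attribute results are str subclasses. The sketch below is hypothetical and is not the project's actual fix. The helper names are illustrative, and the stdlib ElementTree serializer stands in for lxml so that the example runs without extra packages.

```python
import xml.etree.ElementTree as ET
from typing import Any, Callable


def xpath_result_to_text(result: Any, serialize: Callable[[Any], str]) -> str:
    """Turn one XPath 1.0 result item into text without naming private lxml classes."""
    # text() and @attr results are str subclasses in lxml, so checking for
    # str avoids depending on _ElementStringResult (or any other private type).
    if isinstance(result, str):
        return str(result)
    # Element results: serialize with whatever the caller's library provides.
    if hasattr(result, "tag"):
        return serialize(result)
    # Numbers and booleans, e.g. from count(//p) or boolean(//p).
    return str(result)


def serialize_stdlib(el: ET.Element) -> str:
    # ET.tostring returns str (not bytes) when encoding="unicode".
    return ET.tostring(el, encoding="unicode")
```

With lxml, the same helper could be called as xpath_result_to_text(r, lambda e: etree.tostring(e, encoding="unicode")) for each r in tree.xpath(...). Because the isinstance check relies only on the public str base type, it does not break when lxml renames or removes its internal result classes.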
Thanks for the investigation. Do you still need any information from me? What is the next step?
@searchjaunt please paste the exact selector you are using. As far as I know, the visual selector never generates text()-type selectors.
Some random examples:
xpath1://article[@class='page sticky grid gt-large'][1]
xpath1://table[@id='wegenwerkendata'][1]
xpath1://div[1]/div[2]/div[2]
xpath1://div[3]/div[2]/div[1]/div[1]/div[1]
xpath1://div[1]/section[1]/div[1]
I've never used the visual selector.
I have more than 300 sites failing now.
Could you try 0.45.19, which was just released?
Note: this will only trigger if those elements are present on the page. Otherwise, the error won't show.
I tried a couple of them, and now I'm getting the error can only concatenate str (not "bytes") to str.
PS: I'm not sure I understand your latest note.
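One common way this exact TypeError arises in Python HTML/XML code is concatenating a str with the bytes returned by a serializer. This thread does not establish whether that is the cause here. Below is a minimal sketch using the stdlib ElementTree. lxml.etree.tostring behaves the same way: it returns bytes unless encoding="unicode" is passed. The function name join_fragments is hypothetical.

```python
import xml.etree.ElementTree as ET


def join_fragments(elements) -> str:
    """Concatenate serialized elements into a single str."""
    out = ""
    for el in elements:
        # ET.tostring(el) returns bytes by default, and
        # out += ET.tostring(el) would raise:
        # TypeError: can only concatenate str (not "bytes") to str
        out += ET.tostring(el, encoding="unicode")
    return out
```

Asking the serializer for str up front keeps every fragment the same type. The alternative is to decode each bytes result explicitly before joining.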
If I had had your exact selectors when you reported the bug, I would not have released a new version without testing them :( Nonetheless, thanks... I'll keep working on it.
@searchjaunt any chance you can share the URL you are watching that causes that? I really need as much information as possible.
Sure, here are two of them (the ones I tried, which return the latest error):
https://stratenplan.gistel.be/gipod/wegeniswerken with xpath1://table[@id='wegenwerkendata'][1]
https://www.kortrijk.be/nieuws?f%5B0%5D=%3A&f%5B1%5D=categorie%3Amobiliteit with xpath1://div[3]/div[2]/div[1]/div[1]/div[1]
No other options or filtering, and the fetch method is Basic fast Plaintext/HTTP Client. (For the record, it also occurs with WebDriver Chrome/Javascript; Playwright/Chrome is not installed.)
Does changing it to //table[@id='wegenwerkendata'][1] work?
Here are three where I see these issues:
https://www.amd.com/en/support/chipsets/amd-socket-am4/x570
xpath1:/html/body/div[1]/main/div/div/div/div/div[1]/div[1]/div/div[2]/details[1]/div/div[1]/div/span/div/div[2]
https://www.boss.info/global/support/by_product/katana-50_mk2/updates_drivers/4d633c80-f506-440e-94ce-055aaba48df3/
xpath1:/html/body/form/div[4]/div[1]/article/div[2]/div[2]
https://www.arturia.com/products/audio/minifuse/resources
xpath1:/html/body/div/div[1]/main/section[9]/div/div[4]/div[2]/div[2]/div[1]/table/tbody
Removing xpath1: from each one gets them working again, I think.
@dgtlmoon That works indeed. I hope the conclusion won't be that I need to remove xpath1: from more than 300 sites. I didn't add the prefix myself; it started appearing from a certain version of changedetection.io (I can't recall which one). Why did it work before 0.45.18?
@searchjaunt "Why did it work before 0.45.18?" Because, as you said, it's a container, and the container was built differently. That's how containers work.
If you had given me better examples to test with from the very start, this wouldn't have happened. It was only because I was missing exact information. Usually I never start working on a bug until I have the exact data someone is using, but this time I did, and it bit me.
I re-tested all the situations mentioned above (all URLs and filters), and in the newest release, 0.45.20, they all pass.
Please try that version (0.45.20).
I just installed 0.45.20, and I still get 'str' object has no attribute 'name' for https://www.depinte.be/werken with //div[1]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/div[1]
I explicitly removed xpath1:.
Other settings: nothing else.
Some other things: I got some more false positives. Apart from the spacing, there is no difference (I don't know where the spacing comes from, since the site wasn't changed).
Despite being up to date, I get the message that a new version is available.
Hi, I made a mistake when I implemented XPath 3.1. When I made "xpath:" use the elementpath library (XPath 3.1), I forgot to duplicate the original XPath 1 tests with the new "xpath1:" prefix. I'm currently investigating this xpath1 problem. I'm sorry.
EDIT: removed '//' in prefix.
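The prefix scheme described in this comment can be summarized as a small dispatch function. This is a hypothetical sketch and not changedetection.io's code. The function name, the engine labels, and the CSS fallback for everything else are assumptions. The "xpath1:" / "xpath:" split follows this thread, which also states that bare XPath defaults to XPath 3.1.

```python
def classify_filter(rule: str) -> tuple[str, str]:
    """Hypothetical router: map a filter rule to (engine, expression)."""
    rule = rule.strip()
    if rule.startswith("xpath1:"):
        # Legacy XPath 1.0 syntax, preserved under an explicit prefix
        return "xpath1", rule[len("xpath1:"):]
    if rule.startswith("xpath:"):
        # XPath 3.1 (elementpath) under the "xpath:" prefix
        return "xpath3.1", rule[len("xpath:"):]
    if rule.startswith("/"):
        # Bare paths go to the default engine, taken here to be XPath 3.1
        return "xpath3.1", rule
    # Assumption: anything else is treated as a CSS selector
    return "css", rule
```

Under this scheme, a rule like xpath1:/html/head/title would be routed to the XPath 1.0 code path. That path is the one missing tests in this bug report.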
@searchjaunt I can't reproduce the 'str' object has no attribute 'name' error with v0.45.20.
Adding another test result.
Still getting it though:
Once again you
@dgtlmoon see https://github.com/dgtlmoon/changedetection.io/issues/2312#issuecomment-2063649841. I just tried deleting the watch and creating it again, but with the same result.
@searchjaunt Could you run this command?
docker run -it -e LOGGER_LEVEL=CRITICAL --rm YOURCONTAINER_IMAGE bash -c 'pip3 list'
You can find YOURCONTAINER_IMAGE for your running container with sudo docker ps, like this (the example image shown is mikebrady/shairport-sync:latest).
@searchjaunt Hi, I tried to reproduce the same thing with versions 18, 19 and 20, but I couldn't reproduce 'str' object has no attribute 'name'.
@Constantin1489 did you try the URL https://www.depinte.be/werken with the XPath //div[1]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/div[1] and the other settings mentioned in https://github.com/dgtlmoon/changedetection.io/issues/2312#issuecomment-2063649841?
Yes!
@xconverge Also for the default XPath (XPath 3.1). That is why I didn't remove xpath1 and instead preserved the previous XPath syntax under 'xpath1:'.
XPath 3.1 support is important because when users try syntax (XPath 2 to XPath 3.1) taken from Stack Overflow, it fails in most cases, since lxml uses XPath 1. Also, Python's native xml XPath doesn't support all of the XPath 1 syntax, and it differs slightly from the W3C XPath 1 spec (especially in namespace notation).
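To illustrate the point about Python's native XPath support, here is a small sketch using the standard library's xml.etree.ElementTree, which implements only a limited XPath subset. Relative descendant searches and attribute predicates work. An absolute path such as //p is rejected when called on an Element. The sketch does not exercise lxml or elementpath, and try_findall is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DOC = "<html><body><p class='a'>one</p><p class='b'>two</p></body></html>"


def try_findall(root: ET.Element, path: str):
    """Return the texts matched by ElementTree's XPath subset, or the rejection message."""
    try:
        return [el.text for el in root.findall(path)]
    except SyntaxError as exc:
        return f"SyntaxError: {exc}"


root = ET.fromstring(DOC)
print(try_findall(root, ".//p"))              # relative descendant search: supported
print(try_findall(root, ".//p[@class='b']"))  # attribute predicate: supported
print(try_findall(root, "//p"))               # absolute path on an Element: rejected
```

Expressions copied from XPath 2.0+ examples, or from full XPath 1.0 examples, can therefore fail depending on which parser evaluates them. This is the motivation for routing them to a spec-complete engine.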
I will soon publish a report repo about this subject (within two weeks? I'm cleaning up the code now). Spoiler alert: the number of tests is huge, and they show why XPath 3.1 works without problems in Python (when the configuration is correct).
EDIT: So basically, there are pros and cons to the XML/XPath parsers in Python. But the experience provided by the elementpath library is great, because you can write XPath as defined in the spec without problems.
@Constantin1489 Strange. So what can I do to debug this or make it work? I also find it rather strange that the header says a new version is available while 0.45.20 is installed (see the earlier screenshot).
@searchjaunt could you provide the command, script, Dockerfile or docker-compose.yml you use to run changedetectionio? Before posting it here, please test that the command you provide actually works.
Also, does the problem happen in all watches?
FYI I am also on 20 and am getting the "new version is available" banner. Installation is via this proxmox script: https://github.com/tteck/Proxmox/blob/main/ct/changedetection.sh
Ah, sorry, I thought you were saying the syntax is not working. As for the new version banner, that will disappear. @navels does your xpath1 syntax work?
I was able to reproduce it with this shared watch https://changedetection.io/share/QtZ-94DW41sa on 0.45.20. The error is actually a different one now: 'str' object has no attribute '__name__'.
When I use an earlier lxml version, the error still exists, so @searchjaunt this issue is unrelated. I will open a new one.
Ok, this unrelated issue is now over at https://github.com/dgtlmoon/changedetection.io/issues/2318 thanks @Constantin1489
tldr - fixed :)