ClericPy / ichrome

Chrome controller for Humans, based on Chrome Devtools Protocol(CDP) and python3.7+.
https://pypi.org/project/ichrome/
MIT License

I think there is a bug in querySelectorAll() #134

Closed juanfrilla closed 1 year ago

juanfrilla commented 1 year ago

Depending on the website being scraped, it works fine, but on another website it throws the error below when I call result = await tab.wait_tag_click("#tagyouwant"). I also printed the items variable that triggers the error:

2023-07-20 16:37:46 [torequests] DEBUG: Retry 0 times failed again: Cannot connect to host 127.0.0.1:51245 ssl:default [Connect call failed ('127.0.0.1', 51245)].
[{"tagName": "A", "innerHTML": "\n                                        <span class=\"gInlineBlock\">Búsqueda Avanzada</span><i class=\"material-icons gInlineBlock\">keyboard_arrow_left</i>", "outerHTML": "<a id=\"busqueda-avanzada\" href=\"javascript:void(0)\" class=\"gBlock gBorderBottom4 gPadding10-0 gColorGris\">\n                                        <span class=\"gInlineBlock\">Búsqueda Avanzada</span><i class=\"material-icons gInlineBlock\">keyboard_arrow_left</i></a>", "textContent": "\n                                        Búsqueda Avanzadakeyboard_arrow_left", "result": null, "attributes": {"id": "busqueda-avanzada", "href": "javascript:void(0)", "class": "gBlock gBorderBottom4 gPadding10-0 gColorGris"}}]
ERROR 2023-07-20 16:37:52 [ichrome] async_utils.py(1917): querySelectorAll error: TypeError('type object argument after ** must be a mapping, not str'), response: None
ERROR:ichrome:querySelectorAll error: TypeError('type object argument after ** must be a mapping, not str'), response: None
2023-07-20 16:37:52 [ichrome] ERROR: querySelectorAll error: TypeError('type object argument after ** must be a mapping, not str'), response: None
DEBUG:modulos.middlewares.middlewares:process_spider_exception
2023-07-20 16:37:53 [modulos.middlewares.middlewares] DEBUG: process_spider_exception
ERROR:scrapy.core.scraper:Spider error processing <GET https://servicio.indecopi.gob.pe/buscadorResoluciones/index.seam> (referer: None)
Traceback (most recent call last):
  File "/Users/juanfranciscomartinrodriguez/miniconda3/envs/p38/lib/python3.8/site-packages/twisted/internet/defer.py", line 824, in adapt
    extracted = result.result()
  File "/Users/juanfranciscomartinrodriguez/miniconda3/envs/p38/lib/python3.8/site-packages/scrapy/utils/py36.py", line 8, in collect_asyncgen
    async for x in result:
  File "/Users/juanfranciscomartinrodriguez/modulos/lib/modulos/pipelines/executor.py", line 23, in _process_pipelines
    async for item in parse(spider, *args, **kargs):
  File "/Users/juanfranciscomartinrodriguez/modulos/scrapy/latam/jurisprudenciapy3/jurisprudenciapy3/spiders/jPeruIndecopi.py", line 490, in parse
    await self.navigate_ichrome(1)
  File "/Users/juanfranciscomartinrodriguez/modulos/scrapy/latam/jurisprudenciapy3/jurisprudenciapy3/spiders/jPeruIndecopi.py", line 313, in navigate_ichrome
    result = await tab.wait_tag_click("#busqueda-avanzada")
  File "/Users/juanfranciscomartinrodriguez/miniconda3/envs/p38/lib/python3.8/site-packages/ichrome/async_utils.py", line 1606, in wait_tag_click
    tag = await self.wait_tag(cssselector,
  File "/Users/juanfranciscomartinrodriguez/miniconda3/envs/p38/lib/python3.8/site-packages/ichrome/async_utils.py", line 1635, in wait_tag
    tag = await self.querySelector(cssselector=cssselector,
  File "/Users/juanfranciscomartinrodriguez/miniconda3/envs/p38/lib/python3.8/site-packages/ichrome/async_utils.py", line 1814, in querySelector
    return await self.querySelectorAll(cssselector=cssselector,
  File "/Users/juanfranciscomartinrodriguez/miniconda3/envs/p38/lib/python3.8/site-packages/ichrome/async_utils.py", line 1919, in querySelectorAll
    raise error
  File "/Users/juanfranciscomartinrodriguez/miniconda3/envs/p38/lib/python3.8/site-packages/ichrome/async_utils.py", line 1908, in querySelectorAll
    result = [Tag(**kws) for kws in items]
  File "/Users/juanfranciscomartinrodriguez/miniconda3/envs/p38/lib/python3.8/site-packages/ichrome/async_utils.py", line 1908, in <listcomp>
    result = [Tag(**kws) for kws in items]
TypeError: type object argument after ** must be a mapping, not str
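
The traceback ends where `Tag(**kws)` receives a str instead of a dict. The printed items also shows `"result": null`, which is JSON syntax rather than a Python repr. The following is a minimal sketch of how that combination can arise, assuming (this is not confirmed anywhere in the thread) that `items` was still JSON text when the list comprehension ran: iterating over a string yields single characters, and unpacking a character with `**` raises a TypeError. `FakeTag`, `build_tags`, and `decode_items` are hypothetical names used only for illustration and are not part of ichrome; the double-encoded payload is constructed artificially.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FakeTag:
    """Hypothetical stand-in for ichrome's Tag; only the keyword shape matters."""
    tagName: str
    innerHTML: str = ""
    outerHTML: str = ""
    textContent: str = ""
    result: Any = None
    attributes: Optional[Dict[str, str]] = None


def build_tags(items: Any) -> List[FakeTag]:
    # Same pattern as the failing line in the traceback.
    return [FakeTag(**kws) for kws in items]


def decode_items(payload: Any, max_depth: int = 2) -> Any:
    """Decode JSON text up to max_depth times, stopping once it is no longer a str."""
    for _ in range(max_depth):
        if not isinstance(payload, str):
            break
        payload = json.loads(payload)
    return payload


# Assumption for the demo: the payload was JSON-encoded twice.
items = [{"tagName": "A", "attributes": {"id": "busqueda-avanzada"}}]
double_encoded = json.dumps(json.dumps(items))

once = json.loads(double_encoded)  # one decode still leaves a str

error = None
try:
    build_tags(once)  # iterates over characters, so ** gets a str
except TypeError as exc:
    error = exc

tags = build_tags(decode_items(double_encoded))  # full decode gives dicts
```

Under this assumption, checking `isinstance(items, list)` before building tags would separate a decoding problem from a selector that simply matched nothing.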
ClericPy commented 1 year ago

How can I reproduce this problem?

juanfrilla commented 1 year ago

@ClericPy :

import asyncio

from ichrome import AsyncChromeDaemon


async def main():
    # Start headless Chrome with image loading disabled.
    async with AsyncChromeDaemon(headless=True, disable_image=True) as cd:
        async with cd.incognito_tab() as tab:
            url = "https://servicio.indecopi.gob.pe/buscadorResoluciones/tribunal.seam"
            await tab.goto(url, timeout=3)
            # This is the call that raises the TypeError shown above.
            result = await tab.wait_tag_click("#tagyouwant")


asyncio.run(main())
ClericPy commented 1 year ago

I cannot open that website, so I will try to reproduce the problem in other ways. Have you tried running it with headless=False? Does it work normally in that case? The error seems to occur because the page is not being rendered. In headless=False mode you can check whether the CSS selector is wrong or the page really isn't displaying.
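
A rough sketch of the check suggested above, reusing only the ichrome calls that already appear in this thread (`AsyncChromeDaemon`, `incognito_tab`, `goto`, `wait_tag_click`). The helper `click_or_report` is a hypothetical name, and the selector is taken from the traceback. It catches the TypeError so the visible browser window stays usable for inspection instead of the script crashing.

```python
import asyncio


async def click_or_report(tab, selector):
    """Click selector via tab.wait_tag_click; report and return None if tag building fails."""
    try:
        return await tab.wait_tag_click(selector)
    except TypeError as exc:
        print(f"could not build a tag for {selector!r}: {exc}")
        return None


async def main():
    # Imported here so the helper above can be used without Chrome installed.
    from ichrome import AsyncChromeDaemon

    # headless=False opens a visible window so rendering can be checked by eye.
    async with AsyncChromeDaemon(headless=False) as cd:
        async with cd.incognito_tab() as tab:
            url = "https://servicio.indecopi.gob.pe/buscadorResoluciones/tribunal.seam"
            await tab.goto(url, timeout=3)
            result = await click_or_report(tab, "#busqueda-avanzada")
            print("click result:", result)


# To run against a real browser: asyncio.run(main())
```

If the page looks fine in the visible window and the click still fails, the selector or the page's own scripts are more likely suspects than rendering.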

juanfrilla commented 1 year ago

Okay, I finally solved it with the requests library, haha. I guess it has something to do with the web technologies the page is built with.

ClericPy commented 1 year ago

OK. I tried several times but could never reproduce it.