edsu / etudier

Extract a citation network from Google Scholar
returns just <HTML url='https://example.org/'> #28

Closed JanPastorek closed 1 year ago

JanPastorek commented 1 year ago

Hello,

I tried to run it on several Google Scholar links, but after extraction I always get just <HTML url='https://example.org/'>. Where might the problem be?

This is printed to the console when I run setup.py with a Google Scholar URL:

USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)

I think it might be due to an incompatibility between Chrome, Selenium, and ChromeDriver. I am using Chrome 1.0.6 with the corresponding ChromeDriver 1.0.6, and selenium==4.2.0.

I also tried Chrome 94 and ChromeDriver 94, with no effect: still the same issue. Which versions are recommended?
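For anyone checking the same thing, here is a minimal sketch of the comparison, assuming (as is generally the case) that ChromeDriver has to share a major version with the installed Chrome. The helper names major_version and versions_match are hypothetical and not part of etudier or Selenium, and the version strings in the example are only illustrative.

```python
import re


def major_version(version_text):
    """Return the leading integer of the first dotted version number found, or None."""
    match = re.search(r"(\d+)(?:\.\d+)+", version_text)
    return int(match.group(1)) if match else None


def versions_match(chrome_text, driver_text):
    """True only when both strings contain a version and the major numbers agree."""
    chrome_major = major_version(chrome_text)
    driver_major = major_version(driver_text)
    return chrome_major is not None and chrome_major == driver_major


if __name__ == "__main__":
    # Illustrative strings; real ones could be pasted from the browser's
    # "About" page and from the output of `chromedriver --version`.
    print(versions_match("Google Chrome 94.0.4606.81", "ChromeDriver 94.0.4606.61"))
```

Only the major number is compared here, since a difference in the later components does not by itself suggest a mismatch.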

However, HTML extraction works on other websites. What might be the problem?

Thank you

JanPastorek commented 1 year ago

So I looked deeper, and the function get_id always returns None. Perhaps they changed the HTML and the ids?

Going even deeper, get_cluster_id always returns an empty list.

JanPastorek commented 1 year ago

In my case, requests-html did not work properly. Could you add all the requirements with their versions, along with the Python version? Otherwise, in the long run, this library will become unusable. Thank you.
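Until pinned requirements exist, a report like the following might help when comparing environments. It is only a sketch: report_versions is a hypothetical helper, and the package list (selenium, requests-html) is just the two packages mentioned in this thread, not a confirmed list of etudier's dependencies.

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Map "python" and each package name to its installed version (None if missing)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from this thread; adjust as needed.
    for name, version in report_versions(["selenium", "requests-html"]).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Pasting the printed lines into a bug report gives the maintainer the interpreter and package versions in one place.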

edsu commented 1 year ago

It seems like you didn't have a working chromedriver. Sorry if that wasn't apparent!
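One quick way to check this yourself is sketched below. It assumes the driver executable is named chromedriver and is expected on the PATH; find_driver and driver_version are hypothetical helpers, not part of etudier.

```python
import shutil
import subprocess


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable on PATH, or None if absent."""
    return shutil.which(name)


def driver_version(name="chromedriver"):
    """Return the driver's `--version` output, or None if it is missing or fails to run."""
    path = find_driver(name)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        # The file exists but could not be executed, or it hung.
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(driver_version() or "chromedriver not found or not runnable")
```

If this reports that nothing was found, the driver is most likely not installed or not on the PATH.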