diyclassics / index_indicum


Write code to get author data from Papers #2

Closed: diyclassics closed this issue 6 years ago

diyclassics commented 6 years ago

Need to write code that...

Then code that...

FMezard commented 6 years ago

@diyclassics, I am trying to use requests-html for this, but I get an error that I cannot get past: ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration. It occurs when I run this code:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('http://dlib.nyu.edu/awdl/isaw/isaw-papers/13/')
print(r)
author = r.html.absolute_links  # fails with the ValueError above

I think the cause is the encoding declaration at the beginning of the file, <?xml version="1.0" encoding="UTF-8"?>. Is there anything I can do about it? Otherwise I can use BeautifulSoup, since I know how to use it, but I think you prefer requests-html.
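For context, lxml raises this ValueError when it is handed a Python str that begins with an XML declaration naming an encoding, and it accepts the same document as bytes. The sketch below illustrates that distinction on a made-up inline document. The helper name parse_str_or_bytes and the sample DOC are hypothetical, and encoding as UTF-8 is an assumption that matches the declaration quoted above.

```python
from lxml import etree

# Hypothetical sample document with the same declaration as the page in question.
DOC = '<?xml version="1.0" encoding="UTF-8"?><root><item>x</item></root>'


def parse_str_or_bytes(doc):
    """Parse an XML document given as str or bytes and return its root element.

    If lxml rejects a str because of its encoding declaration, encode the
    text to bytes (UTF-8 is assumed here) and parse again.
    """
    try:
        return etree.fromstring(doc)
    except ValueError:
        # lxml wants bytes when the text carries an encoding declaration.
        return etree.fromstring(doc.encode("utf-8"))


root = parse_str_or_bytes(DOC)
print(root.tag)
```

Passing bytes from the start avoids the retry entirely, which is why working with the raw response body is usually the simpler route.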

diyclassics commented 6 years ago

I see that; frustrating. Let's still use requests, but parse the XML ourselves. See the following gist: https://gist.github.com/diyclassics/feb754ceb982d85a272248692e92405d. (Btw, you may need to install requests and lxml.)
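For readers without access to the gist, here is a minimal sketch of the approach described: fetch the page with requests and hand the raw bytes to lxml. It is not the gist's code. The function names are hypothetical, and it assumes that the page is well-formed XHTML in the standard XHTML namespace and that collecting link targets is a useful first step toward the author data.

```python
from urllib.parse import urljoin

import requests
from lxml import etree

# Assumption: the page declares the standard XHTML namespace.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def parse_xhtml(raw):
    """Parse raw bytes; lxml reads the encoding from the XML declaration."""
    return etree.fromstring(raw)


def link_targets(root, base_url):
    """Return the absolute URL of every XHTML <a href="..."> in the tree."""
    hrefs = root.xpath("//x:a/@href", namespaces=XHTML_NS)
    return [urljoin(base_url, str(href)) for href in hrefs]


def fetch_links(url):
    """Download a page and return its absolute link targets."""
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    # r.content is bytes; r.text is a str and would trigger the ValueError.
    return link_targets(parse_xhtml(r.content), url)


if __name__ == "__main__":
    for link in fetch_links("http://dlib.nyu.edu/awdl/isaw/isaw-papers/13/"):
        print(link)
```

If the page turns out not to be well-formed XML, etree.fromstring will raise an XMLSyntaxError, in which case a more forgiving HTML parser would be needed.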