When the fetched HTML contains HTML entities, pyquery does not work correctly. That is why I opened this pull request.
To my surprise, I found that you had done the same thing in commit df8b4d7da5687e87334723be0834b0b1d6190530.
However, I am confused that you deleted that line in commit e3ee18a732b638a64da228ca54a8db45bdb06be2 and added
url = unescape(url)
instead, because the code
parsers = [Parser('http://blog.sciencenet.cn/home.php\?mod=space&amp;uid=\d+&amp;do=blog&amp;view=me&amp;from=space&amp;page=\d+'), Parser('blog\-\d+\-\d+\.html', Post)]
contains HTML entities like &amp;.
So I do not understand why you did that. If the whole HTML is unescaped, not only does pyquery work fine, but there is also no need to change
parsers = [Parser('http://blog.sciencenet.cn/home.php\?mod=space&uid=\d+&do=blog&view=me&from=space&page=\d+'),
to
parsers = [Parser('http://blog.sciencenet.cn/home.php\?mod=space&amp;uid=\d+&amp;do=blog&amp;view=me&amp;from=space&amp;page=\d+'), Parser('blog\-\d+\-\d+\.html', Post)]
since the former is how we are used to writing such patterns.
As an undergraduate student, maybe there are some cases I have not taken into account, or maybe I am simply wrong.
By the way, I opened an issue describing my problem. Could you help me out?
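
To make the point about unescaping concrete, here is a minimal sketch. It assumes Python 3's standard `html.unescape` as a stand-in for whatever `unescape` the project uses, and the sample markup, the `uid`/`page` values, and the `find_links` helper are all hypothetical. It only shows that a pattern written with plain `&` misses links in raw HTML that carries `&amp;`, and finds them once the text is unescaped.

```python
import re
from html import unescape

# Hypothetical fetched page: the href carries "&amp;" instead of "&".
raw_html = (
    '<a href="http://blog.sciencenet.cn/home.php?mod=space&amp;uid=1'
    '&amp;do=blog&amp;view=me&amp;from=space&amp;page=2">next</a>'
)

# The pattern written the usual way, with plain "&".
pattern = re.compile(
    r'http://blog\.sciencenet\.cn/home\.php\?mod=space&uid=\d+'
    r'&do=blog&view=me&from=space&page=\d+'
)


def find_links(html_text):
    """Return every substring of html_text that matches the pattern."""
    return pattern.findall(html_text)


# Searching the raw text finds nothing, because "&amp;uid" is not "&uid".
matches_raw = find_links(raw_html)

# Searching the unescaped text finds the link.
matches_unescaped = find_links(unescape(raw_html))

print(matches_raw)
print(matches_unescaped)
```

One caveat that may be relevant to the discussion: unescaping a whole page before parsing also turns text such as `&lt;` into a literal `<`, which can change the markup the parser sees, so unescaping only the extracted URL is the narrower change.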