MatthewChatham / glassdoor-review-scraper

Scrape reviews from Glassdoor
BSD 2-Clause "Simplified" License
180 stars, 252 forks

Fix reviews xpath bug when scrape URL #3

Closed yihaozhadan closed 5 years ago

MatthewChatham commented 6 years ago

@yihaozhadan Thanks so much for the PR! It's my first PR from someone else on GitHub ever. :)

Do you mind giving a description of the bug this fixed? Does the original xpath miss elements sometimes?

yihaozhadan commented 6 years ago

@MatthewChatham When I followed the instructions and ran your first example, it threw an error at line 349:

 "//*[@id='EmpLinksWrapper']/div/a[2]"

So I opened "https://www.glassdoor.com/Overview/Working-at-Wells-Fargo-EI_IE8876.11,22.htm" and inspected the page source. I'm using Chrome, and I pasted your XPath into the search bar. It only finds an element up to

"//*[@id='EmpLinksWrapper']/div".

In order to get reviews link, the correct xpath is

"//*[@id='EmpLinksWrapper']/div//a[2]"

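One slash is missing in your code: "/a[2]" only matches a link that is a direct child of the div, while "//a[2]" also matches links nested deeper inside it. You could try to reproduce the bug. After changing your code, the program works as expected.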
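To illustrate the difference between the two expressions, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. The markup is hypothetical: the real Glassdoor page is not shown in this thread, so the `<span>` wrapper and the `href` values are assumptions chosen only to put the links one level below the inner div. ElementTree supports a limited XPath subset and requires relative paths, which is why each expression starts with `.` here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (NOT the real Glassdoor page).
# The <span> wrapper is an assumption: it makes the links descendants,
# but not direct children, of the inner <div>.
MARKUP = """
<html><body>
  <div id="EmpLinksWrapper">
    <div>
      <span>
        <a href="/Overview/example">Overview</a>
        <a href="/Reviews/example">Reviews</a>
      </span>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(MARKUP)

# Same expressions as in the thread, prefixed with "." for ElementTree.
ORIGINAL = ".//*[@id='EmpLinksWrapper']/div/a[2]"   # direct children only
FIXED = ".//*[@id='EmpLinksWrapper']/div//a[2]"     # any depth below the div


def hrefs(xpath):
    """Return the href of every element matched by the given path."""
    return [a.get("href") for a in root.findall(xpath)]


original_matches = hrefs(ORIGINAL)
fixed_matches = hrefs(FIXED)

print("original:", original_matches)
print("fixed:   ", fixed_matches)
```

With this assumed markup, the original expression matches nothing because the inner div has no direct `a` children, while the fixed expression finds the second link. Whether the live page is nested the same way is not confirmed here.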

yihaozhadan commented 5 years ago

Good. You are welcome.

On Thu, Dec 13, 2018 at 1:14 AM Matthew Chatham notifications@github.com wrote:

Merged #3 https://github.com/MatthewChatham/glassdoor-review-scraper/pull/3 into master.
