carpentries-incubator / lc-webscraping

Introduction to web scraping
https://carpentries-incubator.github.io/lc-webscraping/

Adjust lesson or links so directions match content in Episode 3 #58

Open ndporter opened 1 year ago

ndporter commented 1 year ago

Some updates were made in #29 to deal with changes to the Canadian Parliament webpages used in Episode 3. However, the link to the list of mailing addresses used in the Custom XPath Queries section now forwards to a much less scrape-friendly version of the list, which also has different members and doesn't match any of the directions.

Either the directions and screenshots need to be updated to match the new page, or an archived version of the old page could be used.

For the former, I don't know how to cleanly pull out the relevant information, because it isn't wrapped in its own tags. It's just element text inside larger sections.
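One possible way to get at text like this, sketched with Python's standard-library ElementTree on a small, hypothetical, well-formed snippet. The real page isn't reproduced here, and real-world HTML usually needs an HTML parser before ElementTree can read it. Text that sits between tags is stored as the `tail` of the preceding element, so `itertext()` picks it up. In full XPath 1.0 the analogous idea is selecting `text()` nodes, e.g. `//p/text()` returns the text sitting directly inside each `p`.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: each entry is one <p> whose lines are separated by
# <br/> rather than wrapped in their own elements.
SAMPLE = """<div>
  <p><strong>Jane Doe</strong><br/>House of Commons<br/>Ottawa, Ontario</p>
  <p><strong>John Roe</strong><br/>House of Commons<br/>Ottawa, Ontario</p>
</div>"""


def extract_records(xhtml: str) -> list[list[str]]:
    """Return one list of non-empty text fragments per <p> element."""
    root = ET.fromstring(xhtml)
    records = []
    for p in root.iter("p"):
        # itertext() yields each element's text *and* the tails that follow
        # child elements, so text after a <br/> (not inside any tag of its
        # own) is included.
        parts = [t.strip() for t in p.itertext() if t.strip()]
        records.append(parts)
    return records


for record in extract_records(SAMPLE):
    print(record)
```

Each `p` becomes one row of fragments that could then be split into name and address columns; how reliably that works on the live page depends on how consistent its markup is.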

For the latter, I was able to adapt the XPath for an archive.org capture by changing //body/div[1]/div/ul to //div[4]/div/div/ul, after which the rest of the commands worked. However, the lesson would probably need to add something explaining web archiving and why we're using it.
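A rough sketch of the archived-page approach, assuming Python's standard library only. `wayback_url` is a hypothetical helper that builds the usual Wayback Machine address (`https://web.archive.org/web/<timestamp>/<url>`) so the lesson could pin one capture; `https://example.org/members` is a placeholder, not the real Parliament URL. The two documents are heavily simplified, hypothetical stand-ins: the extra `div`s at the top of the archived copy are an assumption used to show why a positional step like `div[1]` can stop matching while a different path does. ElementTree supports only a subset of XPath, so the paths are written relative to the `html` root.

```python
import re
import xml.etree.ElementTree as ET


def wayback_url(url: str, timestamp: str) -> str:
    """Hypothetical helper: Wayback Machine URL pinned to one capture time."""
    # Timestamps look like YYYYMMDDhhmmss; a shorter prefix such as YYYYMMDD
    # is accepted too, and the archive redirects to the nearest capture.
    if not re.fullmatch(r"\d{4,14}", timestamp):
        raise ValueError("timestamp should be 4-14 digits, e.g. 20220101")
    return f"https://web.archive.org/web/{timestamp}/{url}"


# Hypothetical stand-in for the original page.
LIVE = """<html><body>
  <div><div><ul><li>Member A</li><li>Member B</li></ul></div></div>
</body></html>"""

# Hypothetical stand-in for an archived capture. Assumption: extra <div>s
# come before the original content, shifting its position in the tree.
ARCHIVED = """<html><body>
  <div id="extra-1"/><div id="extra-2"/><div id="extra-3"/>
  <div>
    <div><div><ul><li>Member A</li><li>Member B</li></ul></div></div>
  </div>
</body></html>"""

# The two XPaths from the issue, rewritten for ElementTree's XPath subset.
OLD_PATH = "body/div[1]/div/ul"
NEW_PATH = ".//div[4]/div/div/ul"


def list_items(xhtml: str, path: str) -> list[str]:
    """Return the text of every <li> under the <ul> elements matched by path."""
    root = ET.fromstring(xhtml)  # root is the <html> element
    return [li.text for ul in root.findall(path) for li in ul.findall("li")]


print(wayback_url("https://example.org/members", "20220101"))
print(list_items(LIVE, OLD_PATH))
print(list_items(ARCHIVED, OLD_PATH))
print(list_items(ARCHIVED, NEW_PATH))
```

Pinning a specific capture would keep the exercise stable even if the live page changes again, at the cost of explaining to learners what the archive adds to the page.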

Either way, the lesson is currently broken beginning at Custom XPath Queries.

I'm not submitting a PR, both because I don't have time to develop the explanation of web archiving and because I can't solve it the other way.