18F / tech-talks

Suggestions, schedules, and other information about the Engineering Chapter's Tech Talk meetings.
https://github.com/18F/tech-talks/

Tech Talk: Using Web Crawlers #62

Open geekygirlsarah opened 2 years ago

geekygirlsarah commented 2 years ago

Tech Talk Submission

Thanks for offering to give a talk at a Tech Talks meeting! We just need a bit of information from you.

Name

Sarah Withee

What's your talk title?

Using Web Crawlers (insert a wittier title later)

What's your talk about?

For a PA I was working on, I couldn't get access to some of the sites we were trying to revamp, so I couldn't pull the data sets I needed from them directly. On a whim, I looked into web archiving tools, and that research led me to the idea of using Scrapy (a Python web-crawling framework) to scrape the sites for the information I needed.

I wrote a variety of small "spiders" to crawl across the partner agency's 10 websites and was able to gather massive lists of things we had questions about. I'd like to share how Scrapy works, and also some uses for it that you might not have thought of. I'll also cover some of the issues that came up and how to overcome them.
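As a rough illustration of what one of those small spiders does, here is a standard-library Python sketch. It is not the code from the PA and not Scrapy itself. It starts at one page, follows links that stay on the same site, and collects a list of items along the way. The choices of what to collect (links to PDF files), the `max_pages` cap, the pluggable `fetch_page` function, and the `example.gov` URLs are all hypothetical, picked only to keep the example small and testable offline.

```python
"""Minimal standard-library sketch of a same-site crawler ("spider").

Hypothetical example: collects links to PDF files as the "list of things"
to review. This is NOT Scrapy; Scrapy layers scheduling, throttling, retries,
and robots.txt handling on top of this basic idea.
"""
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url: str) -> str:
    """Download a page as text (no retries or rate limiting in this sketch)."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url: str, fetch_page: Callable[[str], str] = fetch,
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl of one site; return the sorted PDF links found."""
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    found: set[str] = set()
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        pages += 1
        parser = LinkExtractor()
        parser.feed(fetch_page(url))
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments so duplicates collapse.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc != site:
                continue  # stay on the partner site; skips mailto:, other domains
            if link.lower().endswith(".pdf"):
                found.add(link)  # an "item" (in Scrapy: yield a dict)
            elif link not in seen:
                seen.add(link)  # a page to visit (in Scrapy: response.follow)
                queue.append(link)
    return sorted(found)
```

In Scrapy the same loop is split up differently: a spider's `parse()` method yields items and follow-up requests, and the framework takes care of the request queue, duplicate filtering, and politeness settings such as obeying robots.txt and auto-throttling.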

How long is your talk?

Do you have any preferred dates for it?

No. Feel free to use this as a backup in case a talk falls through or no talk is scheduled for a given week.

Todo for the MC:

julialeague commented 2 years ago