bstarling closed this 7 years ago
@bstarling I'd love to work on this this weekend, though I'm not available until later Friday night and Saturday night. Is this reserved for full-time participants in the Hackathon, or can I take a crack at it?
@josephpd3 that is no problem. The hackathon is remote / asynchronous, so you are welcome to tackle it this weekend. My only request: if you end up not having time, just come back and let us know so we can free it up for someone else to work on. Feel free to drop in chat if you want to see if anyone else is interested in working with you.
Closed with #57
The problem:
A goal of an in-progress data pipeline is to extract YouTube links found in text blobs, then poll the YouTube API to get additional video metadata. In order to poll the API, we need to extract the YouTube video ID from the URL.
Tasks
Additional Info
In the base case, links will look like
https://www.youtube.com/watch?v=DiTECkLZ8HM
where the YouTube ID is DiTECkLZ8HM. Create a CSV file with two columns: original_url, youtube_id.
Links will come in many formats.
Some examples:
A sample of 40,000 URLs to be used for testing purposes can be found here
Warning: this work requires you to deal with highly explicit and offensive content from the 4chan /pol/ board.
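For anyone picking this up, here is a minimal sketch of one way to approach the extraction and the CSV output described above. It is not the solution merged in #57. The function names `extract_youtube_id` and `write_csv` are hypothetical, and the URL shapes handled beyond the base `watch?v=` case (`youtu.be/<id>`, `/embed/<id>`, `/v/<id>`) are assumptions about commonly seen YouTube links, not the list of formats from this issue.

```python
import csv
from urllib.parse import urlparse, parse_qs


def extract_youtube_id(url):
    """Return the video ID from a YouTube URL, or None if none is found.

    Hypothetical helper: the formats handled here are assumptions,
    not an exhaustive list of what appears in the sample data.
    """
    url = url.strip()
    # Tolerate links that were captured without a scheme.
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)

    # Normalize the host so www. and mobile links are treated alike.
    host = (parsed.hostname or "").lower()
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]

    segments = [s for s in parsed.path.split("/") if s]

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return segments[0] if segments else None

    if host == "youtube.com":
        # Base case: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v")
            return ids[0] if ids else None
        # Embedded players: /embed/<id> or /v/<id>
        if len(segments) >= 2 and segments[0] in ("embed", "v"):
            return segments[1]

    return None


def write_csv(urls, out_file):
    """Write one row per URL with the two columns requested in the issue."""
    writer = csv.writer(out_file)
    writer.writerow(["original_url", "youtube_id"])
    for url in urls:
        # URLs with no recognizable ID get an empty youtube_id cell.
        writer.writerow([url, extract_youtube_id(url) or ""])
```

Usage would be along the lines of `with open("youtube_ids.csv", "w", newline="") as f: write_csv(urls, f)`, where the output filename is a placeholder. Parsing with `urllib.parse` rather than a single regular expression keeps each URL shape as its own branch, which should make it easier to add the other formats found in the sample.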