Data4Democracy / assemble

NOT AN ACTIVE PROJECT -- Check readme for data sources
MIT License

Strip youtube ID from youtube links #54

Closed bstarling closed 7 years ago

bstarling commented 7 years ago

The problem:

A goal of an in-progress data pipeline is to extract YouTube links found in text blobs, then poll the YouTube API to get additional video metadata. To poll the API, we need to extract the YouTube video ID from the URL.
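A minimal sketch of the first step (pulling YouTube links out of a text blob), assuming links appear as plain `http(s)` URLs on `youtube.com` or `youtu.be`. The pattern and the function name `find_youtube_links` are hypothetical, not part of the existing pipeline:

```python
import re

# Assumed pattern: http(s) URLs whose host is youtube.com or youtu.be,
# optionally prefixed with "www." or "m.". The URL runs up to the next
# whitespace, quote, or angle bracket.
YOUTUBE_URL_RE = re.compile(
    r"https?://(?:www\.|m\.)?(?:youtube\.com|youtu\.be)/[^\s<>\"']+",
    re.IGNORECASE,
)


def find_youtube_links(text):
    """Return every YouTube-looking URL in a text blob, in order of appearance."""
    # Strip trailing sentence punctuation that often clings to pasted links.
    return [m.rstrip(".,;:!?)") for m in YOUTUBE_URL_RE.findall(text)]
```

Real posts may contain links without a scheme or with HTML markup around them, so this would likely need adjusting against the sample data.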

Tasks

Additional Info

In the base case, links will look like https://www.youtube.com/watch?v=DiTECkLZ8HM, where the YouTube ID is DiTECkLZ8HM. Create a CSV file with two columns: original_url and youtube_id.
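For the base case only, something like the following could work. It is a sketch, assuming the ID is always the `v` query parameter; the function names and the output path are placeholders:

```python
import csv
from urllib.parse import urlparse, parse_qs


def youtube_id_base_case(url):
    """Return the value of the `v` query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("v")
    return values[0] if values else None


def write_id_csv(urls, path):
    """Write a two-column CSV (original_url, youtube_id) for the given URLs."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["original_url", "youtube_id"])
        for url in urls:
            # URLs with no recognizable ID get an empty youtube_id cell.
            writer.writerow([url, youtube_id_base_case(url) or ""])
```

Usage would be along the lines of `write_id_csv(urls, "youtube_ids.csv")`, where `urls` is any iterable of link strings.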

Links will come in many formats.
Some examples:
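One possible way to go beyond the base case, sketched under the assumption that the other formats include a few widely seen YouTube URL shapes: `youtu.be/<id>` short links, `/embed/<id>` and `/v/<id>` paths, and `watch` URLs where `v` is not the first parameter. These shapes are illustrative assumptions rather than the list from this issue, and `extract_youtube_id` is a hypothetical name:

```python
from urllib.parse import urlparse, parse_qs


def extract_youtube_id(url):
    """Best-effort ID extraction for a few assumed YouTube URL shapes."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Normalize "www." and "m." prefixes so one comparison covers each host.
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    path_parts = [p for p in parsed.path.split("/") if p]

    if host == "youtu.be":
        # Short link: the ID is the first path segment.
        return path_parts[0] if path_parts else None
    if host == "youtube.com":
        if parsed.path == "/watch":
            # Standard link: the ID is the `v` parameter, in any position.
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        if len(path_parts) >= 2 and path_parts[0] in ("embed", "v"):
            # Embed-style link: the ID follows /embed/ or /v/.
            return path_parts[1]
    return None
```

URLs without a scheme (for example `youtube.com/watch?v=...`) return `None` here because `urlparse` does not detect a host for them. Running this over the 40,000-URL sample and inspecting the rows that come back empty would show which formats are still unhandled.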

A sample of 40,000 URLs to be used for testing purposes can be found here

Warning: this work requires you to deal with highly explicit and offensive content from the 4chan /pol/ board.

josephpd3 commented 7 years ago

@bstarling I'd love to work on this this weekend, though I'm not available until later Friday night and Saturday night. Is this reserved for full-time participants in the Hackathon, or can I take a crack at it?

bstarling commented 7 years ago

@josephpd3 that is no problem. The hackathon is remote / asynchronous, so you are welcome to tackle it this weekend. My only request: if you end up not having time, just come back and let us know so we can free it up for someone else to work on. Feel free to drop in chat if you want to see if anyone else is interested in working with you.

bstarling commented 7 years ago

Closed with #57