burchill / webcomic_crawler

Written in Python 3, baby

Don't understand the "threading"... #1

Open andburch opened 7 years ago

andburch commented 7 years ago

Why do threading again? How does threading make downloading faster? Download everything and THEN do the analysis, like you said. And the whole "maxing out our downloading" thing: why would threading mean we max out whatever it is you mean?

burchill commented 7 years ago

Threading makes downloading faster because you can download things in parallel instead of sequentially. Each request spends most of its time waiting on the network, so several threads can wait at the same time instead of one after another. Since we don't need to process anything with ordering constraints at first, we can have multiple threads downloading and linking pages all at once. Threading would be like trying to pour water out of a bucket vs. a straw.
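
To make the parallel-vs-sequential point concrete, here is a rough sketch using the standard library's `ThreadPoolExecutor`. It is not the crawler's actual code: `fetch`, `fake_fetch`, `download_all`, and the `max_workers=8` default are all made-up names and values for illustration, and `fake_fetch` just sleeps to stand in for network latency so the example runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its raw bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fake_fetch(url):
    """Stand-in for fetch(): sleeps to simulate network wait, no real request."""
    time.sleep(0.2)
    return f"page for {url}".encode()


def download_all(urls, fetch_page=fetch, max_workers=8):
    """Download every URL with a pool of worker threads.

    Returns a dict mapping each URL to whatever fetch_page returned for it.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs fetch_page concurrently but yields results in input order.
        pages = list(pool.map(fetch_page, urls))
    return dict(zip(urls, pages))


if __name__ == "__main__":
    # Hypothetical URLs; fake_fetch never actually contacts them.
    urls = [f"https://example.com/comic/{i}" for i in range(8)]
    start = time.perf_counter()
    pages = download_all(urls, fetch_page=fake_fetch)
    print(f"{len(pages)} pages in {time.perf_counter() - start:.2f}s")
```

With eight simulated pages at 0.2 seconds each, a sequential loop would take about 1.6 seconds, while the pooled version should finish in roughly the time of a single request, because the threads spend their waiting time concurrently. Swapping `fake_fetch` for `fetch` would hit the network for real.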

andburch commented 7 years ago

But who cares about speed? Like what kind of tones are we looking at?

burchill commented 7 years ago

Theoretically, I'd like ~1,000 comics. Think about loading every page in all of those comics. Like, we don't HAVE to do threading, but if you read what I posted, you can see that the structure is already there, so I don't see why we wouldn't. The threading stuff has been solved; you don't really need to learn how it works at more than a surface level. You just need to understand the BeautifulSoup stuff so you can scrape. The big problem right now is really just trying to get the scraper to scrape by itself, which is an orthogonal issue to threading.

Also, upload a recording of you rolling the Spanish 'R' in the context of "rrah, arrah, perro" to Vocaroo.com and put the link here.