dakrone / itsy

A threaded web-spider written in Clojure
181 stars 30 forks source link

Why aren't workers stopped when crawling stops? #8

Closed MarcoPolo closed 10 years ago

MarcoPolo commented 10 years ago

Not judging, just want to understand

dakrone commented 10 years ago

Since the state is exposed, it's possible to manually add URLs to the queue for the workers to crawl. Is not automatically stopping workers causing an issue for you?

MarcoPolo commented 10 years ago

I see.

It's a small issue when I don't really care about the state, just the final outcome of crawling. I think it would be neat to have an option passed in to stop crawling automatically. What do you think?

dakrone commented 10 years ago

I think that's totally reasonable and worthwhile.

MarcoPolo commented 10 years ago

Reading through the code, https://github.com/dakrone/itsy/blob/master/src/itsy/core.clj#L146 It seems like the worker does try to stop itself when the limit is reached. Am I missing something? Also the limit reach check (https://github.com/dakrone/itsy/blob/master/src/itsy/core.clj#L142) might be prone to a race condition.

dakrone commented 10 years ago

It seems like the worker does try to stop itself when the limit is reached.

It does, what I thought you were referring to was the thread not being joined from the parent via a watcher thread or some equivalent.

Also the limit reach check ... might be prone to a race condition.

It is, I should probably switch it to an AtomicInteger or use a different coordination mechanism.

MarcoPolo commented 10 years ago

The comment under usage says

;; number of urls to spider before crawling stops, note ;; that workers must still be stopped after crawling ;; stops. May be set to -1 to specify no limit. ;; (optional, defaults to 100)

So I thought I had to stop the workers when the limit was reached.

... switch it to an AtomicInteger or use a different coordination mechanism.

You could probably get by, by just switching to a >=.

dakrone commented 10 years ago

I fixed the race condition, going to close this as the threads are stopped.

MarcoPolo commented 10 years ago

Fantastic, thanks!