Situation
Each worker writes data to the metadata graph (or metadata file) while crawling. However, since a worker is never really done, it always holds an open connection or a buffer for the metadata graph. If the worker is stopped, any metadata that is still buffered and has not yet been written is lost.
Fix
Add a method to the sink that flushes the sink for a URI without closing it, i.e., all buffered data is written. Something like flush(CrawlableUri uri).
The worker should flush the metadata graph every time it finishes crawling a URI and has written the crawling activity.
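A minimal sketch of what this could look like, under stated assumptions: only the name flush(CrawlableUri uri) comes from this issue. The Sink interface shape, addMetadata, BufferingSink, the in-memory "written" list that stands in for the metadata graph, and the simplified CrawlableUri are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FlushSketch {

    /** Hypothetical, simplified stand-in for the crawler's URI type. */
    static final class CrawlableUri {
        final String uri;
        CrawlableUri(String uri) { this.uri = uri; }
    }

    /** Hypothetical sink interface; only flush(CrawlableUri) is proposed in this issue. */
    interface Sink {
        void addMetadata(CrawlableUri uri, String triple);

        /** Writes all buffered data for the URI but keeps the sink open. */
        void flush(CrawlableUri uri);
    }

    /** Illustrative sink that buffers per URI; "written" stands in for the metadata graph. */
    static final class BufferingSink implements Sink {
        private final Map<String, List<String>> buffers = new HashMap<>();
        private final List<String> written = new ArrayList<>();

        @Override
        public void addMetadata(CrawlableUri uri, String triple) {
            buffers.computeIfAbsent(uri.uri, k -> new ArrayList<>()).add(triple);
        }

        @Override
        public void flush(CrawlableUri uri) {
            List<String> pending = buffers.get(uri.uri);
            if (pending == null) {
                return; // nothing buffered for this URI
            }
            written.addAll(pending);
            pending.clear(); // the buffer stays registered: the sink is not closed
        }

        List<String> written() {
            return written;
        }
    }

    /** Worker step: record the crawling activity, then flush even if recording fails. */
    static void crawl(Sink sink, CrawlableUri uri) {
        try {
            sink.addMetadata(uri, "crawling activity for " + uri.uri);
        } finally {
            sink.flush(uri);
        }
    }
}
```

With this shape, a worker that is stopped between two URIs loses at most the metadata of the URI it is currently crawling, since everything from earlier URIs has already been flushed.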