Findwise / Hydra

Distributed processing framework for search solutions
http://findwise.github.io/Hydra

Nullpointer in the Memory Cache in hydra 0.4.0 #298

Open ssimon opened 10 years ago

ssimon commented 10 years ago

Getting a NullPointerException in the memory cache in Hydra 0.4.0. I don't really know where to start debugging this.

2014-01-15 17:02:30,064 [Thread-4] ERROR com.findwise.hydra.Main - Got an uncaught exception. Shutting down Hydra
java.lang.NullPointerException: null
    at com.findwise.hydra.MemoryCache.removeStale(MemoryCache.java:183) ~[hydra-core.jar:na]
    at com.findwise.hydra.CachingDocumentNIO.flush(CachingDocumentNIO.java:372) ~[hydra-core.jar:na]
    at com.findwise.hydra.CachingDocumentNIO$CacheMonitor.run(CachingDocumentNIO.java:424) ~[hydra-core.jar:na]
2014-01-15 17:02:30,064 [Thread-4] INFO com.findwise.hydra.Main - Got shutdown request...
ssimon commented 10 years ago

Maybe this only fails in combination with discarding documents?

ssimon commented 10 years ago

Or rather with outputting; I didn't have a discarding stage in that pipeline.

Is a document removed from the memory cache once it has been output?

laserval commented 10 years ago

Hm, just thinking out loud without having debugged anything. The relevant line is https://github.com/Findwise/Hydra/blob/0.4.0/database/src/main/java/com/findwise/hydra/MemoryCache.java#L183

Entry<DocumentID<T>, Long> entry = it.next();
if (time - entry.getValue() > stalerThanMs) {
    DatabaseDocument<T> d = getDocumentById(entry.getKey());
    list.add(d);
    map.remove(d.getID());      // <- the NPE is thrown here
    it.remove();
}

This is all synchronized on the MemoryCache instance. The iterator it walks every entry in the cache and gives, for each document ID, the time that document was last touched. It looks like the document behind the entry either doesn't exist in the cache or has no ID. Since the key in the entry is the document ID, it's probably the case that the document is no longer in the cache, so getDocumentById returns null and d.getID() throws.
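To make that reading concrete, here is a minimal sketch of how such a loop could fail. It is not Hydra's code: TwoMapCacheSketch, Doc, and removeDocumentOnly are hypothetical names, and the two-map layout (documents by ID plus a separate last-touched index) is an assumption based only on the snippet above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for MemoryCache; the real class is not reproduced here.
class TwoMapCacheSketch {
    // Hypothetical stand-in for DatabaseDocument.
    static final class Doc {
        private final String id;
        Doc(String id) { this.id = id; }
        String getId() { return id; }
    }

    // Assumed layout: documents by ID, plus a separate last-touched index.
    private final Map<String, Doc> documents = new HashMap<>();
    private final Map<String, Long> lastTouched = new HashMap<>();

    synchronized void add(String id, long now) {
        documents.put(id, new Doc(id));
        lastTouched.put(id, now);
    }

    // Simulates the suspected desync: the document goes, its timestamp stays.
    synchronized void removeDocumentOnly(String id) {
        documents.remove(id);
    }

    // Same shape as the quoted loop, with no null check on the lookup.
    synchronized List<Doc> removeStale(long stalerThanMs, long now) {
        List<Doc> stale = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastTouched.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (now - entry.getValue() > stalerThanMs) {
                Doc d = documents.get(entry.getKey()); // null for an orphaned timestamp
                stale.add(d);
                documents.remove(d.getId());           // throws NullPointerException if d is null
                it.remove();
            }
        }
        return stale;
    }
}
```

Under these assumptions, a timestamp entry whose document has already been removed is enough to trigger the exception on the next sweep.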

Outputting a document marks it as processed, using this method: https://github.com/Findwise/Hydra/blob/0.4.0/database/src/main/java/com/findwise/hydra/CachingDocumentNIO.java#L121

public boolean markProcessed(DatabaseDocument<T> d, String stage) {
    DatabaseDocument<T> cached = cache.getDocumentById(d.getID());
    if (cached != null) {
        d.putAll(cached);
        cache.remove(d.getID());
    }
    if (writer.markProcessed(d, stage)) {
        return true;
    }
    return false;
}

So documents that are marked as processed should be removed from the cache. But in that case they shouldn't still show up in the iterator over documents that are about to be flushed.
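For illustration, here is a sketch of a cache where that expectation holds by construction. It is a hypothetical design, not a patch against Hydra: GuardedCacheSketch, Doc, and touchOnly are made-up names, and the two-map layout is an assumption. The remove method clears both the document and its timestamp, and the sweep tolerates a timestamp that has lost its document instead of dereferencing null.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical cache sketch; names and layout are assumptions, not Hydra's API.
class GuardedCacheSketch {
    // Hypothetical stand-in for DatabaseDocument.
    static final class Doc {
        private final String id;
        Doc(String id) { this.id = id; }
        String getId() { return id; }
    }

    private final Map<String, Doc> documents = new HashMap<>();
    private final Map<String, Long> lastTouched = new HashMap<>();

    synchronized void add(String id, long now) {
        documents.put(id, new Doc(id));
        lastTouched.put(id, now);
    }

    synchronized Doc getDocumentById(String id) {
        return documents.get(id);
    }

    // Removing a document also removes its timestamp, so the sweep never sees it.
    synchronized void remove(String id) {
        documents.remove(id);
        lastTouched.remove(id);
    }

    // Test hook only: plants a timestamp with no document behind it.
    synchronized void touchOnly(String id, long now) {
        lastTouched.put(id, now);
    }

    // Collects stale documents; orphaned timestamps are dropped without a dereference.
    synchronized List<Doc> removeStale(long stalerThanMs, long now) {
        List<Doc> stale = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastTouched.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (now - entry.getValue() > stalerThanMs) {
                Doc d = documents.remove(entry.getKey());
                if (d != null) {
                    stale.add(d);
                }
                it.remove();
            }
        }
        return stale;
    }
}
```

The null check hides the symptom rather than explaining it, so in a real fix it would probably be worth logging the orphaned ID to find out which code path left it behind.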

Do you have any more information about the pipeline, and do you know of any condition that triggers this?