Unidata / thredds

THREDDS Data Server v4.6
https://www.unidata.ucar.edu/software/tds/v4.6/index.html
265 stars 179 forks source link

FileCache startup/shutdown race condition #491

Open cwardgar opened 8 years ago

cwardgar commented 8 years ago

When ucar.nc2.util.cache.FileCache is started up and shutdown multiple times, there appears to be a synchronization issue. This is demonstrated in the sporadic failure of ucar.nc2.util.cache.ReacquireClosedDatasetSpec:

ucar.nc2.util.cache.ReacquireClosedDatasetSpec > classMethod FAILED
    java.lang.IllegalStateException: Timer already cancelled.
        at java.util.Timer.sched(Timer.java:397)
        at java.util.Timer.scheduleAtFixedRate(Timer.java:328)
        at ucar.nc2.util.cache.FileCache.<init>(FileCache.java:178)
        at ucar.nc2.dataset.NetcdfDataset.initNetcdfFileCache(NetcdfDataset.java:291)
        at ucar.nc2.util.cache.ReacquireClosedDatasetSpec.setupSpec(ReacquireClosedDatasetSpec.groovy:17)

That test calls the NetcdfDataset.initNetcdfFileCache & NetcdfDataset.shutdown pair multiple times. Luckily, we're not doing that in production, but it seems like something that should work anyway.

I attempted to fix this in 7890b4c4cd483c1d80f6d77612dab8f9687f4bb5. @JohnLCaron, any thoughts?

JohnLCaron commented 8 years ago

FileCache fixes seem ok to me.

Im guessing you dont really need to synch these NetcdfDataset methods:

static public synchronized ucar.nc2.util.cache.FileCacheIF getNetcdfFileCache() ; static public synchronized void initNetcdfFileCache(int minElementsInMemory, int maxElementsInMemory, int period);

but i dont see a harm. Note that the fact that we let cache object escape with getNetcdfFileCache() is a problem. but only used for debug/admin, not by regular users.

do you still see problems?

cwardgar commented 8 years ago

Yeah, we just saw it in Sean's ncwms2 pull request. He re-ran the build and it went away. After my commit, we've probably had a half dozen other Travis successes as well. I expect we'll see it again.

And yeah, I agree about the NetcdfDataset methods probably not needing synchronized; I was just being thorough.