CSBDeep / CSBDeep_website

Negative Array Size exception (Fiji/Windows x64) #6

Open msymeonides opened 6 years ago

msymeonides commented 6 years ago

I am trying to run 3D Denoising (Tribolium) in Fiji on Windows 10 x64. It works fine (actually works beautifully) on a small dataset (500 x 900 x 45, 16-bit, 39 MB), but fails on a much larger one (12900 x 2048 x 116, 16-bit, 5.7 GB) with the following error:

[Thu Jun 07 11:12:16 EDT 2018] [ERROR] [] Module threw exception
java.lang.NegativeArraySizeException
    at mpicbg.csbd.normalize.PercentileNormalizer.percentiles(PercentileNormalizer.java:113)
    at mpicbg.csbd.normalize.PercentileNormalizer.prepareNormalization(PercentileNormalizer.java:99)
    at mpicbg.csbd.commands.CSBDeepCommand.normalizeInput(CSBDeepCommand.java:303)
    at mpicbg.csbd.commands.CSBDeepCommand.runInternal(CSBDeepCommand.java:257)
    at mpicbg.csbd.commands.CSBDeepCommand.run(CSBDeepCommand.java:241)
    at mpicbg.csbd.commands.NetTribolium.run(NetTribolium.java:100)
    at org.scijava.command.CommandModule.run(CommandModule.java:199)
    at org.scijava.module.ModuleRunner.run(ModuleRunner.java:168)
    at org.scijava.module.ModuleRunner.call(ModuleRunner.java:127)
    at org.scijava.module.ModuleRunner.call(ModuleRunner.java:66)
    at org.scijava.thread.DefaultThreadService$3.call(DefaultThreadService.java:238)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I have tried tweaking the number of tiles, anywhere from 1 up to 128, but always get the same exception. The overlap is set to 32. I am running Java 1.8.0_161 and Fiji is updated (1.52c). The PC is a dual Xeon with 128 GB RAM and the data is hosted on an SSD RAID.

I can't think of anything different about the data other than the size. It was acquired with the same instrument and almost identical settings (the only real difference is the voxel depth: 1.5 um in the working dataset vs. 2 um in the failing one), and the SNR is similar in both datasets; they are both very noisy. I'm just looking at nuclei (about 15 in the working dataset, several thousand in the failing one).

I cropped the failing dataset down to 1000 x 1000 x 45, 86 MB, and it works...

fjug commented 6 years ago

Thanks for your report, we will look into that ASAP.

msymeonides commented 6 years ago

Possibly related issue: I just cropped that bigger dataset down to 6000 x 2000 x 116 (2.6 GB), and now I'm getting a different exception:

[Thu Jun 07 12:49:27 EDT 2018] [ERROR] [] Module threw exception
java.lang.ArrayIndexOutOfBoundsException: -1055490525
    at net.imglib2.util.Util.quicksort(Util.java:388)
    at net.imglib2.util.Util.quicksort(Util.java:408)
    at net.imglib2.util.Util.quicksort(Util.java:408)
    at net.imglib2.util.Util.quicksort(Util.java:406)
    at net.imglib2.util.Util.quicksort(Util.java:408)
    at net.imglib2.util.Util.quicksort(Util.java:406)
    at net.imglib2.util.Util.quicksort(Util.java:406)
    at net.imglib2.util.Util.quicksort(Util.java:408)
    at net.imglib2.util.Util.quicksort(Util.java:380)
    at mpicbg.csbd.normalize.PercentileNormalizer.percentiles(PercentileNormalizer.java:121)
    at mpicbg.csbd.normalize.PercentileNormalizer.prepareNormalization(PercentileNormalizer.java:99)
    at mpicbg.csbd.commands.CSBDeepCommand.normalizeInput(CSBDeepCommand.java:303)
    at mpicbg.csbd.commands.CSBDeepCommand.runInternal(CSBDeepCommand.java:257)
    at mpicbg.csbd.commands.CSBDeepCommand.run(CSBDeepCommand.java:241)
    at mpicbg.csbd.commands.NetTribolium.run(NetTribolium.java:100)
    at org.scijava.command.CommandModule.run(CommandModule.java:199)
    at org.scijava.module.ModuleRunner.run(ModuleRunner.java:168)
    at org.scijava.module.ModuleRunner.call(ModuleRunner.java:127)
    at org.scijava.module.ModuleRunner.call(ModuleRunner.java:66)
    at org.scijava.thread.DefaultThreadService$3.call(DefaultThreadService.java:238)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

My guess is that it's the normalizer that can't handle the bigger file sizes. I got the same result with tiles set to 4, 8, 16, 32, or 64 (overlap always set to 32). Perhaps the normalization is not also done in tiles?

A 9000 x 2000 x 45, 1.5 GB file works.

A 12000 x 2000 x 45, 2 GB file throws the same ArrayIndexOutOfBoundsException. So I think it's just size: somewhere between 1.5 GB and 2 GB the normalizer fails, and for even bigger files I think it doesn't even make it that far and throws a NegativeArraySizeException.

Sorry if all the different datasets are confusing; I'm just trying to figure out what is breaking it.

royerloic commented 6 years ago

You are hitting the 32-bit limit of arrays and buffers in Java. It might be nearly impossible to fix without significant engineering. This is not something that imglib2 can fix, because it's likely caused by the binding to TensorFlow, which uses NIO buffers…

HedgehogCode commented 6 years ago

The problem is that the percentiles method in PercentileNormalizer.java writes all the values of the image into a single Java array. It can't do this in tiles because it has to consider the whole image to compute the percentiles.

The binding to TensorFlow isn't a problem because we tile the image before putting it into an NIO buffer.
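
For reference, a minimal numpy sketch of what percentile normalization does conceptually (the percentile values and epsilon below are placeholders, not necessarily the plugin's defaults):

import numpy as np

def percentile_normalize(img, p_low=3.0, p_high=99.8, eps=1e-20):
    # the percentiles must be taken over the whole image; computing them
    # per tile would give every tile a different intensity scaling
    lo, hi = np.percentile(img, (p_low, p_high))
    return (img.astype(np.float32) - lo) / (hi - lo + eps)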

msymeonides commented 6 years ago

So I gather these are Java-specific issues and if I run CSBDeep in Python there should be no such issues, correct? Also if I use Python I can apply models on files that also have a T dimension (timelapses), which I cannot do in Fiji, right?

frauzufall commented 6 years ago

@uschmidt83 can maybe tell you if there are similar limitations in Python. I will see if there is a way to improve the normalization. It would also be nice to maybe make percentile normalization an ImageJ Op and/or an imglib2 algorithm? @tpietzsch (related code)

tpietzsch commented 6 years ago

The image size 12900 x 2048 x 116 is > 2^31 elements, which overflows int. Hence the NegativeArraySizeException in https://github.com/CSBDeep/CSBDeep_fiji/blob/cc83fa343e34506352fee0691c76fe22adab3131/src/main/java/mpicbg/csbd/normalize/PercentileNormalizer.java#L113

With 6000 x 2000 x 116, the size fits into int but is still > 2^30, and this index computation overflows: https://github.com/imglib/imglib2/blob/617c5e515c4ce3285ce3bd3d29b25b8e51ed5ece/src/main/java/net/imglib2/util/Util.java#L388
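
For concreteness, the element counts involved (plain arithmetic, written as Python for readability):

12900 * 2048 * 116  # = 3_064_627_200, larger than 2**31 - 1 = 2_147_483_647,
                    #   so casting this size to int produces a negative value
6000 * 2000 * 116   # = 1_392_000_000, fits into int but exceeds 2**30,
                    #   so adding two such indices can wrap past 2**31 - 1 and go negative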

@HedgehogCode you could rewrite it and store the values in a 1D CellImg. You will still need a quicksort variant that uses long indices and works on a 1D CellImg (both should be easy to do).

I would reconsider, though, whether you really want to sort a 5.7 GB array. Maybe you can approximate the percentiles with a histogram?
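
A rough sketch of that histogram idea, written in Python/numpy for brevity (the Fiji plugin would need a Java equivalent over an imglib2 image; the function name and bin count here are made up):

import numpy as np

def approx_percentile(slices, q, bins=4096, vmin=0, vmax=65535):
    # stream the data (e.g. one Z slice at a time) into a fixed-size
    # histogram instead of sorting a multi-gigabyte array
    edges = np.linspace(vmin, vmax, bins + 1)
    hist = np.zeros(bins, dtype=np.int64)
    n = 0
    for s in slices:
        h, _ = np.histogram(s, bins=edges)
        hist += h
        n += s.size
    cdf = np.cumsum(hist)
    idx = np.searchsorted(cdf, q / 100.0 * n)
    # the result is only accurate to within one bin width
    return edges[min(idx, bins - 1)]

# usage, e.g.: approx_percentile((stack[z] for z in range(stack.shape[0])), 99.8)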

uschmidt83 commented 6 years ago

So I gather these are Java-specific issues and if I run CSBDeep in Python there should be no such issues, correct?

I just tried it with the Python code and it does work (on my Linux workstation with 64 GB of RAM).

However, you need to use a currently undocumented (and likely to change) feature. Instead of providing an integer for n_tiles, you can choose a tuple, such as (32,16). In this example, the largest image dimension would be split into 32 tiles, and the second largest into 16.

import numpy as np
from csbdeep.models import CARE

# load trained model
model = CARE(config=None, name='my_model', basedir='models')

# placeholder image of same size (replace with actual image)
x = np.ones((116,2048,12900),np.uint16)

# this took around 15min with a Titan X GPU
restored = model.predict(x, 'ZYX', n_tiles=(32,16))

uschmidt83 commented 6 years ago

Also if I use Python I can apply models on files that also have a T dimension (timelapses), which I cannot do in Fiji, right?

What do you want to do exactly?

msymeonides commented 6 years ago

What do you want to do exactly?

I have 4D data (I guess 5D), i.e. a timelapse of Z stacks in three channels. Right now I'm doing that in Fiji using a macro that splits the data into individual Z stacks (per channel/timepoint), denoises each, and re-combines them into a hyperstack at the end.

uschmidt83 commented 6 years ago

I have 4D data (I guess 5D), i.e. a timelapse of Z stacks in three channels. Right now I'm doing that in Fiji using a macro that splits the data into individual Z stacks (per channel/timepoint), denoises each, and re-combines them into a hyperstack at the end.

You would have to do something similar in Python, i.e. split the data into a format that the trained model expects, predict, and re-combine the results.
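
For example, a rough sketch of that loop in Python (the axis order, array shape, model name, and tile settings below are assumptions; adjust them to your data):

import numpy as np
from csbdeep.models import CARE

# load trained model
model = CARE(config=None, name='my_model', basedir='models')

# placeholder 5D timelapse with axes (T, C, Z, Y, X); replace with your real data
x = np.ones((5, 3, 45, 512, 512), np.uint16)

restored = np.empty(x.shape, np.float32)
for t in range(x.shape[0]):
    for c in range(x.shape[1]):
        # denoise one single-channel Z stack at a time, then put it back
        restored[t, c] = model.predict(x[t, c], 'ZYX', n_tiles=(4, 4))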

That said, we could add functionality to make these things easier. At the moment, we provide a command-line script (see demo) to apply a trained CARE model to many TIFF images. However, this script expects each TIFF image to already be in a format that the model can work with, i.e. one that doesn't have to be broken up further.