glencoesoftware / raw2ometiff

Raw format to OME-TIFF converter
GNU General Public License v2.0

Support for 32bit floating point images #104

Closed: snehashis-roy closed this issue 1 year ago

snehashis-roy commented 1 year ago

Hello. Is it possible to convert 32-bit floating point 3D TIFF images to pyramidal OME-TIFF for use in OMERO? It seems that bioformats2raw can convert to Zarr in a folder, but raw2ometiff gets stuck.

This is my script for a 3D float32 image (size 528x456x960 (HxWxD))

./bioformats2raw -c null -h 1024 -w 1024 --max_workers 12 --no-nested -r 8 --target-min-size=512 -p "$INPUT" $TMPDIR
./raw2ometiff --compression uncompressed --debug --max-workers=12 -p $TMPDIR $OUTPUT

This is the end of debug file where raw2ometiff gets stuck,

2023-05-24 00:02:32,129 [pool-1-thread-18] DEBUG c.g.p.PyramidFromDirectoryWriter - Writing series: 0, image: 959, tileIndex: 0
2023-05-24 00:02:32,129 [pool-1-thread-18] DEBUG c.g.p.PyramidFromDirectoryWriter -     writing 73728 compressed bytes at 1262149648
2023-05-24 00:02:32,129 [pool-1-thread-18] INFO  org.perf4j.TimingLogger - start[1684900952129] time[0] tag[writeTile]

I am happy to share the image and/or the debug output if it helps. I am using 0.6.0 and 0.4.0

melissalinkert commented 1 year ago

bioformats2raw/raw2ometiff are routinely used with floating point data, so I'd be a little surprised if there were a fundamental issue with that data type.

./raw2ometiff --compression uncompressed --debug --max-workers=12 -p $TMPDIR $OUTPUT

I would not expect this command to run at all. --max-workers=12 is not a valid option, but --max_workers=12 (the same option name as bioformats2raw) should work instead. Is this the exact command that produced the debug log?

I am using 0.6.0 and 0.4.0

bioformats2raw 0.6.1 and raw2ometiff 0.4.1 are the latest versions; I don't know of a specific change that might solve this problem, but it may be worth upgrading to see if that helps.

snehashis-roy commented 1 year ago

You are right, it was indeed a typo in my script: --max-workers should have been --max_workers. However, --max-workers still works with raw2ometiff 0.4.0 and 0.4.1.

Here is my test 4D 32-bit image that raw2ometiff (0.4.1 and 0.4.0) fails to convert; bioformats2raw converts it successfully, though. The image opens fine in Fiji.

https://hpc.nih.gov/~roys5/Fos_heatmap_51623_B.tif

I used the following commands:

./bioformats2raw-0.6.1/bin/bioformats2raw -c null -h 1024 -w 1024 --max_workers 12 --no-nested -r 2 --target-min-size=512 -p "Fos_heatmap_51623_B.tif" ./tmp.suFH2ojHgm
./raw2ometiff-0.4.1/bin/raw2ometiff --compression uncompressed --max_workers=12 -p ./tmp.suFH2ojHgm Fos_heatmap_51623_B.ome.tiff

The last step of raw2ometiff hangs, but a potentially corrupt OME-TIFF file is generated. Running tiffinfo on the corrupt image shows this:

TIFFFetchDirectory: Sanity check on directory count failed, zero tag directories not supported.
TIFFReadDirectory: Failed to read directory at offset 16.

melissalinkert commented 1 year ago

Thanks for the additional details, @snehashis-roy.

Unfortunately, I so far haven't been able to reproduce this issue using Fos_heatmap_51623_B.tif; both bioformats2raw and raw2ometiff as described in https://github.com/glencoesoftware/raw2ometiff/issues/104#issuecomment-1561434066 complete without error. The runtime for each is under a minute, and the resulting OME-TIFF is valid.

If you can capture a jstack when you see raw2ometiff get stuck, that may help us to understand what's going wrong. To do that, start the raw2ometiff command as before. In a separate terminal, run:

$ jps
...
8734 PyramidFromDirectoryWriter

That prints a list of running Java processes; look for the line with PyramidFromDirectoryWriter as above and note the number (the PID). When you see raw2ometiff get stuck, run in a separate terminal:

$ jstack 8734 > raw2ometiff-jstack.txt

replacing 8734 with the PID you noted previously. Please attach the raw2ometiff-jstack.txt file to this issue.
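The two steps above can also be combined. This is a hypothetical convenience sketch, not part of raw2ometiff itself; the `find_pid` helper name is made up here, and it assumes `jps` and `jstack` from the same JDK are on your PATH:

```shell
#!/bin/sh
# find_pid reads jps-style output on stdin and prints the PID of the
# PyramidFromDirectoryWriter process (the raw2ometiff main class).
find_pid() {
  awk '/PyramidFromDirectoryWriter/ { print $1 }'
}

# Usage, while raw2ometiff appears stuck:
#   jps | find_pid                                     # prints the PID
#   jstack "$(jps | find_pid)" > raw2ometiff-jstack.txt
```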

A few other things to consider:

snehashis-roy commented 1 year ago

This is the jstack $PID output of the stuck raw2ometiff:

2023-05-24 18:21:54
Full thread dump OpenJDK 64-Bit Server VM (25.152-b12 mixed mode):

"Attach Listener" #27 daemon prio=9 os_prio=0 tid=0x00002aab74001000 nid=0x11984 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Java2D Disposer" #22 daemon prio=10 os_prio=0 tid=0x00002aaab1182800 nid=0x11843 in Object.wait() [0x00002aab733b3000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000005d268d0b8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
    - locked <0x00000005d268d0b8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
    at sun.java2d.Disposer.run(Disposer.java:148)
    at java.lang.Thread.run(Thread.java:745)

"ProgressBar" #21 daemon prio=5 os_prio=0 tid=0x00002aaab108c000 nid=0x11842 waiting on condition [0x00002aab5f81e000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x0000000080287dd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

"Service Thread" #17 daemon prio=9 os_prio=0 tid=0x00002aaab0101000 nid=0x1183d runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread11" #16 daemon prio=9 os_prio=0 tid=0x00002aaab00fd800 nid=0x1183c waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread10" #15 daemon prio=9 os_prio=0 tid=0x00002aaab00fb800 nid=0x1183b waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread9" #14 daemon prio=9 os_prio=0 tid=0x00002aaab00f9800 nid=0x1183a waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread8" #13 daemon prio=9 os_prio=0 tid=0x00002aaab00f7000 nid=0x11839 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread7" #12 daemon prio=9 os_prio=0 tid=0x00002aaab00f5000 nid=0x11838 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread6" #11 daemon prio=9 os_prio=0 tid=0x00002aaab00f2800 nid=0x11837 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread5" #10 daemon prio=9 os_prio=0 tid=0x00002aaab00f0800 nid=0x11836 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread4" #9 daemon prio=9 os_prio=0 tid=0x00002aaab00e6000 nid=0x11835 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread3" #8 daemon prio=9 os_prio=0 tid=0x00002aaab00e4000 nid=0x11834 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00002aaab00e2000 nid=0x11833 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00002aaab00e0000 nid=0x11832 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00002aaab00dd000 nid=0x11831 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00002aaab00db800 nid=0x11830 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00002aaab00a8000 nid=0x1182f in Object.wait() [0x00002aab5cd0f000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x0000000080214420> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
    - locked <0x0000000080214420> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00002aaab00a3800 nid=0x1182e in Object.wait() [0x00002aab5cc0e000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x000000008021c3d8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    - locked <0x000000008021c3d8> (a java.lang.ref.Reference$Lock)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"main" #1 prio=5 os_prio=0 tid=0x00002aaab000c800 nid=0x11818 runnable [0x00002aaaaabd3000]
   java.lang.Thread.State: RUNNABLE
    at java.io.RandomAccessFile.setLength(Native Method)
    at loci.common.NIOFileHandle.setLength(NIOFileHandle.java:229)
    at loci.common.AbstractNIOHandle.validateLength(AbstractNIOHandle.java:87)
    at loci.common.NIOFileHandle.writeSetup(NIOFileHandle.java:659)
    at loci.common.NIOFileHandle.writeShort(NIOFileHandle.java:605)
    at loci.common.RandomAccessOutputStream.writeShort(RandomAccessOutputStream.java:333)
    at loci.formats.tiff.TiffSaver.writeIFDValue(TiffSaver.java:580)
    at loci.formats.tiff.TiffSaver.writeIFD(TiffSaver.java:535)
    at com.glencoesoftware.pyramid.PyramidFromDirectoryWriter.writeIFD(PyramidFromDirectoryWriter.java:1226)
    at com.glencoesoftware.pyramid.PyramidFromDirectoryWriter.writeIFDs(PyramidFromDirectoryWriter.java:988)
    at com.glencoesoftware.pyramid.PyramidFromDirectoryWriter.convertToPyramid(PyramidFromDirectoryWriter.java:822)
    at com.glencoesoftware.pyramid.PyramidFromDirectoryWriter.call(PyramidFromDirectoryWriter.java:238)
    at com.glencoesoftware.pyramid.PyramidFromDirectoryWriter.call(PyramidFromDirectoryWriter.java:99)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1783)
    at picocli.CommandLine.access$900(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2150)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2144)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2108)
    at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1968)
    at picocli.CommandLine.parseWithHandlers(CommandLine.java:2349)
    at picocli.CommandLine.parseWithHandler(CommandLine.java:2284)
    at picocli.CommandLine.call(CommandLine.java:2560)
    at com.glencoesoftware.pyramid.PyramidFromDirectoryWriter.main(PyramidFromDirectoryWriter.java:213)

"VM Thread" os_prio=0 tid=0x00002aaab009b800 nid=0x1182d runnable 

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00002aaab0021800 nid=0x11819 runnable 

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00002aaab0023000 nid=0x1181a runnable 

"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00002aaab0025000 nid=0x1181b runnable 

"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00002aaab0027000 nid=0x1181c runnable 

"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00002aaab0028800 nid=0x1181d runnable 

"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00002aaab002a800 nid=0x1181e runnable 

"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00002aaab002c800 nid=0x1181f runnable 

"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00002aaab002e000 nid=0x11820 runnable 

"GC task thread#8 (ParallelGC)" os_prio=0 tid=0x00002aaab0030000 nid=0x11821 runnable 

"GC task thread#9 (ParallelGC)" os_prio=0 tid=0x00002aaab0032000 nid=0x11822 runnable 

"GC task thread#10 (ParallelGC)" os_prio=0 tid=0x00002aaab0033800 nid=0x11823 runnable 

"GC task thread#11 (ParallelGC)" os_prio=0 tid=0x00002aaab0035800 nid=0x11824 runnable 

"GC task thread#12 (ParallelGC)" os_prio=0 tid=0x00002aaab0037000 nid=0x11825 runnable 

"GC task thread#13 (ParallelGC)" os_prio=0 tid=0x00002aaab0039000 nid=0x11826 runnable 

"GC task thread#14 (ParallelGC)" os_prio=0 tid=0x00002aaab003b000 nid=0x11827 runnable 

"GC task thread#15 (ParallelGC)" os_prio=0 tid=0x00002aaab003c800 nid=0x11828 runnable 

"GC task thread#16 (ParallelGC)" os_prio=0 tid=0x00002aaab003e800 nid=0x11829 runnable 

"GC task thread#17 (ParallelGC)" os_prio=0 tid=0x00002aaab0040800 nid=0x1182a runnable 

"GC task thread#18 (ParallelGC)" os_prio=0 tid=0x00002aaab0042000 nid=0x1182b runnable 

"GC task thread#19 (ParallelGC)" os_prio=0 tid=0x00002aaab0044000 nid=0x1182c runnable 

"VM Periodic Task Thread" os_prio=0 tid=0x00002aaab0103800 nid=0x1183e waiting on condition 

JNI global references: 463

My java -version output is:

java -version
openjdk version "1.8.0_152-release"
OpenJDK Runtime Environment (build 1.8.0_152-release-1056-b12)
OpenJDK 64-Bit Server VM (build 25.152-b12, mixed mode)

snehashis-roy commented 1 year ago

Hello. This slowdown was indeed caused by network storage. On other large 5D images (e.g. 100x100x5x5x10000 in XYCZT), I also observed that the last step is dramatically faster on a local spinning SATA HDD (~30 seconds) than on network storage (>2 hours). The network SAS storage is on an HPC, connected by a 10 Gbps Mellanox card, where I usually get 9.2 Gbps reading/writing files, so bandwidth is probably not the issue. I am wondering why the last step shows such a wide speed discrepancy depending on storage location. Thank you.

melissalinkert commented 1 year ago

Thanks for the update, @snehashis-roy. Ultimately I think this is down to repeatedly updating the file length as new data is written; we know this can be expensive on network storage, and the final metadata writing step in raw2ometiff doesn't do a great job of minimizing the number of writes to disk. https://github.com/ome/bioformats/issues/3983 documents similar issues.
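To make the cost concrete, here is a rough stand-alone sketch (not raw2ometiff's actual code) of the two patterns: growing a file's length one chunk at a time, as each `setLength` call in the stack trace above does, versus setting the final size once up front. On network filesystems each length update is a separate metadata round trip, so the first pattern pays that cost per chunk. This assumes GNU `truncate` is available:

```shell
#!/bin/sh
# Pattern 1: grow the file one 64 KiB chunk at a time (100 length updates,
# analogous to repeated RandomAccessFile.setLength calls).
out=$(mktemp)
i=0
while [ $i -lt 100 ]; do
  i=$((i + 1))
  truncate -s $((i * 65536)) "$out"
done
grown=$(wc -c < "$out" | tr -d ' ')

# Pattern 2: preallocate the final size with a single length update.
out2=$(mktemp)
truncate -s $((100 * 65536)) "$out2"
pre=$(wc -c < "$out2" | tr -d ' ')

echo "grown: $grown bytes, preallocated: $pre bytes"
rm -f "$out" "$out2"
```

Both end up the same size; the difference is only in how many metadata operations the filesystem sees, which is negligible locally but can dominate over NFS or SAS-attached network storage.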

https://github.com/ome/ome-common-java/issues/78 is likely the long-term solution we'll pursue here.