IDR / idr-metadata

Curated metadata for all studies published in the Image Data Resource
https://idr.openmicroscopy.org

NGFF perf testing #687

Open will-moore opened 8 months ago

will-moore commented 8 months ago

Compare formats (on disk)

To compare the performance of NGFF data (ZarrReader) with other formats (both on disk), we want to compare the NGFF version of the data with the same data in its original format, on the same server.

Choose some data to work with: idr0003 is not too big, at 2.3G for a plate. Summary (more details below):

[Image: Screenshot 2024-03-05 at 11 14 59]

Conclusion: NGFF is no slower (and possibly faster).

Compare disk vs S3

We want to test the performance of loading data from S3 compared with loading the same data from local disk. Use idr0010 data, since all plates are identical in terms of size etc.:

[Image: Screenshot 2024-03-05 at 11 43 10]

Conclusion: Data access via S3 is slower than on disk.

will-moore commented 8 months ago
$ ssh pilot-zarr1-dev

screen -r idr0001
cd /data/idr0003
conda activate bioformats2raw
~/bioformats2raw-0.7.0/bin/bioformats2raw --memo-directory ../memo /uod/idr/filesets/idr0003-breker-plasticity/201301120/Images/DTT/p1/experiment_descriptor.xml p1.ome.zarr

OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp3176581939484032263/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.reflectasm.AccessClassLoader (file:/home/wmoore/bioformats2raw-0.7.0/lib/reflectasm-1.11.9.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.reflectasm.AccessClassLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

Looks like an error, but it seems to have worked OK:

(bioformats2raw) [wmoore@pilot-zarr1-dev idr0003]$ find ./ -name .zattrs
...
./p1.ome.zarr/P/24/2/.zattrs
./p1.ome.zarr/P/24/.zattrs
./p1.ome.zarr/.zattrs
(bioformats2raw) [wmoore@pilot-zarr1-dev idr0003]$ find ./ -name .zattrs | wc
   1538    1538   43248
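The count of 1538 looks consistent with a 384-well plate with 3 fields per well: the listing ends at well P/24 (16 rows x 24 columns) with field index 2, which gives 1 plate + 384 wells + 1152 images = 1537, and the remaining one is presumably the OME group. That breakdown is an inference from the listing, not something bioformats2raw reports. A minimal sketch for checking it, assuming the usual plate/row/column/field layout (the function name `count_zattrs_by_depth` is made up for illustration):

```shell
# Count .zattrs files grouped by directory depth below a Zarr root.
# Assumed layout: depth 0 = plate, depth 1 = OME group,
# depth 2 = wells (row/column), depth 3 = images (row/column/field).
count_zattrs_by_depth() {
  root="$1"
  ( cd "$root" && find . -name .zattrs ) |
    awk -F/ '{ depth[NF - 2]++ } END { for (d in depth) print d, depth[d] }' |
    sort -n
}

# Expected total for 16 x 24 wells with 3 fields each, plus plate and OME.
echo $(( 1 + 16 * 24 + 16 * 24 * 3 + 1 ))
```

Running `count_zattrs_by_depth p1.ome.zarr` in /data/idr0003 would print one "depth count" pair per line, which can be compared against the 1152 images reported by the import.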

$ zip -r p1.ome.zarr.zip p1.ome.zarr

Download (1.4 G) and upload to idr-testing...

$ rsync -rvP pilot-zarr1-dev:/data/idr0003/p1.ome.zarr.zip ./
$ rsync -rvP p1.ome.zarr.zip idr-testing.openmicroscopy.org:/home/wmoore/
$ ssh -A idr-testing.openmicroscopy.org
$ rsync -rvP p1.ome.zarr.zip omeroreadwrite:/home/wmoore/

Import..

$ ssh omeroreadwrite
$ unzip p1.ome.zarr.zip

(venv3) [wmoore@test120-omeroreadwrite ~]$ omero import --depth 20 p1.ome.zarr

2024-03-01 11:55:41,047 889        [      main] INFO          ome.formats.importer.ImportConfig - OMERO.blitz Version: 5.7.2
2024-03-01 11:55:41,070 912        [      main] INFO          ome.formats.importer.ImportConfig - Bioformats version: 7.1.0 revision: 05c7b2413cfad19a73b619c61ddf77ca2d038ce7 date: 11 December 2023
2024-03-01 11:55:41,391 1233       [      main] INFO   formats.importer.cli.CommandLineImporter - Log levels -- Bio-Formats: ERROR OMERO.importer: INFO
2024-03-01 11:55:42,125 1967       [      main] INFO      ome.formats.importer.ImportCandidates - Depth: 20 Metadata Level: MINIMUM
2024-03-01 11:55:58,163 18005      [      main] INFO      ome.formats.importer.ImportCandidates - 16917 file(s) parsed into 1 group(s) with 1 call(s) to setId in 11022ms. (16037ms total) [0 unknowns]
2024-03-01 11:55:59,202 19044      [      main] INFO       ome.formats.OMEROMetadataStoreClient - Attempting initial SSL connection to localhost:4064
2024-03-01 11:56:01,233 21075      [      main] INFO       ome.formats.OMEROMetadataStoreClient - Insecure connection requested, falling back
2024-03-01 11:56:02,035 21877      [      main] INFO       ome.formats.OMEROMetadataStoreClient - Pinging session every 300s.
2024-03-01 11:56:02,055 21897      [      main] INFO       ome.formats.OMEROMetadataStoreClient - Server: 5.6.10
2024-03-01 11:56:02,055 21897      [      main] INFO       ome.formats.OMEROMetadataStoreClient - Client: 5.7.2
2024-03-01 11:56:02,055 21897      [      main] INFO       ome.formats.OMEROMetadataStoreClient - Java Version: 1.8.0_402
2024-03-01 11:56:02,055 21897      [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Name: Linux
2024-03-01 11:56:02,055 21897      [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Arch: amd64
2024-03-01 11:56:02,055 21897      [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Version: 3.10.0-1160.108.1.el7.x86_64
2024-03-01 11:56:02,604 22446      [2-thread-1] INFO   ormats.importer.cli.LoggingImportMonitor - FILESET_UPLOAD_PREPARATION
...
2024-03-02 01:38:53,782 49393624   [3-thread-1] INFO   ormats.importer.cli.LoggingImportMonitor - FILE_UPLOAD_COMPLETE: /home/wmoore/p1.ome.zarr/.zattrs
2024-03-02 02:01:42,222 50762064   [2-thread-1] INFO   ormats.importer.cli.LoggingImportMonitor - FILESET_UPLOAD_END
2024-03-02 02:01:43,318 50763160   [2-thread-1] INFO   ormats.importer.cli.LoggingImportMonitor - IMPORT_STARTED Logfile: 64420961
2024-03-02 02:05:00,964 50960806   [l.Client-0] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_IMPORTED Step: 1 of 5  Logfile: 64420961
2024-03-02 02:08:22,203 51162045   [l.Client-4] INFO   ormats.importer.cli.LoggingImportMonitor - PIXELDATA_PROCESSED Step: 2 of 5  Logfile: 64420961
2024-03-02 02:12:29,399 51409241   [l.Client-5] INFO   ormats.importer.cli.LoggingImportMonitor - THUMBNAILS_GENERATED Step: 3 of 5  Logfile: 64420961
2024-03-02 02:12:29,689 51409531   [l.Client-6] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_PROCESSED Step: 4 of 5  Logfile: 64420961
2024-03-02 02:12:29,769 51409611   [l.Client-5] INFO   ormats.importer.cli.LoggingImportMonitor - OBJECTS_RETURNED Step: 5 of 5  Logfile: 64420961
2024-03-02 02:12:31,373 51411215   [l.Client-6] INFO   ormats.importer.cli.LoggingImportMonitor - IMPORT_DONE Imported file: /home/wmoore/p1.ome.zarr/OME/METADATA.ome.xml
Plate:10551
Other imported objects:
Fileset:6317541

==> Summary
16917 files uploaded, 1 fileset, 1 plate created, 1152 images imported, 0 errors in 14:16:28.893

Wow - took 14 hours to import!
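To see where the time went, the log above can be split into phases. The second column of each log line appears to be milliseconds elapsed since the importer started (it tracks the timestamps), so a rough breakdown needs only integer arithmetic. The three counter values below are copied from the FILESET_UPLOAD_PREPARATION, FILESET_UPLOAD_END and IMPORT_DONE lines; the variable and function names are just for illustration:

```shell
# Millisecond counters copied from the import log above.
upload_start_ms=22446       # FILESET_UPLOAD_PREPARATION
upload_end_ms=50762064      # FILESET_UPLOAD_END
import_done_ms=51411215     # IMPORT_DONE
files=16917                 # "16917 files uploaded" in the summary

# Format a millisecond duration as HH:MM:SS (fractions are truncated).
ms_to_hms() {
  s=$(( $1 / 1000 ))
  printf '%02d:%02d:%02d\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}

upload_ms=$(( upload_end_ms - upload_start_ms ))
import_ms=$(( import_done_ms - upload_end_ms ))

echo "upload:   $(ms_to_hms "$upload_ms")"
echo "import:   $(ms_to_hms "$import_ms")"
echo "per file: $(( upload_ms / files )) ms"
```

This works out to about 14 h 05 min of file upload against under 11 minutes for the five import steps, i.e. roughly 3 seconds per uploaded file. So nearly all of the 14 hours was spent transferring the 16917 small files, not processing the plate.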

will-moore commented 8 months ago

After the IDR meeting today, 5 of us spent 20 minutes opening many images from that plate without seeing any errors, and performance was good/acceptable. cc @francesw @jburel

will-moore commented 8 months ago

Downloaded 3 plates (as .zip files) from https://www.ebi.ac.uk/biostudies/submissions/files?path=%2Fuser%2Fidr0010 and uploaded them to idr-testing:omeroreadwrite. Placed in a new dir at /data/ngff, unzipped, and changed owner to omero-server:

$ pwd
/data/ngff
$ ls -lh
total 1.4G
drwxrwxr-x. 15 omero-server omero-server  219 Jul 10  2023 101-24.ome.zarr
-rw-r--r--.  1 omero-server wmoore       457M Mar  4 12:22 101-24.ome.zarr.zip
drwxrwxr-x. 15 omero-server omero-server  219 Jul 10  2023 10-34.ome.zarr
-rw-r--r--.  1 omero-server wmoore       455M Mar  4 12:21 10-34.ome.zarr.zip
drwxrwxr-x. 15 omero-server omero-server  219 Jul 10  2023 103.ome.zarr
-rw-r--r--.  1 omero-server wmoore       461M Mar  4 12:22 103.ome.zarr.zip

For plate 10-34, find the location in the ManagedRepository from the webclient. We can see the symlink to S3:

bash-4.2$ ls -lh /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/
total 4.0K
lrwxrwxrwx. 1 omero-server omero-server 109 Dec  6 11:35 2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr -> /bia-integrator-data/S-BIAD885/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr
-rw-r--r--. 1 omero-server omero-server  49 Dec  6 11:35 2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr.bfoptions

Update symlink (as omero-server):

rm /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr
ln -s /data/ngff/10-34.ome.zarr /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr

Looks good:

$ ls -lh /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/
total 4.0K
lrwxrwxrwx. 1 omero-server omero-server 25 Mar  4 12:40 2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr -> /data/ngff/10-34.ome.zarr
-rw-r--r--. 1 omero-server omero-server 49 Dec  6 11:35 2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr.bfoptions

$ ls /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr/
A  B  C  D  E  F  G  H   I  J  K  L  OME
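Since the same rm / ln -s pair is repeated for every plate, a small helper can cut down on copy-paste mistakes. This is only a sketch: `relink` is a hypothetical name, and it relies on `ln -sfn` replacing an existing symlink in place instead of following it into the target directory. It refuses to run if the new target is missing or if the path being replaced is not a symlink, so it cannot delete real data.

```shell
# Re-point an existing symlink at a new target directory.
# Usage: relink NEW_TARGET EXISTING_LINK
relink() {
  target="$1"
  link="$2"
  # Refuse to point the link at something that does not exist.
  [ -d "$target" ] || { echo "missing target: $target" >&2; return 1; }
  # Refuse to replace anything that is not already a symlink.
  [ -L "$link" ] || { echo "not a symlink: $link" >&2; return 1; }
  # -f replaces the existing link; -n stops ln following it into the old target.
  ln -sfn "$target" "$link"
}
```

For the plate above this would be called (as omero-server) with /data/ngff/10-34.ome.zarr as the first argument and the .zarr path under the ManagedRepository as the second.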

will-moore commented 8 months ago

Repeating for the other 2 plates downloaded above...

Plate 101-24:

bash-4.2$ rm /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-06-31.113_mkngff/49150a5d-8fc2-499a-bbc6-4a3eed2d44b1.zarr
bash-4.2$ ln -s /data/ngff/101-24.ome.zarr /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-06-31.113_mkngff/49150a5d-8fc2-499a-bbc6-4a3eed2d44b1.zarr
bash-4.2$ ls /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-06-31.113_mkngff/49150a5d-8fc2-499a-bbc6-4a3eed2d44b1.zarr
A  B  C  D  E  F  G  H  I  J  K  L  OME

Plate 103:

bash-4.2$ rm /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-26-08.432_mkngff/1fab1705-9561-4689-891d-e039c4ec3076.zarr
bash-4.2$ ln -s /data/ngff/103.ome.zarr /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-26-08.432_mkngff/1fab1705-9561-4689-891d-e039c4ec3076.zarr
bash-4.2$ ls /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-26-08.432_mkngff/1fab1705-9561-4689-891d-e039c4ec3076.zarr
A  B  C  D  E  F  G  H  I  J  K  L  OME

will-moore commented 8 months ago

Since /data/ngff isn't accessible on the omeroreadonly servers, we need a different location, and we need to copy the data to all servers.

E.g.

for server in omeroreadonly-1 omeroreadonly-2 omeroreadonly-3 omeroreadonly-4; do rsync -rvP 101-24.ome.zarr.zip $server:/home/wmoore ; done;

ssh omeroreadonly-1

for z in 101-24.ome.zarr.zip  10-34.ome.zarr.zip  103.ome.zarr.zip; do sudo chown omero-server $z; done
sudo mkdir /ngff && sudo chown -R omero-server /ngff
for z in 101-24.ome.zarr.zip  10-34.ome.zarr.zip  103.ome.zarr.zip; do sudo mv $z /ngff; done
sudo -u omero-server -s
cd /ngff/
for z in 101-24.ome.zarr.zip  10-34.ome.zarr.zip  103.ome.zarr.zip; do unzip $z; done
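Because the same three plates have to end up on every server, it may be worth checking each copy after unzipping. A minimal sketch, assuming each unzipped plate has a `.zattrs` file at its top level (as p1.ome.zarr does in the listing earlier in this issue); `check_plates` is a hypothetical helper name:

```shell
# Report which plates under a base directory look complete.
# Usage: check_plates BASE_DIR PLATE [PLATE...]
check_plates() {
  base="$1"
  shift
  missing=0
  for plate in "$@"; do
    # Assumption: a top-level .zattrs marks a usable plate.
    if [ -f "$base/$plate/.zattrs" ]; then
      echo "ok      $plate"
    else
      echo "MISSING $plate"
      missing=$(( missing + 1 ))
    fi
  done
  # Succeed only if every plate was found.
  [ "$missing" -eq 0 ]
}
```

On each omeroreadonly server this could be run as `check_plates /ngff 101-24.ome.zarr 10-34.ome.zarr 103.ome.zarr`; a non-zero exit status means at least one plate is missing or incomplete.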

On omeroreadwrite, move data to /ngff and update symlinks...

bash-4.2$ rm /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr
bash-4.2$ ln -s /ngff/10-34.ome.zarr /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr

bash-4.2$ rm /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-06-31.113_mkngff/49150a5d-8fc2-499a-bbc6-4a3eed2d44b1.zarr
bash-4.2$ ln -s /ngff/101-24.ome.zarr /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-06-31.113_mkngff/49150a5d-8fc2-499a-bbc6-4a3eed2d44b1.zarr

bash-4.2$ rm /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-26-08.432_mkngff/1fab1705-9561-4689-891d-e039c4ec3076.zarr
bash-4.2$ ln -s /ngff/103.ome.zarr /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-26-08.432_mkngff/1fab1705-9561-4689-891d-e039c4ec3076.zarr

Looks good - images are viewable under idr-testing.openmicroscopy.org