IDR / omero-mkngff

Plugin to swap OMERO filesets with NGFF
GNU General Public License v2.0

Ignore chunks #5

Status: Closed (will-moore closed this 11 months ago)

will-moore commented 12 months ago

As discussed this morning, we want to try ignoring chunks to reduce the number of OriginalFiles being created (e.g. see https://github.com/joshmoore/omero-mkngff/pull/4#issuecomment-1684178540).

Reverted 9a001a3b0968d35bfbf84040b37583ff2cab3c52
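
The idea of ignoring chunks can be sketched as follows. This is only an illustration, not the plugin's actual code: the function name ngff_rows, the helper build_demo_tree, and the rule "a directory containing .zarray is an array, so list only its .zarray and do not descend" are assumptions, chosen to match the row layout (path, name, mimetype) seen in the generated SQL in this thread.

```python
import tempfile
from pathlib import Path


def ngff_rows(zarr_root, prefix=""):
    """Return (path, name, mimetype) rows for a Zarr tree, skipping chunks.

    Hypothetical helper: inside an array directory (one holding .zarray)
    only the .zarray file is listed, so chunk files and chunk
    sub-directories never produce rows.
    """
    zarr_root = Path(zarr_root)
    rows = []

    def visit(directory, rel):
        is_array = (directory / ".zarray").is_file()
        for entry in sorted(directory.iterdir(), key=lambda p: p.name):
            if is_array and entry.name != ".zarray":
                continue  # chunk file or chunk directory: ignore it
            if entry.is_dir():
                rows.append((rel, entry.name, "Directory"))
                visit(entry, f"{rel}{entry.name}/")
            else:
                rows.append((rel, entry.name, "application/octet-stream"))

    visit(zarr_root, f"{prefix}{zarr_root.name}/")
    return rows


def build_demo_tree(parent):
    """Create a tiny fake Zarr group with one array and one chunk file."""
    root = Path(parent) / "demo.zarr"
    (root / "0" / "0").mkdir(parents=True)
    for rel in (".zattrs", ".zgroup", "0/.zarray", "0/0/0"):
        (root / rel).write_text("{}")
    return root


with tempfile.TemporaryDirectory() as tmp:
    for row in ngff_rows(build_demo_tree(tmp)):
        print(row)
```

In the demo tree the chunk at 0/0/0 produces no row, so the row count depends on the number of groups and arrays rather than on the number of chunks.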

will-moore commented 12 months ago

Testing on idr0138-pilot...

sudo -u omero-server -s
conda activate mkngff
pip uninstall omero-mkngff
pip install 'omero-mkngff @ git+https://github.com/will-moore/omero-mkngff@ignore_chunks'

$ omero mkngff sql --secret=$SECRET 5811533 --symlink_repo /data/OMERO/ManagedRepository "/idr0054/zarr/Tonsil 2.ome.zarr/" > idr0054_2.sql
Found prefix demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15 // 15-28-44.081_converted for fileset 5811533
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted
Creating dir at /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr
Creating symlink /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr -> /idr0054/zarr/Tonsil 2.ome.zarr/

$ psql -U omero -d idr -h 192.168.10.102 -f idr0054_2.sql 
BEGIN
psql:idr0054_2.sql:29: ERROR:  null value in column "permissions" violates not-null constraint
DETAIL:  Failing row contains (5287368, null, demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_..., null, 326588607, null, null, null, 326588607).
CONTEXT:  SQL statement "insert into fileset
        (id, templateprefix, permissions, creation_id, group_id, owner_id, update_id)
        values
        (nextval('seq_fileset'), prefix, old_perms, new_event, old_group, old_owner, new_event)
        returning id"
PL/pgSQL function mkngff_fileset(bigint,character varying,character varying,character varying,text[]) line 21 at SQL statement
ROLLBACK
$ cat idr0054_2.sql 

begin;
    select mkngff_fileset(
      5811533,
      '4b358149-af39-49f0-882d-10884fab7133',
      'cdf35825-def1-4580-8d0b-9c349b8f78d6',
      'demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/',
      array[
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/', '.zattrs', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/', '.zgroup', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/', '0', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/', '.zattrs', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/', '.zgroup', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/', '0', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/0/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/', '1', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/1/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/', '2', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/2/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/', '3', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/3/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/', '4', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/4/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/', 'OME', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/OME/', '.zattrs', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/OME/', '.zgroup', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/OME/', 'METADATA.ome.xml', 'application/octet-stream']
      ]::text[][]
    );
commit;

will-moore commented 12 months ago

@joshmoore I don't think the error above is caused by removing the chunk rows, but I'm not sure why I'm seeing it or what to do about it. Maybe it has something to do with running the SQL on an image that we've previously run this script on?

I can try on a plate where I've not previously run this script....

will-moore commented 12 months ago

Trying on a fresh plate from idr0012... (see https://github.com/joshmoore/omero-mkngff/pull/4#issuecomment-1682112697)

TLDR: got the same error:

idr0012 plate HT02...

psql -U omero -d idr -h 192.168.10.231 -c "select fileset from Image where id= 14058769"
 fileset 
---------
 5808583
(1 row)

$ omero mkngff sql --secret=$SECRET --symlink_repo=/data/OMERO/ManagedRepository 5808583 "/idr0012/ngff/HT02.ome.zarr/" > idr0012_HT02.sql

$ cat idr0012_HT02.sql | wc
   8450   25325 1287987
$ cat idr0012_HT01.sql | wc
  45410  136205 6891315

$ psql -U omero -d idr -h 192.168.10.102 -f idr0012_HT02.sql 
BEGIN
psql:idr0012_HT02.sql:8448: ERROR:  null value in column "permissions" violates not-null constraint
DETAIL:  Failing row contains (5287378, null, demo_2/Blitz-0-Ice.ThreadPool.Server-9/2023-05/03/12-52-39.994_c..., null, 326589020, null, null, null, 326589020).
CONTEXT:  SQL statement "insert into fileset
        (id, templateprefix, permissions, creation_id, group_id, owner_id, update_id)
        values
        (nextval('seq_fileset'), prefix, old_perms, new_event, old_group, old_owner, new_event)
        returning id"
PL/pgSQL function mkngff_fileset(bigint,character varying,character varying,character varying,text[]) line 21 at SQL statement
ROLLBACK

joshmoore commented 12 months ago

That's really odd. Can you share the SQL file with me? I don't know why one row would suddenly not have permissions set.
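
One thing that can be checked mechanically is whether every psql command in a session targets the same database host; the idr0012 commands quoted above use -h 192.168.10.231 for the fileset lookup and -h 192.168.10.102 for running the SQL file. The sketch below only collects the hosts. It does not claim that this explains the error, and the function names psql_host and hosts_used are made up for illustration.

```python
import shlex


def psql_host(command):
    """Return the value following -h in a psql command line, or None.

    Only the separate "-h value" form is handled, which is the form
    used in this thread.
    """
    args = shlex.split(command)
    for flag, value in zip(args, args[1:]):
        if flag == "-h":
            return value
    return None


def hosts_used(commands):
    """Collect the distinct hosts named across several psql commands."""
    hosts = (psql_host(command) for command in commands)
    return {host for host in hosts if host is not None}


# Commands copied from the idr0012 HT02 test above.
commands = [
    'psql -U omero -d idr -h 192.168.10.231 -c "select fileset from Image where id= 14058769"',
    "psql -U omero -d idr -h 192.168.10.102 -f idr0012_HT02.sql",
]
print(sorted(hosts_used(commands)))
```

More than one host in the result means the lookup and the SQL run went to different servers, which is worth confirming as intentional.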

will-moore commented 12 months ago

@joshmoore The SQL shown above (the output of cat idr0054_2.sql) gave this error. But it's not coming from the change in this PR, as I'm seeing the same error with the previous symlink branch. I just can't work out what I'm doing differently...

will-moore commented 12 months ago

Importing fresh idr0054 images into idr0138-pilot to use for testing mkngff. We need to import NGFF images, since we can't import original pattern file versions of these...

omero import -d 17351 --transfer=ln_s --skip=all --depth=100 /idr0054/zarr/Tonsil\ 1.ome.zarr/ --file /tmp/idr0054_1.log  --errs /tmp/idr0054_1.err
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/omero/server/OMERO.server-5.6.6-ice36/lib/client/OMEZarrReader.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/omero/server/OMERO.server-5.6.6-ice36/lib/client/logback-classic.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
2023-08-22 10:18:22,351 276        [      main] INFO          ome.formats.importer.ImportConfig - OMERO.blitz Version: 5.6.0
2023-08-22 10:18:22,368 293        [      main] INFO          ome.formats.importer.ImportConfig - Bioformats version: 0.3.2-SNAPSHOT revision: ef64d3da4cc1dd41adad20a8f0ee936383e578ec date: 20230816-0010
2023-08-22 10:18:22,434 359        [      main] INFO   formats.importer.cli.CommandLineImporter - Setting checksum algorithm to File-Size-64
2023-08-22 10:18:22,435 360        [      main] INFO   formats.importer.cli.CommandLineImporter - Skipping thumbnails creation
2023-08-22 10:18:22,435 360        [      main] INFO   formats.importer.cli.CommandLineImporter - Skipping minimum/maximum computation
2023-08-22 10:18:22,435 360        [      main] INFO   formats.importer.cli.CommandLineImporter - Disabling upgrade check
2023-08-22 10:18:22,435 360        [      main] INFO   formats.importer.cli.CommandLineImporter - Setting transfer to ln_s
2023-08-22 10:18:22,438 363        [      main] INFO   formats.importer.cli.CommandLineImporter - Log levels -- Bio-Formats: ERROR OMERO.importer: INFO
2023-08-22 10:18:22,817 742        [      main] INFO      ome.formats.importer.ImportCandidates - Depth: 100 Metadata Level: MINIMUM
2023-08-22 10:23:26,023 303948     [      main] INFO      ome.formats.importer.ImportCandidates - 444 file(s) parsed into 0 group(s) with 433 call(s) to setId in 301561ms. (303206ms total) [0 unknowns]
2023-08-22 10:23:26,069 303994     [      main] INFO       ome.formats.OMEROMetadataStoreClient - Attempting initial SSL connection to localhost:4064
2023-08-22 10:23:26,525 304450     [      main] INFO       ome.formats.OMEROMetadataStoreClient - Insecure connection requested, falling back
2023-08-22 10:23:26,875 304800     [      main] INFO       ome.formats.OMEROMetadataStoreClient - Pinging session every 300s.
2023-08-22 10:23:26,885 304810     [      main] INFO       ome.formats.OMEROMetadataStoreClient - Server: 5.6.6
2023-08-22 10:23:26,885 304810     [      main] INFO       ome.formats.OMEROMetadataStoreClient - Client: 5.6.0
2023-08-22 10:23:26,885 304810     [      main] INFO       ome.formats.OMEROMetadataStoreClient - Java Version: 11.0.15
2023-08-22 10:23:26,885 304810     [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Name: Linux
2023-08-22 10:23:26,885 304810     [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Arch: amd64
2023-08-22 10:23:26,885 304810     [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Version: 3.10.0-1160.66.1.el7.x86_64
No imports found

will-moore commented 12 months ago

Repeated the attempt to run mkngff for idr0012 HT02 without this PR... This time it fails on the secret key check, even though I verified that the key in the SQL is correct (see the grep and psql output at the end)...

$ omero mkngff sql --secret=$SECRET 5808583 "/idr0012/ngff/HT02.ome.zarr/" > idr0012_HT02.sql
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix demo_2/Blitz-0-Ice.ThreadPool.Server-9/2023-05/03 // 12-52-39.994 for fileset 5808583
$ psql -U omero -d idr -h 192.168.10.231 -f setup.sql
CREATE FUNCTION
$ psql -U omero -d idr -h 192.168.10.231 -f idr0012_HT02.sql 
BEGIN
psql:idr0012_HT02.sql:45408: ERROR:  cannot set original repo property without secret key
CONTEXT:  PL/pgSQL function _protect_originalfile_repo_insert() line 28 at RAISE
SQL statement "insert into originalfile
          (id, permissions, creation_id, group_id, owner_id, update_id, mimetype, repo, path, name)
          values (nextval('seq_originalfile'), old_perms, new_event, old_group, old_owner, new_event,
            info[i][3], repo, info[i][1], uuid || info[i][2])
          returning id"
PL/pgSQL function mkngff_fileset(bigint,character varying,character varying,character varying,text[]) line 42 at SQL statement
ROLLBACK
(mkngff) bash-4.2$ cat idr0012_HT02.sql | grep 4b358149-af39-49f0-882d-10884fab7133
      '4b358149-af39-49f0-882d-10884fab7133',
(mkngff) bash-4.2$ psql -U omero -d idr -h 192.168.10.102 -c "select uuid from (select * from session where node = 0 and owner = 0 and defaulteventtype = 'Sessions' order by id desc limit 1) x order by x.id asc limit 1;"
                 uuid                 
--------------------------------------
 4b358149-af39-49f0-882d-10884fab7133
(1 row)

will-moore commented 12 months ago

@joshmoore I seem to have hit various blockers that stop me running mkngff sql scripts at all this week: either a wrong $SECRET, or the null value in column "permissions" error.

I've updated my current workflow at https://github.com/joshmoore/omero-mkngff/issues/2 (that also includes other steps to prep the Fileset IDs etc), so maybe you could review that and/or try it and see if you can work out what's not working for me?
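
The manual secret check done above with grep and psql could be scripted roughly as below. This is a sketch under assumptions: it treats the first quoted UUID in the generated SQL as the secret (which matches the grep output in this thread), and the names first_uuid and secret_matches are hypothetical.

```python
import re

# A quoted, lower-case UUID such as '4b358149-af39-49f0-882d-10884fab7133'.
UUID_RE = re.compile(
    r"'([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})'"
)


def first_uuid(sql_text):
    """Return the first quoted UUID in the SQL text, or None if absent."""
    match = UUID_RE.search(sql_text)
    return match.group(1) if match else None


def secret_matches(sql_text, secret):
    """True if the first UUID in the SQL equals the expected secret."""
    return first_uuid(sql_text) == secret


# Start of the generated SQL quoted earlier in this thread.
sample_sql = """
begin;
    select mkngff_fileset(
      5811533,
      '4b358149-af39-49f0-882d-10884fab7133',
      'cdf35825-def1-4580-8d0b-9c349b8f78d6',
"""
print(secret_matches(sample_sql, "4b358149-af39-49f0-882d-10884fab7133"))
```

The expected secret would come from the session query shown above, run against the same database that the SQL file is later applied to.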

will-moore commented 12 months ago

Testing on idr0125-pilot as the omero-server user. Started from scratch, installed conda etc.

Testing with idr0051 image http://localhost:1040/webclient/?show=image-4007821 https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD815/b2633930-86b0-489e-a845-d2a7afe6ff15.html

Installed main branch of mkngff - everything works (but sql command took a long time!).

$ omero mkngff sql --secret=$SECRET 604309 /bia-integrator-data/S-BIAD815/b2633930-86b0-489e-a845-d2a7afe6ff15/b2633930-86b0-489e-a845-d2a7afe6ff15.zarr > 604309.sql

$ psql -U omero -d idr -h 192.168.10.102 -f 604309.sql 
BEGIN
 mkngff_fileset 
----------------
        5287380
(1 row)
COMMIT

Try with another image from idr0051... with THIS branch... creating symlinks...

http://localhost:1040/webclient/?show=image-4007817 (Fileset 604305) https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD815/c49efcfd-e767-4ae5-adbf-299cafd92120.html

# commit 36242c8a86b
pip install 'omero-mkngff @ git+https://github.com/will-moore/omero-mkngff@ignore_chunks'

omero mkngff sql --symlink_repo /data/OMERO/ManagedRepository --secret=$SECRET 604305 /bia-integrator-data/S-BIAD815/c49efcfd-e767-4ae5-adbf-299cafd92120/c49efcfd-e767-4ae5-adbf-299cafd92120.zarr > 604305.sql

$ psql -U omero -d idr -h 192.168.10.102 -f 604305.sql
BEGIN
 mkngff_fileset 
----------------
        5287383
(1 row)
COMMIT

This worked! MUCH faster without all the chunks. Only 13 files in the Fileset and image is viewable (on idr0125-pilot):

[Screenshot 2023-08-23 at 13 50 57]

[Screenshot 2023-08-23 at 13 50 42]

will-moore commented 12 months ago

Tested this PR at https://github.com/IDR/idr-metadata/issues/639#issuecomment-1690100998 with Plates from idr0035. Without chunks the sql commands were much faster (1 or 2 secs each) and the data looks good.

joshmoore commented 11 months ago

Assuming failure is still related to test-infra.