IDR / idr-metadata

Curated metadata for all studies published in the Image Data Resource

https://idr.openmicroscopy.org

14 stars 24 forks source link

idr-testing May 2024 #696

Open will-moore opened 6 months ago

will-moore commented 6 months ago

Steps needed on idr-next for NGFF upgrade. NB: current checklist is for actions on idr-testing (newly redeployed on 21st May 2024)

Detailed workflow is at https://github.com/IDR/mkngff_upgrade_scripts but this is an outline, also includes study-specific jobs:

Manual Software updates (should be part of the original deployment for idr-next):

[x] Update ZarrReader if needed to include recent work
[x] Install mkngff in venv, including recent PR branches (if not yet merged)
[x] Install latest iviewer 0.14.0

NGFF and other udpates:

[x] Add bfoptions file to skip_wells for all idr0009 Plates
[x] idr0070: remove annotations, delete svs images, re-import and re-annotate as described
[x] Create an OME-NGFF Tag and add the Screens and Projects
[x] As omero-server, clone https://github.com/IDR/mkngff_upgrade_scripts
[x] idr0015 - delete one duplicate plate named TARA_HCS1_H5_G100004727_G100004940--2013_12_08_21_26_28_chamber--U00--V01
[x] Get $SECRET, update all sql commands with it and run them (see https://github.com/IDR/mkngff_upgrade_scripts)
[x] Run mkngff symlink on all studies, including bfoptions creation
[x] Update bfoptions files for idr0004 plates P115 and P124
~Regenerate thumbnails for idr0015 plate https://idr.openmicroscopy.org/webclient/?show=plate-4653~
[ ] Generate thumbnails for idr0012 Plate 130-16, by going to an Image (e.g. http://localhost:1080/webclient/?show=image-3063425 and "Save To All"
[ ] Bio-Formats cache regeneration
[ ] Validation? E.g. https://github.com/IDR/idr-utils/pull/55 on some data?
[x] Move idr00XX.csv and sql files from https://github.com/IDR/idr-utils/pull/56 into separate repos, then review/merge the PR.

will-moore commented 6 months ago

Using today's build (23rd May) of OMEZarrReader: As omero-server... Ran on all 5 servers:

bash-5.1$ wget https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-build/76/label=testintegration/artifact/bio-formats-build/ZarrReader/target/OMEZarrReader-0.4.2-SNAPSHOT-jar-with-dependencies.jar

bash-5.1$ mv OMEZarrReader-0.4.2-SNAPSHOT-jar-with-dependencies.jar OMEZarrReader-b76.jar
bash-5.1$ rm OMERO.server/lib/client/OMEZarrReader.jar && rm OMERO.server/lib/server/OMEZarrReader.jar 
bash-5.1$ cp OMEZarrReader-b76.jar OMERO.server/lib/client/ && cp OMEZarrReader-b76.jar OMERO.server/lib/server/
exit
[wmoore@test122-omeroreadwrite ~]$ sudo service omero-server restart

Install mkngff on omeroreadwrite...

sudo -u root -s
source /opt/omero/server/venv3/bin/activate
pip install 'omero-mkngff @ git+https://github.com/IDR/omero-mkngff@main'
...
Resolved https://github.com/IDR/omero-mkngff to commit d3f95c90b379a8f61a866539cdb0e1e490fad84b
Successfully installed omero-mkngff-0.1.0.dev0

will-moore commented 6 months ago

$ sudo -u omero-server -s
bash-5.1$ cd
bash-5.1$ git clone https://github.com/IDR/mkngff_upgrade_scripts.git

Setup screen and delete duplicate plate for idr0015

screen -S mkngff
source /opt/omero/server/venv3/bin/activate
export OMERODIR=/opt/omero/server/OMERO.server
omero config get omero.db.host
export DBHOST=192.168.10.231
omero config get omero.db.pass
export PGPASSWORD=[********]

omero login...
omero delete Plate:4801 --report > /tmp/delete_idr0015.log

mkngff...

omero mkngff setup > setup.sql
psql -U omero -d idr -h $DBHOST -f setup.sql
CREATE FUNCTION

cd mkngff_upgrade_scripts/ngff_filesets/idr0004
for i in $(ls); do sed -i 's/SECRETUUID/42650434-6eaa-45e5-9542-58247a45d8bc/g' $i; done

omero login

cd ngff_filesets
export IDRID=idr0004
for r in $(cat $IDRID.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
  psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
  omero mkngff symlink /data/OMERO/ManagedRepository $fsid "/bia-integrator-data/$biapath/$uuid.zarr" --bfoptions --clientpath="https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/$biapath/$uuid.zarr"
done

Fix idr0004 bfoptions (remove quick_read)

vi /data/OMERO/ManagedRepository/demo_2/2015-10/01/07-46-42.965_mkngff/35cfc0db-7795-497c-aed5-1ae591b2d9f1.zarr.bfoptions
vi /data/OMERO/ManagedRepository/demo_2/2015-10/01/07-57-40.271_mkngff/ee8872c8-e4b1-41fa-aa4f-a9e3e200c540.zarr.bfoptions

Repeat for all other studies...

will-moore commented 6 months ago

idr0010 (15:43...) idr0011 (17:32...) idrr012 (17:43...) idr0013 (17:57...) idr0016 (22:10...) idr0015 (06:41...) idr0025 8:07 idr0026 8:08 idr0033 9:25 idr0035 9:45 idr0036 9:49 idr0051 10:01 idr0054 10:02 idr0064 10:05 idr0090 10:07 idr0091 10:12

will-moore commented 6 months ago

BF cache memo generation...

$ ssh -A idr-testing.openmicroscopy.org -L 1080:omeroreadwrite:80
[wmoore@test122-proxy ~]$ grep -oE 'omero[^ ]+$' /etc/hosts > nodes
[wmoore@test122-proxy ~]$ cat nodes
omeroreadonly-1
omeroreadonly-2
omeroreadonly-3
omeroreadonly-4
omeroreadwrite

Delete (move) cache

ssh omeroreadwrite
sudo -u omero-server -s
cd /data/OMERO/BioFormatsCache/
mv data data_to_delete

Generate target IDs...

ssh omeroreadwrite
/opt/omero/server/OMERO.server/bin/omero login

/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'SELECT MIN(field.image.id) FROM Screen as s JOIN s.plateLinks as plateLinks JOIN s.annotationLinks as annLinks join annLinks.child as ann join plateLinks.child as p join p.wells as w join w.wellSamples as field where  ann.id=38304992 GROUP BY field.well.plate' > ngff_plates.txt

/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'SELECT d.id FROM Project as p JOIN p.datasetLinks as datasetLinks JOIN p.annotationLinks as annLinks join annLinks.child as ann join datasetLinks.child as d where  ann.id=38304992' > ngff_datasets.txt

vi ngff_plates.txt     # remove first row
vi ngff_datasets.txt     # remove first row

Back on proxy server... [wmoore@test122-proxy ~]

$ rsync -rvP omeroreadwrite:/home/wmoore/ngff_plates.txt ./
$ rsync -rvP omeroreadwrite:/home/wmoore/ngff_datasets.txt ./

$ cut -d ',' -f2 ngff_plates.txt | sed -e 's/^/Image:/' > ngff_ids.txt
$ cut -d ',' -f2 ngff_datasets.txt | sed -e 's/^/Dataset:/' >> ngff_ids.txt 
$ wc ngff_ids.txt 
 1643  1643 22938 ngff_ids.txt

$ screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache_20240524/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

will-moore commented 6 months ago

Unfortunately it seems that the cache generation is silently failing. Nothing is produced under /tmp/ngff_cache_20240524/, and immediately running screen -r shows no screens.

Checked for which parallel and it isn't installed: Installed...

sudo -u root -s
dnf install parallel

Ran again..

screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache_20240524/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

EDIT: (27th May) After running over the weekend didn't get many ok

[wmoore@test122-proxy 1]$ grep ok /tmp/ngff_cache_20240524/1/**/* | wc
     59     236    5927
[wmoore@test122-proxy ~]$ ls /tmp/ngff_cache_20240524/1/ | wc
   1643    1643   22938

Run again...

screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache_20240527/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

28th May: ...after a day - screen -r reports no screen. Exactly three times more ok this time: (59 -> 177).

[wmoore@test122-proxy ~]$ grep ok /tmp/ngff_cache_20240527/1/**/* | wc
    177     795   19336

Check a random output...

[wmoore@test122-proxy ~]$ cat /tmp/ngff_cache_20240527/1/Image:9822101/stderr
wmoore@omeroreadonly-3: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

Not sure what's going on here. I checked I can do ssh omeroreadonly-3 etc to all the servers. Run again!....

On completion (after OME 2024 meeting): We are getting a few more "ok" results, but still way below success:

[wmoore@test122-proxy ~]$ grep ok /tmp/ngff_cache_20240528/1/**/* | wc
    230     946   23708

$ grep denied /tmp/ngff_cache_20240528/1/**/* 
/tmp/ngff_cache_20240528/1/Dataset:11901/stderr:wmoore@omeroreadwrite: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11902/stderr:wmoore@omeroreadonly-2: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11903/stderr:wmoore@omeroreadwrite: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11904/stderr:wmoore@omeroreadonly-2: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11905/stderr:wmoore@omeroreadwrite: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11906/stderr:wmoore@omeroreadonly-2: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
...

$ grep denied /tmp/ngff_cache_20240528/1/**/* | wc
   1436    5744  190228

Only see "Permission denied" with omeroreadwrite and omeroreadonly-2, not the others:

[wmoore@test122-proxy ~]$ grep denied /tmp/ngff_cache_20240528/1/**/* | grep "only-2" | wc
    742    2968   98652
[wmoore@test122-proxy ~]$ grep denied /tmp/ngff_cache_20240528/1/**/* | grep "readwrite" | wc
    694    2776   91576

On the 27th (previous run) we see "Permission denied" for all servers:

[wmoore@test122-proxy ~]$ grep denied /tmp/ngff_cache_20240527/1/**/* | grep "readwrite" | wc
    294    1176   38795
[wmoore@test122-proxy ~]$ grep denied /tmp/ngff_cache_20240527/1/**/* | grep "readonly-1" | wc
    163     652   21675
[wmoore@test122-proxy ~]$ grep denied /tmp/ngff_cache_20240527/1/**/* | grep "readonly-2" | wc
    409    1636   54377
[wmoore@test122-proxy ~]$ grep denied /tmp/ngff_cache_20240527/1/**/* | grep "readonly-3" | wc
    295    1180   39219
[wmoore@test122-proxy ~]$ grep denied /tmp/ngff_cache_20240527/1/**/* | grep "readonly-4" | wc
    324    1296   43080

will-moore commented 5 months ago

@sbesson Any ideas what's causing those Permission denied errors? They seem to only come from 2 servers last time I ran this (omeroreadwrite and omeroreadonly-2):

$ grep denied /tmp/ngff_cache_20240528/1/**/* 
/tmp/ngff_cache_20240528/1/Dataset:11901/stderr:wmoore@omeroreadwrite: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11902/stderr:wmoore@omeroreadonly-2: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11903/stderr:wmoore@omeroreadwrite: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11904/stderr:wmoore@omeroreadonly-2: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11905/stderr:wmoore@omeroreadwrite: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11906/stderr:wmoore@omeroreadonly-2: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

Running again now...

$ ssh -A idr-testing.openmicroscopy.org
Last login: Mon Jun  3 08:47:51 2024 from 134.36.66.49
[wmoore@test122-proxy ~]$ screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache_20240603/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

EDIT: Didn't see any "Permission denied" errors this time, but plenty of others!

[wmoore@test122-proxy ~]$ grep rror /tmp/ngff_cache_20240603/1/**/* | wc
   1704   10341  231921
[wmoore@test122-proxy ~]$ grep ResourceError /tmp/ngff_cache_20240603/1/**/* | wc
   1669   10014  225473
[wmoore@test122-proxy ~]$ grep FileNotFoundError /tmp/ngff_cache_20240603/1/**/* | wc
     29     261    5626

But checking these in the webclient either viewed fine or showed spinner (no memo file)..?

Except for idr0064... Still haven't updated https://github.com/IDR/mkngff_upgrade_scripts/blob/main/ngff_filesets/idr0064.csv with original scripts for updating vanilla Filesets to data at BioImage Archive. See https://github.com/IDR/idr-metadata/issues/682#issuecomment-1895755941

will-moore commented 5 months ago

Another attempt (5th)...

$ ssh -A idr-testing.openmicroscopy.org -L 1080:omeroreadwrite:80

[wmoore@test122-proxy ~]$ screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache_20240604/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

EDIT: checking back a few days later - looks better: NB use ok: to avoid matching invoke(:

[wmoore@test122-proxy ~]$ grep "ok:" /tmp/ngff_cache_20240604/1/**/* | wc
   1829    7316  188495

But actually, most of these are Dataset images. If we exclude those, find only 239 Plate images are ok out of 1599

[wmoore@test122-proxy ~]$ grep "ok:" /tmp/ngff_cache_20240604/1/**/* | grep -v Dataset | wc
    239     956   24012
[wmoore@test122-proxy ~]$ grep Image ngff_ids.txt | wc
   1599    1599   22362

Also lots of ResourceErrors! 1346 (out of 1599 Plates), and 461 for Dataset Images

[wmoore@test122-proxy ~]$ grep "ResourceError" /tmp/ngff_cache_20240604/1/**/* | wc
   1807   10842  245921

[wmoore@test122-proxy ~]$ grep "ResourceError" /tmp/ngff_cache_20240604/1/**/* | grep Dataset | wc
    461    2766   63114
[wmoore@test122-proxy ~]$ grep "ResourceError" /tmp/ngff_cache_20240604/1/**/* | grep -v Dataset | wc
   1346    8076  182807

Many images with ResourceError are viewable in webclient. Picking an Image that isn't (idr0004, plate 171, http://localhost:1080/webclient/?show=image-698831), look for fileset name in logs on all servers...

for server in omeroreadwrite omeroreadonly-1 omeroreadonly-2 omeroreadonly-3 omeroreadonly-4; do echo $server && ssh $server "grep e3283a6a-d25b-41e1-8ab7-1837b89e3a6e /opt/omero/server/OMERO.server/var/log/Blitz-0.log"; done

Finds entries in logs on omeroreadwrite, including

2024-06-11 07:43:30,171 DEBUG [                   loci.formats.Memoizer] (.Server-10) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-10/01/08-51-39.709_mkngff/e3283a6a-d25b-41e1-8ab7-1837b89e3a6e.zarr/..zattrs.bfmemo (128925 bytes)

But No Errors with this search.

Also found 14 sessions errors:

[wmoore@test122-proxy ~]$ grep "rror" /tmp/ngff_cache_20240604/1/**/* | grep -v ResourceError | grep sessions
/tmp/ngff_cache_20240604/1/Image:12550005/stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/2f9ce65b-a180-4828-a64b-f2c08a8f6743')
/tmp/ngff_cache_20240604/1/Image:1486790/stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/2f4eff31-ac3c-4f91-bc7b-e8464463f3a7')
...

Found 708 errors like:

/tmp/ngff_cache_20240604/1/Image:9821901/stderr:!! 06/04/24 09:49:07.402 error: 9 communicators not destroyed during global destruction.

will-moore commented 5 months ago

@sbesson - updated the previous comment above with more details of errors on the last attempt at memo file generation for NGFF data (1599 Plate Images and 44 Datasets).

Summay:

1807 ResourceErrors but don't know what caused them. Only 239 out of 1599 NGFF Plates are "ok" in memo generation logs, but data seems to be OK when viewed in webclient.
14 No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/...
708 communicators not destroyed

will-moore commented 5 months ago

From Seb: omeroreadonly-4 I am seeing lots of [java.io](http://java.io/).FileNotFoundException...(Transport endpoint is not connected) in the logs. And

[sbesson@test122-omeroreadonly-4 ~]$ ls /bia-integrator-data 
ls: cannot access '/bia-integrator-data': Transport endpoint is not connected

Fixed by unmouting and re-mounting goofys:

[wmoore@test122-omeroreadonly-4 ~]$ sudo umount /bia-integrator-data
[wmoore@test122-omeroreadonly-4 ~]$ sudo goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other bia-integrator-data /bia-integrator-data
[wmoore@test122-omeroreadonly-4 ~]$ ls /bia-integrator-data/S-BIAD815/51afff7c-eed4-44b4-95c7-1437d8807b97/51afff7c-eed4-44b4-95c7-1437d8807b97.zarr
0  OME

Also check this is mounted on all servers:

[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "ls /bia-integrator-data/S-BIAD815/51afff7c-eed4-44b4-95c7-1437d8807b97/51afff7c-eed4-44b4-95c7-1437d8807b97.zarr"; done
omeroreadonly-1
0
OME
omeroreadonly-2
0
OME
omeroreadonly-3
0
OME
omeroreadonly-4
0
OME
omeroreadwrite
0
OME

will-moore commented 5 months ago

6th attempt:

screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache_20240611/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

Immediately there are a bunch of Image:XX dirs:

[wmoore@test122-proxy ~]$ ls /tmp/ngff_cache_20240611/1 | wc
    324     324    4535

A bunch of sessions errors e.g.

/tmp/ngff_cache_20240611/1/Image:3415781/stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/f11b34f2-a63c-4926-a207-ec423209f5cf')

$ grep "FileNotFoundError" /tmp/ngff_cache_20240611/1/**/* | wc
     27     243    5239

And some "communicators not destroyed"

/tmp/ngff_cache_20240611/1/Image:1484119/stderr:!! 06/11/24 21:16:15.526 error: 9 communicators not destroyed during global destruction.

$ grep "rror" /tmp/ngff_cache_20240611/1/**/* | grep communicators | wc
     17     187    2328

No other errors initially... Overnight, 2 other ResourceErrors:

[wmoore@test122-proxy ~]$ grep "rror" /tmp/ngff_cache_20240611/1/**/* | grep -v communicators | grep -v FileNotFoundError
/tmp/ngff_cache_20240611/1/Image:1556033/stdout:fail: Pixels:1556033 Image:1556033 11470.436125516891 exception ::omero::ResourceError
/tmp/ngff_cache_20240611/1/Image:1573071/stdout:fail: Pixels:1573071 Image:1573071 10738.076765537262 exception ::omero::ResourceError

Both of these are from idr0013 and are NOT NGFF data! Fileset:18568 and Fileset:18655 are the 2nd and 3rd from last rows in idr0013.csv. Last row is 18728 which also is not NGFF converted. But checking earlier in idr0013.csv e.g. 4th from end and earlier, finds no unconverted Filesets, so it looks like just the last 3 failed?

Checking progress (12th June)... It looks like the process is not running. Only 311 "ok" and still no more errors.

[wmoore@test122-proxy ~]$ screen -r
There is no screen to be resumed.

[wmoore@test122-proxy ~]$ ls /tmp/ngff_cache_20240611/1/ | wc
   1643    1643   22938

[wmoore@test122-proxy ~]$ grep "ok:" /tmp/ngff_cache_20240611/1/**/* | wc
    311    1244   31235

# /tmp/ngff_cache_20240611/1/Image:696324/stdout:Previous session expired for public on localhost:4064
[wmoore@test122-proxy ~]$ grep "session expired" /tmp/ngff_cache_20240611/1/**/* | wc
     70     490    7142

Previous errors in stdout:

[wmoore@test122-proxy ~]$ cat /tmp/ngff_cache_20240611/1/**/stdout | grep -v "ok:" | grep -v "session expired"
fail: Pixels:1556033 Image:1556033 11470.436125516891 exception ::omero::ResourceError
fail: Pixels:1573071 Image:1573071 10738.076765537262 exception ::omero::ResourceError

But lots more in stderr - errors (with counts)


cat /tmp/ngff_cache_20240611/1/**/stderr
...
wmoore@omeroreadonly-3: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

[wmoore@test122-proxy ~]$ grep "wmoore@omeroreadonly-3: Permission denied (publickey,gssapi-keyex,gssapi-with-mic)" /tmp/ngff_cache_20240611/1/**/stderr | wc
    492    1968   65421
[wmoore@test122-proxy ~]$ grep "wmoore@omeroreadwrite: Permission denied (publickey,gssapi-keyex,gssapi-with-mic)" /tmp/ngff_cache_20240611/1/**/stderr | wc
    495    1980   65317
[wmoore@test122-proxy ~]$ grep "Password check failed" /tmp/ngff_cache_20240611/1/**/stderr | wc
     27     243    3511

will-moore commented 5 months ago

session timeout

Let's try to log-in and set sessions timeout before running parallels

[wmoore@test122-proxy ~]$ for server in $(cat nodes); do ssh $server "/opt/omero/server/OMERO.server/bin/omero login -s localhost -u public -w public && /opt/omero/server/OMERO.server/bin/omero sessions timeout 3600"; done
Created session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
Previous session expired for public on localhost:4064
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
cannot update timeout for 81a63a9a-1513-4a3a-95b0-0599b6c46d1a
Created session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
Previous session expired for public on localhost:4064
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
cannot update timeout for bda1ab04-ea1c-4f39-838b-00bffe7795e8
Created session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
Previous session expired for public on localhost:4064
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
cannot update timeout for 53ec8eb4-e947-49a2-b684-5b89ee8b02e3
Created session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
Previous session expired for public on localhost:4064
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
cannot update timeout for 2480e613-d6da-4fbd-93d7-aa1974979cbf
Created session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
Previous session expired for public on localhost:4064
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public

Hmmm - oh well....

Give another try anyway...

screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache_20240613/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

After few hours overnight - no errors except know 2 above - no FileNotFound or communicators not destroyed.

[wmoore@test122-proxy ~]$ grep "rror" /tmp/ngff_cache_20240613/1/**/*
/tmp/ngff_cache_20240613/1/Image:1556033/stdout:fail: Pixels:1556033 Image:1556033 1933.3922345638275 exception ::omero::ResourceError
/tmp/ngff_cache_20240613/1/Image:1573071/stdout:fail: Pixels:1573071 Image:1573071 1944.575980424881 exception ::omero::ResourceError

Some timeouts have changed to 60 min, approx 1/5th (none were 60 mins before):

$ grep timeout /tmp/ngff_cache_20240613/1/**/*
...
/tmp/ngff_cache_20240613/1/Image:692360/stderr:Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
/tmp/ngff_cache_20240613/1/Image:693716/stderr:Reconnected to session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
/tmp/ngff_cache_20240613/1/Image:696271/stderr:Using session for public@localhost:4064. Idle timeout: 60 min. Current group: Public
/tmp/ngff_cache_20240613/1/Image:696324/stderr:Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
/tmp/ngff_cache_20240613/1/Image:696797/stderr:Using session for public@localhost:4064. Idle timeout: 60 min. Current group: Public
/tmp/ngff_cache_20240613/1/Image:9822101/stderr:Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
[wmoore@test122-proxy ~]$ grep timeout /tmp/ngff_cache_20240613/1/**/* | wc
    326    3612   43591
[wmoore@test122-proxy ~]$ grep "timeout: 60" /tmp/ngff_cache_20240613/1/**/* | wc
     72     792    9573

Some "ok:", similar number as before (and NONE of these are Dataset Images). Still small portion of 1599 total Plates!

[wmoore@test122-proxy ~]$ grep "ok:" /tmp/ngff_cache_20240613/1/**/* | wc
    324    1296   32546

Lots of these - equally distributed between all 5 servers:

/tmp/ngff_cache_20240613/1/Image:9821951/stderr:wmoore@omeroreadonly-3: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240613/1/Image:9822001/stderr:wmoore@omeroreadonly-1: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240613/1/Image:9822051/stderr:wmoore@omeroreadonly-3: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

[wmoore@test122-proxy ~]$ grep "Permission denied" /tmp/ngff_cache_20240613/1/**/* | wc
   1269    5076  168461
[wmoore@test122-proxy ~]$ grep "Permission denied" /tmp/ngff_cache_20240613/1/**/* | grep "readwrite" | wc
    254    1016   33512
[wmoore@test122-proxy ~]$ grep "Permission denied" /tmp/ngff_cache_20240613/1/**/* | grep "readonly-1" | wc
    254    1016   33774
[wmoore@test122-proxy ~]$ grep "Permission denied" /tmp/ngff_cache_20240613/1/**/* | grep "readonly-2" | wc
    254    1016   33767
[wmoore@test122-proxy ~]$ grep "Permission denied" /tmp/ngff_cache_20240613/1/**/* | grep "readonly-3" | wc
    253    1012   33638
[wmoore@test122-proxy ~]$ grep "Permission denied" /tmp/ngff_cache_20240613/1/**/* | grep "readonly-4" | wc
    254    1016   33770

sbesson commented 5 months ago

Looking at the logs for the source of these Permission denied issues, I think the relevant events are

[sbesson@test122-omeroreadwrite ~]$ sudo grep -B 10 -A 10 fatal /var/log/secure-20240616 
Jun 11 21:30:56 test122-omeroreadwrite sshd[1335505]: Accepted publickey for wmoore from 192.168.2.190 port 32888 ssh2: RSA SHA256:Nurjq5im8jid017paB3gXF7nz31bCaoY0W9UImvhwX8
Jun 11 21:30:56 test122-omeroreadwrite sshd[1335505]: pam_unix(sshd:session): session opened for user wmoore(uid=5098) by wmoore(uid=0)
Jun 11 21:46:35 test122-omeroreadwrite sshd[1335130]: Received disconnect from 192.168.2.190 port 55788:11: disconnected by user
Jun 11 21:46:35 test122-omeroreadwrite sshd[1335130]: Disconnected from user wmoore 192.168.2.190 port 55788
Jun 11 21:46:35 test122-omeroreadwrite sshd[1335127]: pam_unix(sshd:session): session closed for user wmoore
Jun 11 21:46:36 test122-omeroreadwrite sshd[1336220]: Accepted publickey for wmoore from 192.168.2.190 port 35256 ssh2: RSA SHA256:Nurjq5im8jid017paB3gXF7nz31bCaoY0W9UImvhwX8
Jun 11 21:46:36 test122-omeroreadwrite sshd[1336220]: pam_unix(sshd:session): session opened for user wmoore(uid=5098) by wmoore(uid=0)
Jun 11 22:15:16 test122-omeroreadwrite sshd[1334064]: Received disconnect from 192.168.2.190 port 37106:11: disconnected by user
Jun 11 22:15:16 test122-omeroreadwrite sshd[1334064]: Disconnected from user wmoore 192.168.2.190 port 37106
Jun 11 22:15:16 test122-omeroreadwrite sshd[1334018]: pam_unix(sshd:session): session closed for user wmoore
Jun 11 22:17:16 test122-omeroreadwrite sshd[1337471]: fatal: Timeout before authentication for 192.168.2.190 port 41974
Jun 11 22:21:59 test122-omeroreadwrite sshd[1337726]: Connection closed by authenticating user wmoore 192.168.2.190 port 52128 [preauth]
Jun 11 22:21:59 test122-omeroreadwrite sshd[1337728]: Connection closed by authenticating user wmoore 192.168.2.190 port 52134 [preauth]
Jun 11 22:21:59 test122-omeroreadwrite sshd[1337730]: Connection closed by authenticating user wmoore 192.168.2.190 port 52146 [preauth]
Jun 11 22:21:59 test122-omeroreadwrite sshd[1337732]: Connection closed by authenticating user wmoore 192.168.2.190 port 52160 [preauth]
Jun 11 22:21:59 test122-omeroreadwrite sshd[1337734]: Connection closed by authenticating user wmoore 192.168.2.190 port 52166 [preauth]
Jun 11 22:22:00 test122-omeroreadwrite sshd[1337736]: Connection closed by authenticating user wmoore 192.168.2.190 port 52174 [preauth]
Jun 11 22:22:00 test122-omeroreadwrite sshd[1337738]: Connection closed by authenticating user wmoore 192.168.2.190 port 53496 [preauth]
Jun 11 22:22:01 test122-omeroreadwrite sshd[1337740]: Connection closed by authenticating user wmoore 192.168.2.190 port 53502 [preauth]
Jun 11 22:22:01 test122-omeroreadwrite sshd[1337742]: Connection closed by authenticating user wmoore 192.168.2.190 port 53514 [preauth]
Jun 11 22:22:01 test122-omeroreadwrite sshd[1337744]: Connection closed by authenticating user wmoore 192.168.2.190 port 53518 [preauth]
--
Jun 13 22:14:54 test122-omeroreadwrite sshd[1453987]: pam_unix(sshd:session): session closed for user wmoore
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454271]: Received disconnect from 192.168.2.190 port 34842:11: disconnected by user
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454271]: Disconnected from user wmoore 192.168.2.190 port 34842
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454200]: pam_unix(sshd:session): session closed for user wmoore
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454219]: Received disconnect from 192.168.2.190 port 34816:11: disconnected by user
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454219]: Disconnected from user wmoore 192.168.2.190 port 34816
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454197]: pam_unix(sshd:session): session closed for user wmoore
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454244]: Received disconnect from 192.168.2.190 port 34832:11: disconnected by user
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454244]: Disconnected from user wmoore 192.168.2.190 port 34832
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454199]: pam_unix(sshd:session): session closed for user wmoore
Jun 13 22:16:52 test122-omeroreadwrite sshd[1454272]: fatal: Timeout before authentication for 192.168.2.190 port 34850
Jun 13 22:16:52 test122-omeroreadwrite sshd[1454298]: fatal: Timeout before authentication for 192.168.2.190 port 34854
Jun 13 22:16:53 test122-omeroreadwrite sshd[1454383]: fatal: Timeout before authentication for 192.168.2.190 port 34868
Jun 13 22:16:54 test122-omeroreadwrite sshd[1454386]: fatal: Timeout before authentication for 192.168.2.190 port 34872
Jun 13 22:16:54 test122-omeroreadwrite sshd[1454388]: fatal: Timeout before authentication for 192.168.2.190 port 34884
Jun 13 22:16:54 test122-omeroreadwrite sshd[1454390]: fatal: Timeout before authentication for 192.168.2.190 port 34886
Jun 13 22:16:54 test122-omeroreadwrite sshd[1454392]: fatal: Timeout before authentication for 192.168.2.190 port 34888
Jun 13 22:16:55 test122-omeroreadwrite sshd[1454394]: fatal: Timeout before authentication for 192.168.2.190 port 34892
Jun 13 22:16:55 test122-omeroreadwrite sshd[1454397]: fatal: Timeout before authentication for 192.168.2.190 port 34896
Jun 13 22:16:55 test122-omeroreadwrite sshd[1454399]: fatal: Timeout before authentication for 192.168.2.190 port 34902
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454576]: Connection closed by authenticating user wmoore 192.168.2.190 port 43110 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454578]: Connection closed by authenticating user wmoore 192.168.2.190 port 43120 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454580]: Connection closed by authenticating user wmoore 192.168.2.190 port 43130 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454582]: Connection closed by authenticating user wmoore 192.168.2.190 port 43144 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454584]: Connection closed by authenticating user wmoore 192.168.2.190 port 43146 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454586]: Connection closed by authenticating user wmoore 192.168.2.190 port 43148 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454588]: Connection closed by authenticating user wmoore 192.168.2.190 port 43152 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454590]: Connection closed by authenticating user wmoore 192.168.2.190 port 43160 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454591]: Connection closed by authenticating user wmoore 192.168.2.190 port 43172 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454593]: Connection closed by authenticating user wmoore 192.168.2.190 port 43178 [preauth]

But I haven't found yet a good explanation for what could cause these timeouts breaking the SSH connection

will-moore commented 5 months ago

As omero-server, updated OMEZarrReader on all 5 idr-testing servers:

wget https://artifacts.openmicroscopy.org/artifactory/ome.releases/ome/OMEZarrReader/0.5.1/OMEZarrReader-0.5.1.jar
rm OMERO.server/lib/client/OMEZarrReader-b76.jar && rm OMERO.server/lib/server/OMEZarrReader-b76.jar 
cp OMEZarrReader-0.5.1.jar OMERO.server/lib/client/ && cp OMEZarrReader-0.5.1.jar OMERO.server/lib/server/
exit
$ sudo service omero-server restart

However, trying to view images in web (connected to omeroreadwrite:80) gave errors suggesting we're missing dependencies:

    serverExceptionClass = ome.conditions.InternalException
    message =  Wrapped Exception: (java.lang.NoClassDefFoundError):
com/amazonaws/services/s3/model/S3ObjectInputStream

Reverting back to latest daily build...

wget https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-build/101/label=testintegration/artifact/bio-formats-build/ZarrReader/target/OMEZarrReader-0.5.2-SNAPSHOT-jar-with-dependencies.jar
mv OMEZarrReader-0.5.2-SNAPSHOT-jar-with-dependencies.jar OMEZarrReader-0.5.2_b101.jar
rm OMERO.server/lib/client/OMEZarrReader-0.5.1.jar && rm OMERO.server/lib/server/OMEZarrReader-0.5.1.jar
cp OMEZarrReader-0.5.2_b101.jar OMERO.server/lib/client/ && cp OMEZarrReader-0.5.2_b101.jar OMERO.server/lib/server/

will-moore commented 5 months ago

Yesterday: To avoid sshloginfile issues seen above, I manually split ngff_ids.txt (1643 rows) into blocks of 350 rows on each of the omeroreadonly servers (omeroreadwrite I just started running with existing ngff_plates.txt to cover the first 350+ rows). omeroreadonly-1: rows 351-700 omeroreadonly-2: rows 701-1050 omeroreadonly-3: rows 1051-1400 omeroreadonly-4: rows 1401-1643

ssh to each server in turn and ran

screen -dmS cache parallel --eta -a ngff_ids.txt --results /tmp/ngff_cache_20240617/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

Later yesterday:

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240617/1/**/* | wc"; done
omeroreadonly-1
     19      76    1908
omeroreadonly-2
     20      80    2005
omeroreadonly-3
     20      80    2000
omeroreadonly-4
     21      84    2104
omeroreadwrite
    362    1448   36445

Today:

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240617/1/**/* | wc"; done
omeroreadonly-1
     43     172    4322
omeroreadonly-2
     39     156    3910
omeroreadonly-3
     50     200    5014
omeroreadonly-4
     41     164    4105
omeroreadwrite
    389    1556   39158

will-moore commented 5 months ago

Found ResourceError for idr0015 plates: Fileset:21118 (not NGFF converted). Corresponds to 3 plates missing from idr0015.csv: https://github.com/IDR/idr-metadata/issues/645#issuecomment-1849888173 Need to run these 3 as described there and and to idr0015.csv.

Created idr0015a on idr-testing as described there, then... etc

cd idr0015
for i in $(ls); do sed -i 's/42650434-6eaa-45e5-9542-58247a45d8bc/7eb6e556-9b77-45b6-b82b-c913cb85904e/g' $i; done
cd ../

(venv3) bash-5.1$ for r in $(cat idr0015a.csv); do
>   biapath=$(echo $r | cut -d',' -f2)
>   uuid=$(echo $biapath | cut -d'/' -f2)
>   fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
>   psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
>   omero mkngff symlink /data/OMERO/ManagedRepository $fsid "/bia-integrator-data/$biapath/$uuid.zarr" --bfoptions --clientpath="https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/$biapath/$uuid.zarr"
> done
UPDATE 396
BEGIN
 mkngff_fileset 
----------------
        6321119
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/06/21-26-25.533
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-06/06/21-26-25.533_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-06/06/21-26-25.533_mkngff/d69df538-4684-4b32-8ded-d2f2af43af9f.zarr -> /bia-integrator-data/S-BIAD861/d69df538-4684-4b32-8ded-d2f2af43af9f/d69df538-4684-4b32-8ded-d2f2af43af9f.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/06/21-26-25.533
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-06/06/21-26-25.533_mkngff/d69df538-4684-4b32-8ded-d2f2af43af9f.zarr.bfoptions
UPDATE 396
BEGIN
 mkngff_fileset 
----------------
        6321120
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/10/13-37-45.953
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-06/10/13-37-45.953_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-06/10/13-37-45.953_mkngff/0cc5dbe3-444a-4ea2-a335-b51cf89c1c53.zarr -> /bia-integrator-data/S-BIAD861/0cc5dbe3-444a-4ea2-a335-b51cf89c1c53/0cc5dbe3-444a-4ea2-a335-b51cf89c1c53.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/10/13-37-45.953
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-06/10/13-37-45.953_mkngff/0cc5dbe3-444a-4ea2-a335-b51cf89c1c53.zarr.bfoptions
UPDATE 396
BEGIN
 mkngff_fileset 
----------------
        6321121
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/06/00-58-20.828
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-06/06/00-58-20.828_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-06/06/00-58-20.828_mkngff/1a29207c-d50b-48b7-a7c0-54c6252bfd9c.zarr -> /bia-integrator-data/S-BIAD861/1a29207c-d50b-48b7-a7c0-54c6252bfd9c/1a29207c-d50b-48b7-a7c0-54c6252bfd9c.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/06/00-58-20.828
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-06/06/00-58-20.828_mkngff/1a29207c-d50b-48b7-a7c0-54c6252bfd9c.zarr.bfoptions

will-moore commented 5 months ago

Also did the same for idr0013 last 3 plates of idr0013.csv not converted (see above https://github.com/IDR/idr-metadata/issues/696#issuecomment-2161654156) These initially failed to run as the SESSION wasn't in the original sql scripts (so it didn't get correctly updated prior to running). Fixed in https://github.com/IDR/mkngff_upgrade_scripts/commit/be59fe98b57229f7f314162cd707be2551e1b483

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'rror' /tmp/ngff_cache_20240617/1/**/*"; done
omeroreadonly-1
/tmp/ngff_cache_20240617/1/Image:1964831/stdout:fail: Pixels:1964831 Image:1964831 1.9138693809509277 exception ::omero::ResourceError
/tmp/ngff_cache_20240617/1/Image:3103076/stdout:fail: Pixels:3103076 Image:3103076 16301.51702260971 exception ::omero::ResourceError
omeroreadonly-2
/tmp/ngff_cache_20240617/1/Image:3073892/stdout:fill: Pixels:3073892 Image:3073892 17484.479704618454 exception ::omero::ResourceError
omeroreadonly-3
/tmp/ngff_cache_20240617/1/Image:1962455/stdout:fail: Pixels:1962455 Image:1962455 2.024498224258423 exception ::omero::ResourceError
/tmp/ngff_cache_20240617/1/Image:2850468/stderr:!! 06/17/24 12:46:26.823 error: 4 communicators not destroyed during global destruction.
/tmp/ngff_cache_20240617/1/Image:2857814/stderr:!! 06/17/24 12:41:54.395 error: 2 communicators not destroyed during global destruction.
/tmp/ngff_cache_20240617/1/Image:3367911/stderr:!! 06/17/24 21:46:36.942 error: 2 communicators not destroyed during global destruction.
omeroreadonly-4
/tmp/ngff_cache_20240617/1/Image:1640587/stderr:!! 06/17/24 14:15:58.051 error: 6 communicators not destroyed during global destruction.
/tmp/ngff_cache_20240617/1/Image:2857533/stderr:!! 06/17/24 12:49:00.746 error: 6 communicators not destroyed during global destruction.
omeroreadwrite
/tmp/ngff_cache_20240617/1/Image:12546037/stderr:!! 06/17/24 12:17:22.525 error: 9 communicators not destroyed during global destruction.
/tmp/ngff_cache_20240617/1/Image:12550677/stderr:!! 06/17/24 12:17:53.251 error: 9 communicators not destroyed during global destruction.
...
/tmp/ngff_cache_20240617/1/Image:1556033/stdout:fail: Pixels:1556033 Image:1556033 12244.128482341766 exception ::omero::ResourceError
/tmp/ngff_cache_20240617/1/Image:1573071/stdout:fail: Pixels:1573071 Image:1573071 10876.942811727524 exception ::omero::ResourceError

Checked all these Images - All are viewable in webclient except for idr0015 and idr0013 Plates fixed above - viewed to trigger memo file... omeroreadwrite has lots of " communicators not destroyed" - Just checked ResourceErrors...

will-moore commented 5 months ago

Progress: current state...

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240617/1/**/* | wc"; done
omeroreadonly-1
     89     356    8945
omeroreadonly-2
     83     332    8328
omeroreadonly-3
     92     368    9232
omeroreadonly-4
     89     356    8919
omeroreadwrite
    437    1748   43980

Checking progress in webclient:

idr0004 - all done
idr0010 - done up to plate 49-06
idr0011 - ScreenA
...
idr0064 - all done

will-moore commented 5 months ago

Need better way to monitor completed memo filesets.

e.g.

(venv3) (base) [wmoore@pilot-idr0138-omeroreadwrite scripts]$ /opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'select MIN(field.image.id) FROM WellSample AS field where field.well.plate.id in (select link.child.id from ScreenPlateLink as link where link.parent.id=2851) GROUP BY field.well.plate' > idr0090_plates.txt

cut -d ',' -f2 idr0090_plates.txt | sed -e 's/^/Image:/' > idr0090_ids.txt

On proxy server, make a list of all "ok" logs:

[wmoore@test122-proxy ~]$ rsync -rvP omeroreadwrite:/home/wmoore/idr0010_ids.txt ./

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240617/1/**/*" >> ok_ids.txt; done
[wmoore@test122-proxy ~]$ wc ok_ids.txt 
  842  3368 84631 ok_ids.txt

E.g. idr0010, check with Images are in ok logs - 80 / 148 are ok:

$ for i in $(cat idr0010_ids.txt); do echo $i && grep $i ok_ids.txt >> idr0010_ok.txt; done
$ wc idr0010_ok.txt 
  80  320 8036 idr0010_ok.txt
$ wc idr0010_ids.txt 
 148  148 2072 idr0010_ids.txt

for idr in idr0011a idr0011b idr0011c idr0011d idr0012 idr0013a idr0013b idr0015 idr0016 idr0025 idr0033 idr0035 idr0036 idr0064; do
echo $(echo $idr)_plates.txt && cut -d ',' -f2 $(echo $idr)_plates.txt | sed -e 's/^/Image:/' > $(echo $idr)_ids.txt;
done

for idr in idr0011a idr0011b idr0011c idr0011d idr0012 idr0013a idr0013b idr0015 idr0016 idr0025 idr0033 idr0035 idr0036 idr0064; do
echo $(echo $idr)_ids.txt && rsync -rvP omeroreadwrite:/home/wmoore/$(echo $idr)_ids.txt ./
done

idr0010 80/148
idr0011a 81/129
idr0011b 18/40
idr0011c 3/4
idr0011d 4/8
idr0012 38/68
idr0013a 283/510
idr0013b 9/38
idr0015 36/83
idr0016 112/413
idr0025 0/3
idr0033 5/12
idr0035 35/55
idr0064 5/9

will-moore commented 5 months ago

Updating iviewer to 0.14.0:

for server in omeroreadwrite omeroreadonly-1 omeroreadonly-2 omeroreadonly-3 omeroreadonly-4; do ssh $server "sudo /opt/omero/web/venv3/bin/pip uninstall -y omero-iviewer && sudo /opt/omero/web/venv3/bin/pip install omero-iviewer==0.14.0 && sudo service omero-web restart"; done

will-moore commented 5 months ago

Team testing on idr-testing with microservices today was "slower than expected" with a few odd image rendering failures. I didn't stop the memo generation before testing, which probably would have helped. Also tricky to test with incomplete memo file generation.

Now, stopped memo generation and will target the completion of individual studies...

Running individual Screens on different servers like this:

[wmoore@test122-omeroreadwrite ~]$ screen -dmS cache parallel --eta -a idr0004_ids.txt --results /tmp/ngff_cache_20240621_idr0004/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

omeroreadonly-1 do idr0010 (148)
omeroreadonly-2 do idr0011a (129)
omeroreadonly-3 do idr0012 (68)
omeroreadonly-3 do idr0015 (83)
omeroreadwrite do idr0004: (46)

After a couple of minutes we have:

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
     29     116    3141
omeroreadonly-2
     36     144    3941
omeroreadonly-3
     21      84    2280
omeroreadonly-4
     23      92    2493
omeroreadwrite
     10      40    1053

But also lots of

stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/e3906e3b-0282-482e-bd2c-c363ae0f1f79')

On omeroreadwrite, I cancelled the screen, deleted log dir and re-ran, hoping that existing session would avoid those errors - which seems to work. Got 19 initial "ok" in first minute and no FileNotFoundError errors.

Error counts are now:

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'rror' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
      9      81    1818
omeroreadonly-2
      9      81    1827
omeroreadonly-3
     11      97    2067
omeroreadonly-4
      9      81    1818
omeroreadwrite
      0       0       0

9-fewer errors for omeroreadwrite and 9 more oks!

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
     29     116    3141
omeroreadonly-2
     65     260    7119
omeroreadonly-3
     22      88    2386
omeroreadonly-4
     23      92    2493
omeroreadwrite
     19      76    1998

Also cancelled, delete logs and restarted other 4 readonly servers too... (12:35)

will-moore commented 5 months ago

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
     33     132    3576
omeroreadonly-2
    123     492   13464
omeroreadonly-3
     26     104    2819
omeroreadonly-4
     23      92    2494
omeroreadwrite
     42     168    4429

15:39...

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
     53     212    5744.    (out of 148)
omeroreadonly-2
    127     508   13902.   (out of 129 - idr0011a - DONE, see below)
omeroreadonly-3
     26     104    2819.     (out of 68 -  idr0012)
omeroreadonly-4
     30     120    3252.       (out of 38)
omeroreadwrite
     46     184    4849.    (all DONE 46 - idr0004)

on omeroreadwrite, rename previous log dir so it doesn't show up in grep, start idr0125... - DONE then idr0033... (12 plates)...

for the 2 not ok images from idr0011a, these both had DatabaseBusy exceptions at 10:40

[wmoore@test122-omeroreadonly-2 ~]$ cat /tmp/ngff_cache_20240621_idr0011a/1/Image:2852482/stderr
[wmoore@test122-omeroreadonly-2 ~]$ cat /tmp/ngff_cache_20240621_idr0011a/1/Image:2852626/stderr

Both viewed OK in webclient - idr0011A DONE. Start idr0011b (40 plates)...

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
     64     256    6940      (out of 148 - idr0010)
omeroreadonly-2
     19      76    2080.      (out of 40 - idr0011b)
omeroreadonly-3
     26     104    2819.     (out of 68 - idr0012) - no progress - lots of DatabaseBusyExceptions - restarted 16:50...
omeroreadonly-4
     30     120    3252        (out of 83 - idr0015) - no progress
omeroreadwrite
     10      40    1083.      (out of 12 - idr0033)

will-moore commented 5 months ago

Overnight...

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
    148     592   16041.     (out of 148 - idr0010. - DONE) - start with idr0035
omeroreadonly-2
     40     160    4379.      (out of 40 - idr0011b - DONE) - start with idr0036
omeroreadonly-3
     55     220    5958.    (out of 68 - idr0012)
omeroreadonly-4
     57     228    6184.    (out of 83 - idr0015)
omeroreadwrite
     12      48    1301     (out of 12 - idr0033 - DONE) - start with idr0090

After few mins...

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
     55     220    5968   (out of 55 - idr0035. - DONE)
omeroreadonly-2
     17      68    1846.    (out of 20 - idr0036)
omeroreadonly-3
     55     220    5958.   (out of 68 - idr0012)
omeroreadonly-4
     57     228    6184.   (out of 83 - idr0015)
omeroreadwrite
     16      64    1779.    (out of 20 - idr0090)

will-moore commented 5 months ago

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
     55     220    5968.    (out of 55 - idr0035. - DONE)
omeroreadonly-2
     20      80    2169.     (out of 20 - idr0036 - DONE)
omeroreadonly-3
     55     220    5958.    (out of 68 - idr0012) -  TODO: investigate!!
omeroreadonly-4
     57     228    6184       (out of 83 - idr0015) -  TODO: investigate!!
omeroreadwrite
     20      80    2225.     (out of 20 - idr0090 - DONE) - restart ngff_datasets...

screen -dmS cache parallel --eta -a ngff_dataset_ids.txt --results /tmp/ngff_cache_20240623_datasets/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
...

[wmoore@test122-omeroreadwrite ~]$ grep "ok:" /tmp/ngff_cache_20240623_datasets/1/*/** | wc
   1580    6320  176656

Start ALL plates memo file generation...

[wmoore@test122-proxy ~]$ wc all_plate_ids.txt
 6762  6762 95866 all_plate_ids.txt

Start all fresh... in batches of 1500 rows with:

screen -dmS cache parallel --eta -a all_plate_ids.txt --results /tmp/ngff_cache_20240623_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

# remove first 1500 rows
sed '1,1500 d' all_plate_ids.txt > all_plate_ids_1500.txt

omeroreadonly-1 - all_plate_ids.txt
omeroreadonly-2 - all_plate_ids.txt (first 1500 rows removed)
omeroreadonly-3 - all_plate_ids.txt (first 3000 rows removed)
omeroreadonly-4 - all_plate_ids.txt (first 4500 rows removed)

will-moore commented 5 months ago

Cancel all memo generation (terminate screens) in prep for testing later this morning... NB: for omeroreadonly-3 and omeroreadonly-4, screens were still active for idr0012 and idr0015 above.

Checking idr0012 logs on omeroreadonly-3.. Find the 13 images missing from the logs with:

[wmoore@test122-omeroreadonly-3 ~]$ for i in $(cat idr0012_ids.txt); do echo $i && grep $i /tmp/ngff_cache_20240621_idr0012/1/*/**; done

Manually checked each in webclient. All viewed OK except for Image:1819431.

Checking idr0015 logs on omeroreadonly-4..

for i in $(cat idr0015_ids.txt); do echo $i && grep $i /tmp/ngff_cache_20240621_idr0015/1/*/**; done

Took the Image:IDs not found in logs and put them in a file to run render test...

[wmoore@test122-omeroreadonly-4 ~]$ vi idr0015_retest.txt
[wmoore@test122-omeroreadonly-4 ~]$ for i in $(cat idr0015_retest.txt); do echo $i && /opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force $i; done
Image:1971645
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: Pixels:1971645 Image:1971645 2.5951030254364014 
...

Stuck on Image:1957701... And the next image also lacks memo file. (update: http://localhost:1080/webclient/?show=image-1957701 is good now) Run the remaining via usual route...

[wmoore@test122-omeroreadonly-4 ~]$ screen -dmS cache parallel --eta -a idr0015_retest.txt --results /tmp/ngff_cache_20240624_idr0015 -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

Cancelled after ~ an hour (8:09) in prep for testing...

will-moore commented 5 months ago

Current status Summary: Memo files all complete for:

idr0004
idr00010
idr0011a, idr0011b
idr0012
idr0033
idr0035
idr0036
idr0064
idr0090

will-moore commented 5 months ago

Testing delayed until 1:30 today, so let's restart idr0015 last few on omeroreadonly-4...

[wmoore@test122-omeroreadonly-4 ~]$ screen -dmS cache parallel --eta -a idr0015_retest.txt --results /tmp/ngff_cache_20240624_idr0015 -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

Also run idr0011C and idr0011D on omeroreadonly-2 (4 plates, 8 plates).

screen -dmS cache parallel --eta -a idr0011c_ids.txt --results /tmp/ngff_cache_20240624_idr0011c -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

All completed with "ok" (and checked in webclient).

will-moore commented 5 months ago

Restarting with e.g.:

[wmoore@test122-omeroreadwrite ~]$ screen -dmS cache parallel --eta -a idr0013a_ids.txt --results /tmp/ngff_cache_20240624_idr0013a -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

omeroreadonly-1 idr0013b (28 plates)
omeroreadonly-2 idr0016 (413 plates)

omeroreadwrite idr0013a (510 plates)

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240624_*/1/**/* | wc"; done
omeroreadonly-1
 28     112    3067.        (idr0013B - DONE)
omeroreadonly-2
 36     144    4022.      (idr0016 - 413 plates)
omeroreadwrite
234     936   25623.     (idr0013a - 510 plates)

will-moore commented 5 months ago

idr-testing omeroreadonly-3.

Testing started at 13:30 (12:30 GMT on logs): Checking...

[wmoore@test122-omeroreadonly-4 ~]$ less /opt/omero/server/OMERO.server/var/log/Blitz-0.log.1

First ERROR after that (using grep)...

2024-06-24 12:32:12,119 ERROR [        ome.services.util.ServiceHandler] (.Server-32) Method interface ome.api.IQuery.findAllByQuery invocation took 28799

with no other errors,

Then....found a bunch of similar errors due to Database query times, starting at

2024-06-24 12:33:27,596 INFO  [        ome.services.util.ServiceHandler] (.Server-22)  Excp:    org.springframework.dao.DataAccessResourceFailureException: Hibernate operation: could not execute query; SQL [select image0_.id as id60_, image

2024-06-24 12:34:23,490 INFO  [        ome.services.util.ServiceHandler] (.Server-67)  Excp:    org.springframework.dao.DataAccessResourceFailureException: Hibernate operation: could not execute query; SQL [select image0_.id as id60_, image
0_.acquisitionDate as acquisit2_60_, image0_.archived as archived60_, image0_.description as descript4_60_, image0_.creation_id as creation10_60_, image0_.external_id as external11_60_, image0_.group_id as group12_60_, image0_.owner_id as o
wner13_60_, image0_.permissions as permissi5_60_, image0_.update_id as update14_60_, image0_.experiment as experiment60_, image0_.fileset as fileset60_, image0_.format as format60_, image0_.imagingEnvironment as imaging18_60_, image0_.instr
ument as instrument60_, image0_.name as name60_, image0_.objectiveSettings as objecti20_60_, image0_.partial as partial60_, image0_.series as series60_, image0_.stageLabel as stageLabel60_, image0_.version as version60_ from image image0_ l
eft outer join wellsample wellsample1_ on image0_.id=wellsample1_.image and 
( 
  1 = ? OR 
  1 = ? OR 
  (wellsample1_.group_id in (?)) OR 
  (wellsample1_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = wellsample1_.group_id)) OR 
  (wellsample1_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = wellsample1_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = wellsample1_.group_id)) 
) inner join well well2_ on wellsample1_.well=well2_.id inner join plate plate3_ on well2_.plate=plate3_.id left outer join screenplatelink screenlink4_ on plate3_.id=screenlink4_.child and 
( 
  1 = ? OR 
  1 = ? OR 
  (screenlink4_.group_id in (?)) OR 
  (screenlink4_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = screenlink4_.group_id)) OR 
  (screenlink4_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = screenlink4_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = screenlink4_.group_id)) 
) inner join screen screen5_ on screenlink4_.parent=screen5_.id left outer join imageannotationlink annotation6_ on image0_.id=annotation6_.parent and 
( 
  1 = ? OR 
  1 = ? OR 
  (annotation6_.group_id in (?)) OR 
  (annotation6_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = annotation6_.group_id)) OR 
  (annotation6_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = annotation6_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = annotation6_.group_id)) 
) inner join annotation annotation7_ on annotation6_.child=annotation7_.id where 
( 
  1 = ? OR 
  1 = ? OR 
  (image0_.group_id in (?)) OR 
  (image0_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = image0_.group_id)) OR 
  (image0_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = image0_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = image0_.group_id)) 
) and screen5_.id=? and annotation7_.textValue=? order by well2_."column", well2_."row"]; ERROR: canceling statement due to statement timeout; nested exception is org.postgresql.util.PSQLException: ERROR: canceling statement due to statemen
t timeout
2024-06-24 12:34:23,490 ERROR [        ome.services.util.ServiceHandler] (.Server-67) Method interface ome.api.IQuery.findAllByQuery invocation took 60052

2024-06-24 12:34:37,857 ERROR [        ome.services.util.ServiceHandler] (.Server-40) Method interface ome.api.IQuery.findAllByQuery invocation took 60046

2024-06-24 12:34:44,946 INFO  [                 org.perf4j.TimingLogger] (.Server-24) start[1719232424846] time[60100] tag[omero.call.exception]
2024-06-24 12:34:44,947 WARN  [        ome.services.util.ServiceHandler] (.Server-24) Unknown exception thrown.

org.springframework.dao.DataAccessResourceFailureException: Hibernate operation: could not execute query; SQL [select image0_.id as id60_, image0_.acquisitionDate as acquisit2_60_, image0_.archived as archived60_, image0_.description as des
cript4_60_, image0_.creation_id as creation10_60_, image0_.external_id as external11_60_, image0_.group_id as group12_60_, image0_.owner_id as owner13_60_, image0_.permissions as permissi5_60_, image0_.update_id as update14_60_, image0_.exp
eriment as experiment60_, image0_.fileset as fileset60_, image0_.format as format60_, image0_.imagingEnvironment as imaging18_60_, image0_.instrument as instrument60_, image0_.name as name60_, image0_.objectiveSettings as objecti20_60_, ima
ge0_.partial as partial60_, image0_.series as series60_, image0_.stageLabel as stageLabel60_, image0_.version as version60_ from image image0_ left outer join wellsample wellsample1_ on image0_.id=wellsample1_.image and 
( 
  1 = ? OR 
  1 = ? OR 
  (wellsample1_.group_id in (?)) OR 
  (wellsample1_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = wellsample1_.group_id)) OR 
  (wellsample1_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = wellsample1_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = wellsample1_.group_id)) 
) inner join well well2_ on wellsample1_.well=well2_.id inner join plate plate3_ on well2_.plate=plate3_.id left outer join screenplatelink screenlink4_ on plate3_.id=screenlink4_.child and 
( 
  1 = ? OR 
  1 = ? OR 
  (screenlink4_.group_id in (?)) OR 
  (screenlink4_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = screenlink4_.group_id)) OR 
  (screenlink4_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = screenlink4_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = screenlink4_.group_id)) 
) inner join screen screen5_ on screenlink4_.parent=screen5_.id left outer join imageannotationlink annotation6_ on image0_.id=annotation6_.parent and 
( 
  1 = ? OR 
  1 = ? OR 
  (annotation6_.group_id in (?)) OR 
  (annotation6_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = annotation6_.group_id)) OR 
  (annotation6_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = annotation6_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = annotation6_.group_id)) 
) inner join annotation annotation7_ on annotation6_.child=annotation7_.id where 
( 
  1 = ? OR 
  1 = ? OR 
  (image0_.group_id in (?)) OR 
  (image0_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = image0_.group_id)) OR 
  (image0_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = image0_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = image0_.group_id)) 
) and screen5_.id=? and annotation7_.textValue=? order by well2_."column", well2_."row"]; ERROR: canceling statement due to statement timeout; nested exception is org.postgresql.util.PSQLException: ERROR: canceling statement due to statemen
t timeout
        at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:105)
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
        at org.springframework.orm.hibernate3.HibernateAccessor.convertJdbcAccessException(HibernateAccessor.java:428)

2024-06-24 12:35:47,050 INFO  [                 org.perf4j.TimingLogger] (.Server-56) start[1719232486582] time[60468] tag[omero.call.exception]
2024-06-24 12:35:47,051 WARN  [        ome.services.util.ServiceHandler] (.Server-56) Unknown exception thrown.

org.springframework.dao.DataAccessResourceFailureException: Hibernate operation: could not execute query; SQL [select image0_.id as id60_, image0_.acquisitionDate as acquisit2_60_, image0_.archived as archived60_, image0_.description as des
cript4_60_, image0_.creation_id as creation10_60_, image0_.external_id as external11_60_, image0_.group_id as group12_60_, image0_.owner_id as owner13_60_, image0_.permissions as permissi5_60_, image0_.update_id as update14_60_, image0_.exp
eriment as experiment60_, image0_.fileset as fileset60_, image0_.format as format60_, image0_.imagingEnvironment as imaging18_60_, image0_.instrument as instrument60_, image0_.name as name60_, image0_.objectiveSettings as objecti20_60_, ima
ge0_.partial as partial60_, image0_.series as series60_, image0_.stageLabel as stageLabel60_, image0_.version as version60_ from image image0_ left outer join wellsample wellsample1_ on image0_.id=wellsample1_.image and 
( 
  1 = ? OR 
  1 = ? OR 
  (wellsample1_.group_id in (?)) OR 
  (wellsample1_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = wellsample1_.group_id)) OR 
  (wellsample1_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = wellsample1_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = wellsample1_.group_id)) 
) inner join well well2_ on wellsample1_.well=well2_.id inner join plate plate3_ on well2_.plate=plate3_.id left outer join screenplatelink screenlink4_ on plate3_.id=screenlink4_.child and 
( 
  1 = ? OR 
  1 = ? OR 
  (screenlink4_.group_id in (?)) OR 
  (screenlink4_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = screenlink4_.group_id)) OR 
  (screenlink4_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = screenlink4_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = screenlink4_.group_id)) 
) inner join screen screen5_ on screenlink4_.parent=screen5_.id left outer join imageannotationlink annotation6_ on image0_.id=annotation6_.parent and 
( 
  1 = ? OR 
  1 = ? OR 
  (annotation6_.group_id in (?)) OR 
  (annotation6_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = annotation6_.group_id)) OR 
  (annotation6_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = annotation6_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = annotation6_.group_id)) 
) inner join annotation annotation7_ on annotation6_.child=annotation7_.id where 
( 
  1 = ? OR 
  1 = ? OR 
  (image0_.group_id in (?)) OR 
  (image0_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = image0_.group_id)) OR 
  (image0_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = image0_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = image0_.group_id)) 
) and screen5_.id=? and annotation7_.textValue=? order by well2_."column", well2_."row"]; ERROR: canceling statement due to statement timeout; nested exception is org.postgresql.util.PSQLException: ERROR: canceling statement due to statemen
t timeout
        at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:105)
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
        at org.springframework.orm.hibernate3.HibernateAccessor.convertJdbcAccessException(HibernateAccessor.java:428)

2024-06-24 12:39:04,561 INFO  [                 org.perf4j.TimingLogger] (.Server-61) start[1719232684542] time[60019] tag[omero.call.exception]
2024-06-24 12:39:04,562 WARN  [        ome.services.util.ServiceHandler] (.Server-61) Unknown exception thrown.

org.springframework.dao.DataAccessResourceFailureException: Hibernate operation: could not execute query; SQL [select image0_.id as id60_, image0_.acquisitionDate as acquisit2_60_, image0_.archived as archived60_, image0_.description as des
cript4_60_, image0_.creation_id as creation10_60_, image0_.external_id as external11_60_, image0_.group_id as group12_60_, image0_.owner_id as owner13_60_, image0_.permissions as permissi5_60_, image0_.update_id as update14_60_, image0_.exp
eriment as experiment60_, image0_.fileset as fileset60_, image0_.format as format60_, image0_.imagingEnvironment as imaging18_60_, image0_.instrument as instrument60_, image0_.name as name60_, image0_.objectiveSettings as objecti20_60_, ima
ge0_.partial as partial60_, image0_.series as series60_, image0_.stageLabel as stageLabel60_, image0_.version as version60_ from image image0_ left outer join wellsample wellsample1_ on image0_.id=wellsample1_.image and 
( 
  1 = ? OR 
  1 = ? OR 
  (wellsample1_.group_id in (?)) OR 
  (wellsample1_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = wellsample1_.group_id)) OR 
  (wellsample1_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = wellsample1_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = wellsample1_.group_id)) 
) inner join well well2_ on wellsample1_.well=well2_.id inner join plate plate3_ on well2_.plate=plate3_.id left outer join screenplatelink screenlink4_ on plate3_.id=screenlink4_.child and 
( 
  1 = ? OR 
  1 = ? OR 
  (screenlink4_.group_id in (?)) OR 
  (screenlink4_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = screenlink4_.group_id)) OR 
  (screenlink4_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = screenlink4_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = screenlink4_.group_id)) 
) inner join screen screen5_ on screenlink4_.parent=screen5_.id left outer join imageannotationlink annotation6_ on image0_.id=annotation6_.parent and 
( 
  1 = ? OR 
  1 = ? OR 
  (annotation6_.group_id in (?)) OR 
  (annotation6_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = annotation6_.group_id)) OR 
  (annotation6_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = annotation6_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = annotation6_.group_id)) 
) inner join annotation annotation7_ on annotation6_.child=annotation7_.id where 
( 
  1 = ? OR 
  1 = ? OR 
  (image0_.group_id in (?)) OR 
  (image0_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = image0_.group_id)) OR 
  (image0_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = image0_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = image0_.group_id)) 
) and screen5_.id=? and annotation7_.textValue=? order by well2_."column", well2_."row"]; ERROR: canceling statement due to statement timeout; nested exception is org.postgresql.util.PSQLException: ERROR: canceling statement due to statement timeout
        at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:105)
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
        at org.springframework.orm.hibernate3.HibernateAccessor.convertJdbcAccessException(HibernateAccessor.java:428)

cc @khaledk2 @sbesson

will-moore commented 5 months ago

Restart ALL memo file generation (includes idr0013A and idr0016 not completed).. with e.g.

screen -dmS cache parallel --eta -a all_plate_ids.txt --results /tmp/ngff_cache_20240624_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

omeroreadonly-1 - all_plate_ids.txt
omeroreadonly-2 - all_plate_ids.txt (first 1500 rows removed)
omeroreadonly-3 - all_plate_ids.txt (first 3000 rows removed)
omeroreadonly-4 - all_plate_ids.txt (first 4500 rows removed)
omeroreadwrite - idr0013a (510 plates)

EDIT (2 days later) - I realise I should have used e.g. all_plate_ids_1500.txt rather than all_plate_ids.txt on readonly 2, 3 & 4 since the original all_plate_ids.txt wasn't modified. So, all servers ran the same ids!

will-moore commented 5 months ago

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240624_*/1/**/* | wc"; done
omeroreadonly-1
    454    1816   48862
omeroreadonly-2
    464    1856   50313
omeroreadonly-3
    426    1704   46100
omeroreadonly-4
Connection to idr-testing.openmicroscopy.org closed by remote host.

Later - still can't ssh to omeroreadonly-4

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240624_*/1/**/* | wc"; done
omeroreadonly-1
   1247    4988  134637
omeroreadonly-2
   1257    5028  136078
omeroreadonly-3
   1219    4876  131853
...
[wmoore@test122-omeroreadwrite ~]$ grep 'ok:' /tmp/ngff_cache_20240624_*/1/**/* | wc
    504    2016   55238

sbesson commented 5 months ago

Briefly discussed with @khaledk2 @dominikl @jburel @pwalczysko, to reduce the number of possible collisions during the (heavy) memo file regeneration process, one option would be to stop all services that might be concurrently access the database i.e. omero-web, prometheus-omero-exporter and the micro-service. Once the cache files have been regenerated, these services can be safely restarted

will-moore commented 5 months ago

Cancelled all memo generation. After restarting omeroreadonly-4 (master.err as 70 GB of prometheus errors)

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240624_*/1/**/* | wc"; done
omeroreadonly-1
   2624   10496  287101
omeroreadonly-2
   2634   10536  288491
omeroreadonly-3
   2596   10384  284295
omeroreadonly-4
   1171    4684  126720
omeroreadwrite
    504    2016   55238

[EDIT: all nodes ran the same ids.txt with 6762 plates - so, ~ 2624/6762 less than half complete]

All of the readonly servers have the same ResourceErrors:

/tmp/ngff_cache_20240624_allplates/1/Image:1949093/stdout:fail: Pixels:1949093 Image:1949093 0.7480490207672119 exception ::omero::ResourceError
/tmp/ngff_cache_20240624_allplates/1/Image:38517/stdout:fail: Pixels:38517 Image:38517 6.995946884155273 exception ::omero::ResourceError
/tmp/ngff_cache_20240624_allplates/1/Image:40053/stdout:fail: Pixels:40053 Image:40053 8.229637384414673 exception ::omero::ResourceError

Image:1949093 - idr0015 (known issue - no thumbnails)
Image:38517 & Image:40053 - 5th & 7th plates of idr0008 (58 plates) Viewing in webclient: message = Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-09/14/20-06-35.749/005B30_S2R.HTD

2024-06-25 10:13:03,452 ERROR [         ome.io.bioformats.BfPixelBuffer] (.Server-11) Failed to instantiate BfPixelsWrapper with /data/OMERO/ManagedRepository/demo_2/2015-09/14/20-06-35.749/005B30_S2R.HTD
2024-06-25 10:13:03,453 ERROR [                ome.io.nio.PixelsService] (.Server-11) Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-09/14/20-06-35.749/005B30_S2R.HTD
java.lang.RuntimeException: java.io.FileNotFoundException: /data/OMERO/ManagedRepository/demo_2/2015-09/14/20-06-35.749/005B30_S2R_P24_s2_w4.TIF (No such file or directory)
    at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:79)
    at ome.io.bioformats.BfPixelBuffer.setSeries(BfPixelBuffer.java:124)

The file at /data/OMERO/ManagedRepository/demo_2/2015-09/14/20-06-35.749/005B30_S2R_P24_s2_w4.TIF is not found on idr-testing or idr-next or idr itself! Image 38517 gives ResouceError above on idr-testing and idr-next but not on idr.openmicroscopy.org!?

Actually, for those 2 plates (5th and 7th of idr008), there are NO ...w4.TIF images and the 4th channel (DAPI) is black (idr.openmicroscopy.org), whereas for all other idr0008 plates, there are 768 images like ...w4.TIF and DAPI is viewable. Images from these 2 plates are not viewable in idr-testing, possibly due to a change in Bio-Formats?

Seb: likely due to https://github.com/ome/bioformats/pull/3806

will-moore commented 5 months ago

idr0070 delete & reimport

Following steps at https://github.com/IDR/idr-metadata/issues/691#issuecomment-2023174883 to fix idr0070... in a screen as user omero-server. Removed annotations as described.

psql -U omero -d idr -h $DBHOST -c "select fileset from image where format=360 and series=1 and (fileset < 3552156)" >> idr0070_svs_filesetIds.txt

Removed header and footers (4 rows) then...

$ for r in $(cat idr0070_svs_filesetIds.txt); do echo $r | tr -d '[:space:]' >> idr0070_svs_filesetIds.csv && echo "" >> idr0070_svs_filesetIds.csv; done
$ wc idr0070_svs_filesetIds.csv 
 400  400 3200 idr0070_svs_filesetIds.csv

for r in $(cat idr0070_svs_filesetIds.csv); do omero delete --report Fileset:$r; done

Created idr0070-experimentA-filePaths_svs.tsv as described and import...

omero import --bulk experimentA/idr0070-experimentA-bulk.yml --file /tmp/idr0070svs.log  --errs /tmp/idr0070svs.err

Done: logs checked as described

$ grep "==> Summary" /tmp/idr0070svs.err | wc
    402     804    4824

Re-annotate as at https://github.com/IDR/idr-metadata/issues/691#issuecomment-2066854817

[wmoore@test122-omeroreadwrite idr0070-kerwin-hdbr]$ sudo -Es git checkout -f will/label_images_removal
Previous HEAD position was 452ae3b Fix typo
HEAD is now at 4b08e7f Fix annotations.csv for [label image] images

As omero-server

(venv3) bash-5.1$ /opt/omero/server/venv3/bin/python /uod/idr/metadata/idr-utils/scripts/annotate/check_annotations.py Project:1104 idr0070-experimentA-annotation.csv --output /tmp/errors.csv
All images are unique and have annotations.

$ omero delete --report Annotation:25135328
omero.cmd.Delete2 Annotation:25135328 ok
Steps: 6
Elapsed time: 0.515 secs.
Flags: []
Deleted objects
  FileAnnotation:25135328
  ProjectAnnotationLink:1667
  OriginalFile:29676048

$ /opt/omero/server/OMERO.server/bin/omero metadata populate --report --batch 1000 --file idr0070-experimentA-annotation.csv Project:1104

$ /opt/omero/server/OMERO.server/bin/omero metadata populate --context bulkmap --batch 100 --cfg idr0070-experimentA-bulkmap-config.yml Project:1104

Needed to cache-bust with ssh -A idr-testing.openmicroscopy.org -L 9000:localhost:9000 and open all Datasets. Looking good now - new images view OK and have annotations, e.g. https://idr-testing.openmicroscopy.org/webclient/?show=dataset-10461

will-moore commented 5 months ago

To check that rendering via s3 is working (not relying on goofys mount), tested on omeroreadwrite, unmounting and check that we can view image... Need to stop server to release mount...

sudo service omero-server stop
sudo umount /bia-integrator-data
sudo service omero-server start

Able to view NGFF images (from idr0004) 👍 Re-mounted:

$ which goofys
/usr/bin/goofys
$ sudo goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other bia-integrator-data /bia-integrator-data

will-moore commented 5 months ago

Checking available disk space on all servers ahead of today's testing... No red flags:

for n in $(cat nodes);   do echo $n && ssh $n "df -h"; done
omeroreadonly-1
Filesystem                          Size  Used Avail Use% Mounted on
devtmpfs                            4.0M     0  4.0M   0% /dev
tmpfs                                16G     0   16G   0% /dev/shm
tmpfs                               6.3G  772K  6.3G   1% /run
/dev/vda5                            79G  9.3G   70G  12% /
/dev/vda2                           936M  283M  654M  31% /boot
/dev/vda1                            99M  7.1M   92M   8% /boot/efi
10.35.199.251:/fg_bioimage          572T  410T  162T  72% /nfs/bioimage
bia-idr-integration                 1.0P     0  1.0P   0% /bia-idr
bia-integrator-data                 1.0P     0  1.0P   0% /bia-integrator-data
192.168.2.43:/data/OMERO            1.5T  1.2T  298G  81% /data/OMERO-readonly
192.168.2.43:/data/BioFormatsCache  1.5T  1.2T  298G  81% /data/BioFormatsCache
192.168.2.43:/data/idr-metadata     1.5T  1.2T  298G  81% /data/idr-metadata
tmpfs                               3.2G  4.0K  3.2G   1% /run/user/546
tmpfs                               3.2G  4.0K  3.2G   1% /run/user/5098
...

will-moore commented 5 months ago

idr0009 bfoptions update

Followed instructions at https://github.com/IDR/idr-metadata/issues/684#issuecomment-2076629628 Tested in webclient (triggering memo file generation) and could view last Images from e.g. http://localhost:1080/webclient/?show=plate-2703

will-moore commented 5 months ago

Following successful testing of completed NGFF studies this morning, let's resume memo file generation for ALL data... Still outstanding:

omeroreadonly-1 - all_dataset_ids.txt
omeroreadonly-2 - all_plate_ids_1500.txt (first 1500 rows removed)
omeroreadonly-3 - all_plate_ids_3000.txt (first 3000 rows removed)
omeroreadonly-4 - all_plate_ids_4500.txt (first 4500 rows removed)
omeroreadwrite - all_dataset_ids_7000.txt (first 7000 rows removed)

Running (stopping Prometheus first) e.g.

sudo service prometheus-omero-exporter stop
screen -dmS cache parallel --eta -a all_plate_ids_1500.txt --results /tmp/ngff_cache_20240626_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

Datasets... Split all_dataset_ids.txt (14910 rows) into 2...

$ wc all_dataset_ids.txt 
 14909  14909 199954 all_dataset_ids.txt
# remove first 7000 rows
$ sed '1,7000 d' all_dataset_ids.txt > all_dataset_ids_7000.txt
$ wc all_dataset_ids_7000.txt
  7909   7909 109641 all_dataset_ids_7000.txt

screen -dmS cache parallel --eta -a all_dataset_ids_7000.txt --results /tmp/ngff_cache_20240626_alldatasets/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc"; done
omeroreadonly-1
      0       0       0
omeroreadonly-2
     35     140    3854
omeroreadonly-3
    561    2244   62019
omeroreadonly-4
    503    2012   57033
omeroreadwrite
    630    2520   70872

After about 1 hour...

$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc"; done
omeroreadonly-1
   1042    4168  114296
omeroreadonly-2
     59     236    6501.   (out of 1500+ plates)
omeroreadonly-3
    596    2384   65882.  (out of 1500+ plates)
omeroreadonly-4
   1462    5848  165797.   (out of 1500+ plates)
omeroreadwrite
  47650  190600 5352329

Check directories (useful for counting Dataset:ID dirs for)

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "ls /tmp/ngff_cache_20240626_*/1/ | wc"; done
omeroreadonly-1
    268     268    3167
omeroreadonly-2
    111     111    1554
omeroreadonly-3
    653     653    9142
omeroreadonly-4
   1483    1483   22245
omeroreadwrite
    468     468    6085

will-moore commented 5 months ago

idr0043 ResouceErrors:

$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ResourceError' /tmp/ngff_cache_20240626_*/1/**/* | wc"; done
omeroreadonly-1
      0       0       0
omeroreadonly-2
      0       0       0
omeroreadonly-3
      0       0       0
omeroreadonly-4
      0       0       0
omeroreadwrite
   1436    8616  210222

Affecting Datasets from e.g 9101 -> 14734 - All seem to be idr0043: E.g. viewing http://localhost:1080/webclient/img_detail/13004862/ Gives:

Stack trace

``` 2024-06-26 13:20:46,228 DEBUG [ loci.formats.Memoizer] (.Server-15) start[1719408046201] time[26] tag[loci.formats.Memoizer.setId] 2024-06-26 13:20:46,228 ERROR [ ome.io.bioformats.BfPixelBuffer] (.Server-15) Failed to instantiate BfPixelsWrapper with /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-13/2021-06/05/12-54-52.237/141893_A_1_1.tif 2024-06-26 13:20:46,228 ERROR [ ome.io.nio.PixelsService] (.Server-15) Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-13/2021-06/05/12-54-52.237/141893_A_1_1.tif java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0 at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:79) at ome.io.bioformats.BfPixelBuffer.setSeries(BfPixelBuffer.java:124) at ome.io.nio.PixelsService.createBfPixelBuffer(PixelsService.java:898) at ome.io.nio.PixelsService._getPixelBuffer(PixelsService.java:653) at ome.io.nio.PixelsService.getPixelBuffer(PixelsService.java:571) at ome.services.RenderingBean$12.doWork(RenderingBean.java:2205) at jdk.internal.reflect.GeneratedMethodAccessor319.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) at ome.services.util.Executor$Impl$Interceptor.invoke(Executor.java:568) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) at ome.security.basic.EventHandler.invoke(EventHandler.java:154) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) at org.springframework.orm.hibernate3.HibernateInterceptor.invoke(HibernateInterceptor.java:119) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:99) at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:282) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) at ome.tools.hibernate.ProxyCleanupFilter$Interceptor.invoke(ProxyCleanupFilter.java:249) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) at ome.services.util.ServiceHandler.invoke(ServiceHandler.java:121) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213) at com.sun.proxy.$Proxy101.doWork(Unknown Source) at ome.services.util.Executor$Impl.execute(Executor.java:447) at ome.services.util.Executor$Impl.execute(Executor.java:392) at ome.services.RenderingBean.getPixelBuffer(RenderingBean.java:2202) at ome.services.RenderingBean.load(RenderingBean.java:417) at jdk.internal.reflect.GeneratedMethodAccessor1342.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) at ome.services.util.ServiceHandler.invoke(ServiceHandler.java:121) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213) at com.sun.proxy.$Proxy122.load(Unknown Source) at jdk.internal.reflect.GeneratedMethodAccessor1342.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) at ome.security.basic.BasicSecurityWiring.invoke(BasicSecurityWiring.java:93) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) at ome.services.blitz.fire.AopContextInitializer.invoke(AopContextInitializer.java:43) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213) at com.sun.proxy.$Proxy122.load(Unknown Source) at jdk.internal.reflect.GeneratedMethodAccessor1417.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at ome.services.blitz.util.IceMethodInvoker.invoke(IceMethodInvoker.java:172) at ome.services.throttling.Callback.run(Callback.java:56) at ome.services.throttling.InThreadThrottlingStrategy.callInvokerOnRawArgs(InThreadThrottlingStrategy.java:56) at ome.services.blitz.impl.AbstractAmdServant.callInvokerOnRawArgs(AbstractAmdServant.java:140) at ome.services.blitz.impl.RenderingEngineI.load_async(RenderingEngineI.java:316) at jdk.internal.reflect.GeneratedMethodAccessor1416.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) at omero.cmd.CallContext.invoke(CallContext.java:85) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213) at com.sun.proxy.$Proxy124.load_async(Unknown Source) at omero.api._RenderingEngineTie.load_async(_RenderingEngineTie.java:248) at omero.api._RenderingEngineDisp.___load(_RenderingEngineDisp.java:1223) at omero.api._RenderingEngineDisp.__dispatch(_RenderingEngineDisp.java:2405) at IceInternal.Incoming.invoke(Incoming.java:221) at Ice.ConnectionI.invokeAll(ConnectionI.java:2536) at Ice.ConnectionI.dispatch(ConnectionI.java:1145) at Ice.ConnectionI.message(ConnectionI.java:1056) at IceInternal.ThreadPool.run(ThreadPool.java:395) at IceInternal.ThreadPool.access$300(ThreadPool.java:12) at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:832) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0 at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64) at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70) at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248) at java.base/java.util.Objects.checkIndex(Objects.java:374) at java.base/java.util.ArrayList.get(ArrayList.java:459) at loci.formats.MetadataList.get(MetadataList.java:121) at loci.formats.SubResolutionFormatReader.getCurrentCore(SubResolutionFormatReader.java:238) at loci.formats.FormatReader.getPixelType(FormatReader.java:735) at loci.formats.MetadataTools.populatePixels(MetadataTools.java:149) at loci.formats.MetadataTools.populatePixels(MetadataTools.java:116) at loci.formats.in.BaseTiffReader.initMetadataStore(BaseTiffReader.java:426) at loci.formats.in.SVSReader.initMetadataStore(SVSReader.java:669) at loci.formats.in.BaseTiffReader.initMetadata(BaseTiffReader.java:99) at loci.formats.in.BaseTiffReader.initFile(BaseTiffReader.java:610) at loci.formats.FormatReader.setId(FormatReader.java:1480) at loci.formats.ImageReader.setId(ImageReader.java:865) at ome.io.nio.PixelsService$3.setId(PixelsService.java:869) at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692) at loci.formats.ChannelFiller.setId(ChannelFiller.java:258) at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692) at loci.formats.ChannelSeparator.setId(ChannelSeparator.java:317) at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692) at loci.formats.Memoizer.setId(Memoizer.java:726) at ome.io.bioformats.BfPixelsWrapper.(BfPixelsWrapper.java:52) at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:73) ... 82 common frames omitted 2024-06-26 13:20:46,229 INFO [ org.perf4j.TimingLogger] (.Server-15) start[1719408046193] time[35] tag[omero.call.exception] 2024-06-26 13:20:46,229 INFO [ ome.services.util.ServiceHandler] (.Server-15) Excp: ome.conditions.ResourceError: Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-13/2021-06/05/12-54-52.237/141893_A_1_1.tif 2024-06-26 13:20:46,229 INFO [ org.perf4j.TimingLogger] (.Server-15) start[1719408046181] time[47] tag[omero.call.exception] ```

will-moore commented 5 months ago

Current status: ok:

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
2141
omeroreadonly-2
192
omeroreadonly-3
754
omeroreadonly-4
1513
omeroreadwrite
397227

ResourceError

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ResourceError' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
0
omeroreadonly-2
0
omeroreadonly-3
0
omeroreadonly-4
0
omeroreadwrite
8091

3 hours later:

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
2735   (Images out of 7k Datasets)
omeroreadonly-2
1107.  (out of 1500+ plates)
omeroreadonly-3
901.  (out of 1500+ plates)
omeroreadonly-4
1516.  (out of 1500+ plates)
omeroreadwrite
628416. (Images out of 8k Datasets) NB: idr0043 is 12k Datasets

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ResourceError' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
0
omeroreadonly-2
1    (known - idr0015 plate)
omeroreadonly-3
0
omeroreadonly-4
0
omeroreadwrite
12436

will-moore commented 5 months ago

6:25 am

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
26409
omeroreadonly-2
1311.     (out of 1500+ plates)
omeroreadonly-3
1359.    (out of 1500+ plates)
omeroreadonly-4
1600.    (out of 1500+ plates)
omeroreadwrite
1296666

10:32 am

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
28774
omeroreadonly-2
1400
omeroreadonly-3
1359
omeroreadonly-4
1629    (out of 2262 rows in all_plate_ids_4500.txt)
omeroreadwrite
1658777

No progress on omeroreadonly-3 in the last few hours. Checking logs, last 10 directories are empty..

[wmoore@test122-omeroreadonly-3 ~]$ ls -alh /tmp/ngff_cache_20240626_allplates/1/
...
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:36 Image:9020764
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:36 Image:9529001
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:36 Image:9538241
... these empty:
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:36 Image:9544982
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:37 Image:9545590
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:42 Image:9565961
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:42 Image:9568592
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:42 Image:9621401
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:42 Image:9632921
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:42 Image:9634488
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:51 Image:9636808
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 04:02 Image:9667170
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 04:02 Image:9669446

Last Image:9538241 to be processed took half an hour (1542 secs) to process:

[wmoore@test122-omeroreadonly-3 ~]$ cat /tmp/ngff_cache_20240626_allplates/1/Image\:9538241/stdout 
ok: Pixels:9538241 Image:9538241 1542.7414109706879

Checking logs for one of those 10 images: https://idr.openmicroscopy.org/webclient/?show=image-9544982 finds only:

[wmoore@test122-omeroreadonly-3 ~]$ grep "2020-01/24/11-38-52.433/007015009.flex" /opt/omero/server/OMERO.server/var/log/Blitz-0.log
2024-06-27 03:55:19,390 WARN  [              loci.formats.FormatHandler] (.Server-27) parsing /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-13/2020-01/24/11-38-52.433/007015009.flex

Blitz log is still active (server not dead)!

Checking logs against input ids... e.g.

[wmoore@test122-omeroreadonly-2 ~]$ for i in $(cat all_plate_ids_1500.txt); do echo $i && grep $i /tmp/ngff_cache_20240626_allplates/1/**/*; done;
Image:1376659
/tmp/ngff_cache_20240626_allplates/1/Image:1376659/stdout:ok: Pixels:1376659 Image:1376659 92.37646412849426 
Image:1377043
/tmp/ngff_cache_20240626_allplates/1/Image:1377043/stdout:ok: Pixels:1377043 Image:1377043 68.95386505126953 
Image:1377427
/tmp/ngff_cache_20240626_allplates/1/Image:1377427/stdout:ok: Pixels:1377427 Image:1377427 68.418936252594 
Image:1377811
/tmp/ngff_cache_20240626_allplates/1/Image:1377811/stdout:ok: Pixels:1377811 Image:1377811 81.83312964439392 
...

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
29777
omeroreadonly-2
1609         Done
omeroreadonly-3
1359      Still paused!
omeroreadonly-4
1636.     (out of 2262 rows in all_plate_ids_4500.txt)
omeroreadwrite
1751341

Directory counts:

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "ls /tmp/ngff_cache_20240626_*/1/ | wc -l"; done
omeroreadonly-1
529.  (out of 7k Datasets!)
omeroreadonly-2
2165
omeroreadonly-3
1369
omeroreadonly-4
1658
omeroreadwrite
4522. (out of 7.9k Datasets!)

will-moore commented 5 months ago

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
30788
omeroreadonly-2
2857
omeroreadonly-3
1359.  (all_plate_ids_3000.txt)
omeroreadonly-4
1667.  (out of 2262 rows in all_plate_ids_4500.txt)
omeroreadwrite
1912051

omeroreadonly-4 has done 1667/2262 rows (end of all_plate_ids_4500.txt), so let's use omeroreadonly-2 to process the last 262 rows. Remove 6500 rows from all_plate_ids.txt

$ sed '1,6500 d' all_plate_ids.txt > all_plate_ids_6500.txt
$ wc all_plate_ids_6500.txt
 262  262 3930 all_plate_ids_6500.txt

[wmoore@test122-omeroreadonly-2 ~]$ screen -dmS cache parallel --eta -a all_plate_ids_6500.txt --results /tmp/ngff_cache_20240627_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

omeroreadonly-3 has been stuck at the same point for many hours now. No errors. Going to stop, restart server and restart, removing first 1300 rows from all_plate_ids_3000.txt

[wmoore@test122-omeroreadonly-3 ~]$ sudo service omero-server restart
[wmoore@test122-omeroreadonly-3 ~]$ sed '1,1300 d' all_plate_ids_3000.txt > all_plate_ids_4300.txt
[wmoore@test122-omeroreadonly-3 ~]$ screen -dmS cache parallel --eta -a all_plate_ids_4300.txt --results /tmp/ngff_cache_20240627_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

will-moore commented 5 months ago

Looks like omeroreadonly-3 got stuck at the same point as before... (after 1359 ids before, now starting with 1300 ids removed we got stuck at 59)!!:

[wmoore@test122-omeroreadonly-3 ~]$ grep 'ok:' /tmp/ngff_cache_20240627_allplates/1/**/* | wc -l
59

Took the next 10 Image IDs from above:

drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:36 Image:9544982
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:37 Image:9545590
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:42 Image:9565961
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:42 Image:9568592
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:42 Image:9621401
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:42 Image:9632921
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:42 Image:9634488
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 03:51 Image:9636808
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 04:02 Image:9667170
drwxr-xr-x.    2 wmoore wmoore  45 Jun 27 04:02 Image:9669446

And viewed them all in webclient (connected to omeroreadwrite), e.g. - took about a minute for each to render (memo) http://localhost:1080/webclient/?show=image-9669446

Now let's run same again on omeroreadonly-3 and see if we get further...

 $ mv /tmp/ngff_cache_20240627_allplates/ /tmp/ngff_cache_stuck_20240627_allplates/
[wmoore@test122-omeroreadonly-3 ~]$ screen -dmS cache parallel --eta -a all_plate_ids_4300.txt --results /tmp/ngff_cache_20240627_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

[wmoore@test122-omeroreadonly-3 ~]$ grep 'ok:' /tmp/ngff_cache_20240627_allplates/1/**/* | wc -l
59
[wmoore@test122-omeroreadonly-3 ~]$ grep 'ok:' /tmp/ngff_cache_20240627_allplates/1/**/* | wc -l
79

Seems stuck again.... but not quite

will-moore commented 5 months ago

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_2024062*/1/**/* | wc -l"; done
omeroreadonly-1
36688
omeroreadonly-2
18.    (out of 262 ids in all_plate_ids_6500.txt)
omeroreadonly-3
82.   (out of 200 ids in all_plate_ids_4300.txt)
omeroreadonly-4
1704.  (out of 2000 ids: 4500->6500)
omeroreadwrite
2104834

Nearly 2 hours later...

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_2024062*/1/**/* | wc -l"; done
omeroreadonly-1
40003    - now stopped - switch to last 100 plates of all_plate_ids_6500.txt 
omeroreadonly-2
20
omeroreadonly-3
88
omeroreadonly-4
1710
omeroreadwrite
2193689

Only a few more plates processed. We need ~100 - ~300 more processed on each of omeroreadonly-2, omeroreadonly-3, omeroreadonly-4. Not going to be done in time for tomorrow morning (12 hours).

Need to stop processing of Datasets on omeroreadonly-1 and omeroreadwrite since these are less critical (Images are still viewable etc without memo files) and switch to Plates on these...

Last 100 rows

[wmoore@test122-omeroreadonly-1 ~]$ wc all_plate_ids.txt
 6762  6762 95866 all_plate_ids.txt
[wmoore@test122-omeroreadonly-1 ~]$ sed '1,6662 d' all_plate_ids.txt > all_plate_ids_6762.txt
[wmoore@test122-omeroreadonly-1 ~]$ wc -l all_plate_ids_6762.txt 
100 all_plate_ids_6762.txt
[wmoore@test122-omeroreadonly-1 ~]$ screen -dmS cache parallel --eta -a all_plate_ids_6762.txt --results /tmp/ngff_cache_20240627_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

rows 6350...

$ wc all_plate_ids.txt
 6762  6762 95866 all_plate_ids.txt
$ sed '1,6350 d' all_plate_ids.txt > all_plate_ids_6350.txt

[wmoore@test122-omeroreadwrite ~]$ screen -dmS cache parallel --eta -a all_plate_ids_6350.txt --results /tmp/ngff_cache_20240627_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

will-moore commented 5 months ago

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_2024062*/1/**/* | wc -l"; done
omeroreadonly-1
78.      (out of 100 all_plate_ids_6762.txt)
omeroreadonly-2
194.     (out of 262 ids in all_plate_ids_6500.txt)
omeroreadonly-3
96.        (all_plate_ids_4300.txt)
omeroreadonly-4
1716      (all_plate_ids_4500.txt) 
omeroreadwrite
412.    (Done! all_plate_ids_6350.txt) - faster than others! - restart... all_plate_ids_4000.txt

Restart

[wmoore@test122-omeroreadwrite ~]$ screen -dmS cache parallel --eta -a all_plate_ids_4000.txt --results /tmp/ngff_cache_20240627_plates_4000/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

will-moore commented 5 months ago

Overnight: looking good... All screen processes were complete except for omeroreadonly-3. Cancelled it ahead of testing...

[wmoore@test122-proxy ~]$ for n in $(cat nodes);   do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_2024062*/1/**/* | wc -l"; done
omeroreadonly-1
98
omeroreadonly-2
262
omeroreadonly-3
2455
omeroreadonly-4
2261
omeroreadwrite
3174

will-moore commented 5 months ago

Yesterday, started deleting the contents of

cd /data/OMERO/BioFormatsCache/data_to_delete

created above (old memo files). Ran in a screen, deleting parts of data_to_delete/OMERO/ManagedRepository/demo_2 at a time, e.g.

[wmoore@test122-omeroreadwrite demo_2]$ sudo rm -rf Blitz-0-Ice.ThreadPool.Server-2*

will-moore commented 4 months ago

Testing today across ALL data in idr-testing...

Eventually we ran into slow-down, and finally Bad Gateway in web, with errors on omeroreadonly-4 Blitz log:

2024-06-28 10:56:04,769 WARN  [  o.services.db.SelfCorrectingDataSource] (.Server-47) Registering error with list: Current size: 82
2024-06-28 10:56:04,769 ERROR [  o.services.db.SelfCorrectingDataSource] (.Server-47) Failed to acquire connection after retries=3
java.sql.SQLException: unable to get a connection from pool of a PoolingDataSource containing an XAPool of resource c094dc5d-80a5-41c5-9a07-8f9ec64a091a with 25 connection(s) (0 still available)
    at bitronix.tm.resource.jdbc.PoolingDataSource.getConnection(PoolingDataSource.java:332)
    at org.springframework.jdbc.datasource.DelegatingDataSource.getConnection(DelegatingDataSource.java:86)
    at ome.services.db.SelfCorrectingDataSource.call(SelfCorrectingDataSource.java:124)
org.springframework.jdbc.datasource.DataSourceTransactionManager.doBegin(DataSourceTransactionManager.java:246)

Initially, some images took a long time to load, possibly due to lack of memo files (didn't complete for Dataset Images): e.g. https://idr-testing.openmicroscopy.org/webclient/img_detail/12557109/?dataset=13701 and also idr0009 which should have memo files...

Seeing lots of these blocks repeating just now (omeroreadonly-4):

2024-06-28 11:14:48,626 INFO  [                loci.formats.ImageReader] (.Server-41) TiffDelegateReader initializing /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-8/2021-02/23/17-10-04.263/TP_T309_Chgreen_Ill0_Ang0,1,2.tif
2024-06-28 11:14:48,626 INFO  [                loci.formats.ImageReader] (.Server-40) TiffDelegateReader initializing /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-8/2021-02/23/17-10-04.263/TP_T309_Chgreen_Ill0_Ang0,1,2.tif
2024-06-28 11:14:48,627 INFO  [       loci.formats.in.MinimalTiffReader] (.Server-41) Reading IFDs
2024-06-28 11:14:48,627 INFO  [       loci.formats.in.MinimalTiffReader] (.Server-40) Reading IFDs
2024-06-28 11:14:48,628 INFO  [       loci.formats.in.MinimalTiffReader] (.Server-41) Populating metadata
2024-06-28 11:14:48,629 INFO  [       loci.formats.in.MinimalTiffReader] (.Server-40) Populating metadata
2024-06-28 11:14:48,629 INFO  [              loci.formats.in.TiffReader] (.Server-41) Checking comment style
2024-06-28 11:14:48,630 INFO  [              loci.formats.in.TiffReader] (.Server-40) Checking comment style
2024-06-28 11:14:48,634 INFO  [          loci.formats.in.BaseTiffReader] (.Server-41) Populating OME metadata
2024-06-28 11:14:48,637 INFO  [          loci.formats.in.BaseTiffReader] (.Server-40) Populating OME metadata

Looking at memo files created during testing window - > 9:30 BST is > 8:30 GMT found 56 entries in omeroreadonly-4:

[wmoore@test122-omeroreadonly-4 ~]$ grep "saved memo" /opt/omero/server/OMERO.server/var/log/Blitz-0.log
...
2024-06-28 08:56:17,251 DEBUG [                   loci.formats.Memoizer] (l.Server-8) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-09/23/13-06-15.559/.experiment_descriptor.xml.bfmemo (452167 bytes)
2024-06-28 08:56:39,546 DEBUG [                   loci.formats.Memoizer] (l.Server-5) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-09/23/12-26-19.536/.experiment_descriptor.xml.bfmemo (531206 bytes)
2024-06-28 08:57:10,473 DEBUG [                   loci.formats.Memoizer] (.Server-12) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-12/08/09-05-47.246/.experiment_descriptor.xml.bfmemo (451823 bytes)
2024-06-28 08:57:10,473 DEBUG [                   loci.formats.Memoizer] (l.Server-4) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-12/08/09-05-47.246/.experiment_descriptor.xml.bfmemo (451823 bytes)
2024-06-28 08:58:28,668 DEBUG [                   loci.formats.Memoizer] (.Server-20) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-09/23/11-38-36.949/.experiment_descriptor.xml.bfmemo (452167 bytes)
2024-06-28 08:59:09,932 DEBUG [                   loci.formats.Memoizer] (.Server-12) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-09/23/11-14-58.128/.experiment_descriptor.xml.bfmemo (452167 bytes)
2024-06-28 08:59:23,801 DEBUG [                   loci.formats.Memoizer] (.Server-11) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-11/26/07-35-04.628/.experiment_descriptor.xml.bfmemo (451613 bytes)
2024-06-28 08:59:23,981 DEBUG [                   loci.formats.Memoizer] (l.Server-3) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-11/26/07-35-04.628/.experiment_descriptor.xml.bfmemo (451613 bytes)
2024-06-28 09:05:05,583 DEBUG [                   loci.formats.Memoizer] (.Server-10) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/20-45-58.507/.img_000000001_07-YFPtriple_000.tif.bfmemo (1288707 bytes)
2024-06-28 09:05:57,129 DEBUG [                   loci.formats.Memoizer] (.Server-13) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-26-47.150/.img_000000001_07-YFPtriple_000.tif.bfmemo (1286993 bytes)
2024-06-28 09:06:22,096 DEBUG [                   loci.formats.Memoizer] (.Server-19) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-47-47.606/.img_000000001_07-YFPtriple_000.tif.bfmemo (1287203 bytes)
2024-06-28 09:06:34,361 DEBUG [                   loci.formats.Memoizer] (l.Server-4) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-24-27.224/.img_000000001_07-YFPtriple_000.tif.bfmemo (1288135 bytes)
2024-06-28 09:06:59,275 DEBUG [                   loci.formats.Memoizer] (.Server-11) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-25-15.670/.img_000000001_07-YFPtriple_000.tif.bfmemo (1286930 bytes)
2024-06-28 09:06:59,843 DEBUG [                   loci.formats.Memoizer] (.Server-10) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-32-48.388/.img_000000001_07-YFPtriple_000.tif.bfmemo (1287339 bytes)
2024-06-28 09:07:02,187 DEBUG [                   loci.formats.Memoizer] (l.Server-8) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-32-03.591/.img_000000001_07-YFPtriple_000.tif.bfmemo (1287303 bytes)
2024-06-28 09:07:08,730 DEBUG [                   loci.formats.Memoizer] (l.Server-6) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-33-33.721/.img_000000001_07-YFPtriple_000.tif.bfmemo (1287270 bytes)
2024-06-28 09:08:12,530 DEBUG [                   loci.formats.Memoizer] (.Server-12) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-24-27.224/.img_000000001_07-YFPtriple_000.tif.bfmemo (1288135 bytes)
2024-06-28 09:09:06,474 DEBUG [                   loci.formats.Memoizer] (.Server-10) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/04/17-51-26.642/.TR1_2_W0001_P0009_T0015.tif.bfmemo (53568 bytes)
2024-06-28 09:12:53,373 DEBUG [                   loci.formats.Memoizer] (.Server-11) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-19/2018-11/26/09-57-43.807/Mmu_E1_CAGTAG1.TM000036_timeFused_blending/.SPM00_TM000036_CM00_CM01_CHN01.fusedStack_yzProjection.corrected.shifted.klb.bfmemo (945972 bytes)
2024-06-28 09:13:32,400 DEBUG [                   loci.formats.Memoizer] (.Server-16) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2018-11/26/10-02-40.771/Mmu_E1_CAGTAG1.TM000000_timeFused_blending/.SPM00_TM000000_CM00_CM01_CHN00.fusedStack_xyProjection.corrected.shifted.klb.bfmemo (281608 bytes)
2024-06-28 09:13:51,546 DEBUG [                   loci.formats.Memoizer] (l.Server-8) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2018-11/26/10-05-14.978/Mmu_E1_GFPLifeactxmKate2nls.TM000000_timeFused_blending/.SPM00_TM000000_CM00_CM01_CHN01.fusedStack.corrected.shifted.klb.bfmemo (278165 bytes)
2024-06-28 09:14:55,666 DEBUG [                   loci.formats.Memoizer] (l.Server-9) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-12-34.562/.Embryo6.pattern.bfmemo (2652729 bytes)
2024-06-28 09:15:32,742 DEBUG [                   loci.formats.Memoizer] (l.Server-0) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-58-04.234/.NocMo.2_Cell.pattern.bfmemo (1248192 bytes)
2024-06-28 09:16:04,436 DEBUG [                   loci.formats.Memoizer] (l.Server-6) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-35-22.064/.MoNoc.Cell10.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,590 DEBUG [                   loci.formats.Memoizer] (.Server-13) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-16.341/.MoNoc.Cell01.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,612 DEBUG [                   loci.formats.Memoizer] (.Server-15) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-35-55.369/.NocMo.Cell04.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,752 DEBUG [                   loci.formats.Memoizer] (.Server-11) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-14.233/.MoNoc.Cell03.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,752 DEBUG [                   loci.formats.Memoizer] (l.Server-4) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-07.946/.MoNoc.Cell04.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,752 DEBUG [                   loci.formats.Memoizer] (l.Server-1) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-10.198/.MoNoc.Cell05.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,990 DEBUG [                   loci.formats.Memoizer] (.Server-22) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-14.012/.NocMo.Cell01.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,992 DEBUG [                   loci.formats.Memoizer] (l.Server-9) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-12.169/.MoNoc.Cell02.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,993 DEBUG [                   loci.formats.Memoizer] (.Server-17) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-24.573/.MoNoc.Cell16.pattern.bfmemo (140049 bytes)
2024-06-28 09:16:05,013 DEBUG [                   loci.formats.Memoizer] (.Server-21) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-35-22.064/.MoNoc.Cell10.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:05,215 DEBUG [                   loci.formats.Memoizer] (.Server-25) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-12.169/.MoNoc.Cell02.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:05,814 DEBUG [                   loci.formats.Memoizer] (l.Server-7) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-20.449/.NocMo.Cell02.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:05,871 DEBUG [                   loci.formats.Memoizer] (.Server-10) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-05.845/.MoNoc.Cell07.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:05,871 DEBUG [                   loci.formats.Memoizer] (.Server-12) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-32.902/.MoNoc.Cell12.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:05,969 DEBUG [                   loci.formats.Memoizer] (l.Server-2) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-03.806/.MoNoc.Cell06.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:05,989 DEBUG [                   loci.formats.Memoizer] (l.Server-0) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-01.809/.MoNoc.Cell09.pattern.bfmemo (140058 bytes)
2024-06-28 09:16:06,067 DEBUG [                   loci.formats.Memoizer] (l.Server-8) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-28.736/.MoNoc.Cell14.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:06,068 DEBUG [                   loci.formats.Memoizer] (.Server-18) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-22.555/.MoNoc.Cell15.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:06,069 DEBUG [                   loci.formats.Memoizer] (.Server-16) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-26.527/.MoNoc.Cell13.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:06,070 DEBUG [                   loci.formats.Memoizer] (l.Server-5) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-30.782/.MoNoc.Cell11.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:06,111 DEBUG [                   loci.formats.Memoizer] (l.Server-3) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-57.750/.MoNoc.Cell08.pattern.bfmemo (140058 bytes)
2024-06-28 09:16:06,127 DEBUG [                   loci.formats.Memoizer] (.Server-19) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-18.392/.NocMo.Cell03.pattern.bfmemo (140059 bytes)
2024-06-28 09:20:28,739 DEBUG [                   loci.formats.Memoizer] (.Server-18) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2023-08/11/14-40-46.960/.230316_2108_ctrl_2h_smc4_dep_rep2_hemi_stlc_60min_zoom5-01-46.czi #2.tif_registered.tif.bfmemo (148816 bytes)
2024-06-28 09:20:33,959 DEBUG [                   loci.formats.Memoizer] (.Server-14) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-14/2023-08/11/14-21-06.678/.230320_2108_compensation_wapl_smc4_sororin_dep_stlc_rep1_hemi-02-25.tif_registered.tif.bfmemo (146968 bytes)
2024-06-28 11:09:22,655 DEBUG [                   loci.formats.Memoizer] (.Server-27) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:11:26,507 DEBUG [                   loci.formats.Memoizer] (.Server-23) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:11:37,988 DEBUG [                   loci.formats.Memoizer] (.Server-28) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:12:04,759 DEBUG [                   loci.formats.Memoizer] (.Server-30) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:13:09,255 DEBUG [                   loci.formats.Memoizer] (l.Server-6) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:14:13,382 DEBUG [                   loci.formats.Memoizer] (.Server-18) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:15:09,895 DEBUG [                   loci.formats.Memoizer] (.Server-32) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:15:28,375 DEBUG [                   loci.formats.Memoizer] (.Server-12) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)

img_TL.companion.ome.bfmemo corresponds to Image:12557109 above (idr0099) which took a long time to load. Checking $ grep -B 1 "saved memo" /opt/omero/server/OMERO.server/var/log/Blitz-0.log shows typical times like:

2024-06-28 11:15:28,288 DEBUG [                   loci.formats.Memoizer] (.Server-12) start[1719571343967] time[1984321] tag[loci.formats.Memoizer.saveMemo]

1984321 ms is half an hour (33 mins).

Similar story on omeroreadonly-3, with a bunch of long-running memo generations:

[wmoore@test122-omeroreadonly-3 ~]$ grep "loci.formats.Memoizer.saveMemo" /opt/omero/server/OMERO.server/var/log/Blitz-0.log
2024-06-28 10:38:56,130 DEBUG [                   loci.formats.Memoizer] (.Server-12) start[1719571132162] time[3968] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 10:44:09,474 DEBUG [                   loci.formats.Memoizer] (l.Server-0) start[1719571390887] time[58586] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 10:48:11,801 DEBUG [                   loci.formats.Memoizer] (.Server-13) start[1719571681990] time[9811] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:01:09,652 DEBUG [                   loci.formats.Memoizer] (.Server-34) start[1719572233552] time[236100] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:06:15,122 DEBUG [                   loci.formats.Memoizer] (l.Server-3) start[1719570923843] time[1851278] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:09:42,979 DEBUG [                   loci.formats.Memoizer] (l.Server-5) start[1719571104120] time[1878859] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:16:58,931 DEBUG [                   loci.formats.Memoizer] (l.Server-4) start[1719572173760] time[1245171] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:18:55,829 DEBUG [                   loci.formats.Memoizer] (l.Server-6) start[1719572468542] time[1067286] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:24:15,515 DEBUG [                   loci.formats.Memoizer] (.Server-11) start[1719573343504] time[512010] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:24:16,433 DEBUG [                   loci.formats.Memoizer] (.Server-10) start[1719573343504] time[512929] tag[loci.formats.Memoizer.saveMemo]

e.g. 1067286 ms is 17 minutes - (for same image as above):

2024-06-28 11:18:55,829 DEBUG [                   loci.formats.Memoizer] (l.Server-6) start[1719572468542] time[1067286] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:18:56,761 DEBUG [                   loci.formats.Memoizer] (l.Server-6) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)

and a few other images too:

[wmoore@test122-omeroreadonly-3 ~]$ grep "saved memo" /opt/omero/server/OMERO.server/var/log/Blitz-0.log
2024-06-28 10:38:56,460 DEBUG [                   loci.formats.Memoizer] (.Server-12) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-16/2024-02/28/14-27-02.476/.position_18.tif.bfmemo (38861 bytes)
2024-06-28 10:44:12,942 DEBUG [                   loci.formats.Memoizer] (l.Server-0) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-94/2019-09/17/08-31-42.409/.ExpB_whole_cns_AF.ome.tif.bfmemo (411891 bytes)
2024-06-28 10:48:12,591 DEBUG [                   loci.formats.Memoizer] (.Server-13) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-102/2019-09/17/08-38-35.243/.ExpC_TPH2_left_cerebellum_MIP_slice_600_to_800.tif.bfmemo (34991 bytes)
2024-06-28 11:01:13,148 DEBUG [                   loci.formats.Memoizer] (.Server-34) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-109/2019-09/17/08-39-19.075/.ExpD_chicken_embryo_stitched.ome.tif.bfmemo (1363919 bytes)
2024-06-28 11:06:16,760 DEBUG [                   loci.formats.Memoizer] (l.Server-3) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:09:44,114 DEBUG [                   loci.formats.Memoizer] (l.Server-5) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:17:00,086 DEBUG [                   loci.formats.Memoizer] (l.Server-4) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:18:56,761 DEBUG [                   loci.formats.Memoizer] (l.Server-6) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:24:15,659 DEBUG [                   loci.formats.Memoizer] (.Server-11) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-8/2021-02/23/17-10-04.263/.TP_Chgreen_Ill0_Ang0,1,2.pattern.bfmemo (11983275 bytes)
2024-06-28 11:24:16,612 DEBUG [                   loci.formats.Memoizer] (.Server-10) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-8/2021-02/23/17-10-04.263/.TP_Chgreen_Ill0_Ang0,1,2.pattern.bfmemo (11983275 bytes)

khaledk2 commented 4 months ago

Many error messages like this from the NGINX error log file from different nodes

2024/06/28 11:08:26 [error] 1183#1183: *107253 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.2.190, server: test122-omeroreadonly-2.novalocal, request: "GET /webgateway/get_thumbnails/?id=8849964&id=8849508&id=8848993&id=8848857&id=8849556&id=8849157&id=8849484&id=8849596&id=8848937&id=8850072&id=8849153&id=8848901&id=8849029&id=8848737&id=8849716&id=8850000&id=8848881&id=8850104&id=8849141&id=8850040&id=8850236&id=8849640&id=8849365&id=8849856&id=8849868&id=8848849&id=8849213&id=8849800&id=8849676&id=8849297&id=8849728&id=8848797&id=8850208&id=8849193&callback=jQuery362017553399812563741719572434117&=1719572434118 HTTP/1.0", upstream: "http://127.0.0.1:4080/webgateway/get_thumbnails/?id=8849964&id=8849508&id=8848993&id=8848857&id=8849556&id=8849157&id=8849484&id=8849596&id=8848937&id=8850072&id=8849153&id=8848901&id=8849029&id=8848737&id=8849716&id=8850000&id=8848881&id=8850104&id=8849141&id=8850040&id=8850236&id=8849640&id=8849365&id=8849856&id=8849868&id=8848849&id=8849213&id=8849800&id=8849676&id=8849297&id=8849728&id=8848797&id=8850208&id=8849193&callback=jQuery36201755339981256374_1719572434117&_=1719572434118", host: "idr-testing.openmicroscopy.org", referrer: "http://idr-testing.openmicroscopy.org/webclient/?show=project-1701"

will-moore commented 4 months ago

Need to generate memo files for Dataset Images, excluding idr0043... On omeroreadwrite...

/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'select d.id from Dataset d where d.id in (select link.child.id from ProjectDatasetLink as link where link.parent.id!=501)' > datasets_not_idr0043.txt

cut -d ',' -f2 datasets_not_idr0043.txt | sed -e 's/^/Dataset:/' > datasets_not_idr0043_ids.txt

[wmoore@test122-omeroreadwrite ~]$ wc -l datasets_not_idr0043_ids.txt
2909 datasets_not_idr0043_ids.txt

[wmoore@test122-omeroreadwrite ~]$ screen -dmS cache parallel --eta -a datasets_not_idr0043_ids.txt --results /tmp/ngff_cache_20240628_ds_not_idr0043/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

After ~15 mins.. most images have memos so far....

[wmoore@test122-omeroreadwrite ~]$ grep "ok:" /tmp/ngff_cache_20240628_ds_not_idr0043/1/**/*
...
/tmp/ngff_cache_20240628_ds_not_idr0043/1/Dataset:99/stdout:ok: Pixels:1886003 Image:1886003 0.22210121154785156 
/tmp/ngff_cache_20240628_ds_not_idr0043/1/Dataset:99/stdout:ok: Pixels:1886004 Image:1886004 0.121246337890625 
/tmp/ngff_cache_20240628_ds_not_idr0043/1/Dataset:99/stdout:ok: Pixels:1885997 Image:1885997 0.07439756393432617 
/tmp/ngff_cache_20240628_ds_not_idr0043/1/Dataset:99/stdout:ok: Pixels:1885998 Image:1885998 3.472053289413452

[wmoore@test122-omeroreadwrite ~]$ grep "ok:" /tmp/ngff_cache_20240628_ds_not_idr0043/1/**/* | wc
   5906   23624  674103
# count Datasets... 2909 in total...
[wmoore@test122-omeroreadwrite ~]$ ls /tmp/ngff_cache_20240628_ds_not_idr0043/1/ | wc
    486     486    5783

will-moore commented 4 months ago

Also need to regenerate memo files for idr0009 - reported slow (missing) during testing and seen as experiment_descriptor.xml.bfmemo in the testing "memo saved" logs above. Most of the non-NGFF memo generation happened after idr0009 update, but not all, so these got missed...

omeroreadonly-1...

/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'select MIN(field.image.id) FROM WellSample AS field where field.well.plate.id in (select link.child.id from ScreenPlateLink as link where link.parent.id=251) GROUP BY field.well.plate' > idr0009_plates.txt

cut -d ',' -f2 idr0009_plates.txt | sed -e 's/^/Image:/' > idr0009_plate_ids.txt

[wmoore@test122-omeroreadonly-1 ~]$ screen -dmS cache parallel --eta -a idr0009_plate_ids.txt --results /tmp/ngff_cache_20240628_idr0009/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'