will-moore opened 6 months ago
Using today's build (23rd May) of OMEZarrReader. As the omero-server user, ran on all 5 servers:
bash-5.1$ wget https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-build/76/label=testintegration/artifact/bio-formats-build/ZarrReader/target/OMEZarrReader-0.4.2-SNAPSHOT-jar-with-dependencies.jar
bash-5.1$ mv OMEZarrReader-0.4.2-SNAPSHOT-jar-with-dependencies.jar OMEZarrReader-b76.jar
bash-5.1$ rm OMERO.server/lib/client/OMEZarrReader.jar && rm OMERO.server/lib/server/OMEZarrReader.jar
bash-5.1$ cp OMEZarrReader-b76.jar OMERO.server/lib/client/ && cp OMEZarrReader-b76.jar OMERO.server/lib/server/
exit
[wmoore@test122-omeroreadwrite ~]$ sudo service omero-server restart
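The same jar swap was repeated manually on each server; a sketch of doing it in one pass from the proxy (illustrative only: assumes passwordless sudo over ssh and the node names used throughout this issue):
JAR=https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-build/76/label=testintegration/artifact/bio-formats-build/ZarrReader/target/OMEZarrReader-0.4.2-SNAPSHOT-jar-with-dependencies.jar
for server in omeroreadwrite omeroreadonly-1 omeroreadonly-2 omeroreadonly-3 omeroreadonly-4; do
ssh $server "sudo -u omero-server sh -c 'cd && wget -q $JAR -O OMEZarrReader-b76.jar && rm OMERO.server/lib/client/OMEZarrReader.jar OMERO.server/lib/server/OMEZarrReader.jar && cp OMEZarrReader-b76.jar OMERO.server/lib/client/ && cp OMEZarrReader-b76.jar OMERO.server/lib/server/' && sudo service omero-server restart"
done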
Install mkngff on omeroreadwrite...
sudo -u root -s
source /opt/omero/server/venv3/bin/activate
pip install 'omero-mkngff @ git+https://github.com/IDR/omero-mkngff@main'
...
Resolved https://github.com/IDR/omero-mkngff to commit d3f95c90b379a8f61a866539cdb0e1e490fad84b
Successfully installed omero-mkngff-0.1.0.dev0
$ sudo -u omero-server -s
bash-5.1$ cd
bash-5.1$ git clone https://github.com/IDR/mkngff_upgrade_scripts.git
Set up a screen session and delete the duplicate plate for idr0015
screen -S mkngff
source /opt/omero/server/venv3/bin/activate
export OMERODIR=/opt/omero/server/OMERO.server
omero config get omero.db.host
export DBHOST=192.168.10.231
omero config get omero.db.pass
export PGPASSWORD=[********]
omero login...
omero delete Plate:4801 --report > /tmp/delete_idr0015.log
mkngff...
omero mkngff setup > setup.sql
psql -U omero -d idr -h $DBHOST -f setup.sql
CREATE FUNCTION
cd mkngff_upgrade_scripts/ngff_filesets/idr0004
for i in $(ls); do sed -i 's/SECRETUUID/42650434-6eaa-45e5-9542-58247a45d8bc/g' $i; done
omero login
cd ngff_filesets
export IDRID=idr0004
for r in $(cat $IDRID.csv); do
biapath=$(echo $r | cut -d',' -f2)  # 2nd CSV column: path under bia-integrator-data
uuid=$(echo $biapath | cut -d'/' -f2)  # Fileset UUID (2nd path component)
fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')  # Fileset ID, whitespace stripped
psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
omero mkngff symlink /data/OMERO/ManagedRepository $fsid "/bia-integrator-data/$biapath/$uuid.zarr" --bfoptions --clientpath="https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/$biapath/$uuid.zarr"
done
Fix idr0004 bfoptions (remove quick_read)
vi /data/OMERO/ManagedRepository/demo_2/2015-10/01/07-46-42.965_mkngff/35cfc0db-7795-497c-aed5-1ae591b2d9f1.zarr.bfoptions
vi /data/OMERO/ManagedRepository/demo_2/2015-10/01/07-57-40.271_mkngff/ee8872c8-e4b1-41fa-aa4f-a9e3e200c540.zarr.bfoptions
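(A non-interactive alternative, assuming quick_read appears as its own key=value line in each bfoptions file:
sed -i '/quick_read/d' /data/OMERO/ManagedRepository/demo_2/2015-10/01/*_mkngff/*.zarr.bfoptions
)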
Repeat for all other studies... idr0010 (15:43), idr0011 (17:32), idr0012 (17:43), idr0013 (17:57), idr0016 (22:10), idr0015 (06:41), idr0025 (8:07), idr0026 (8:08), idr0033 (9:25), idr0035 (9:45), idr0036 (9:49), idr0051 (10:01), idr0054 (10:02), idr0064 (10:05), idr0090 (10:07), idr0091 (10:12)
BF cache memo generation...
$ ssh -A idr-testing.openmicroscopy.org -L 1080:omeroreadwrite:80
[wmoore@test122-proxy ~]$ grep -oE 'omero[^ ]+$' /etc/hosts > nodes
[wmoore@test122-proxy ~]$ cat nodes
omeroreadonly-1
omeroreadonly-2
omeroreadonly-3
omeroreadonly-4
omeroreadwrite
Delete (move) cache
ssh omeroreadwrite
sudo -u omero-server -s
cd /data/OMERO/BioFormatsCache/
mv data data_to_delete
Generate target IDs...
ssh omeroreadwrite
/opt/omero/server/OMERO.server/bin/omero login
/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'SELECT MIN(field.image.id) FROM Screen as s JOIN s.plateLinks as plateLinks JOIN s.annotationLinks as annLinks join annLinks.child as ann join plateLinks.child as p join p.wells as w join w.wellSamples as field where ann.id=38304992 GROUP BY field.well.plate' > ngff_plates.txt
/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'SELECT d.id FROM Project as p JOIN p.datasetLinks as datasetLinks JOIN p.annotationLinks as annLinks join annLinks.child as ann join datasetLinks.child as d where ann.id=38304992' > ngff_datasets.txt
vi ngff_plates.txt # remove first row
vi ngff_datasets.txt # remove first row
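(Equivalently, the header rows could be dropped non-interactively, assuming the first row is the only unwanted one:
sed -i '1d' ngff_plates.txt ngff_datasets.txt
)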
Back on proxy server... [wmoore@test122-proxy ~]
$ rsync -rvP omeroreadwrite:/home/wmoore/ngff_plates.txt ./
$ rsync -rvP omeroreadwrite:/home/wmoore/ngff_datasets.txt ./
$ cut -d ',' -f2 ngff_plates.txt | sed -e 's/^/Image:/' > ngff_ids.txt
$ cut -d ',' -f2 ngff_datasets.txt | sed -e 's/^/Dataset:/' >> ngff_ids.txt
$ wc ngff_ids.txt
1643 1643 22938 ngff_ids.txt
$ screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache_20240524/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
Unfortunately it seems that the cache generation is silently failing: nothing is produced under /tmp/ngff_cache_20240524/, and immediately running screen -r shows no screens. Checked with which parallel: it isn't installed. Installed it:
sudo -u root -s
dnf install parallel
Ran again:
screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache_20240524/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
EDIT (27th May): After running over the weekend we didn't get many "ok" results:
[wmoore@test122-proxy 1]$ grep ok /tmp/ngff_cache_20240524/1/**/* | wc
59 236 5927
[wmoore@test122-proxy ~]$ ls /tmp/ngff_cache_20240524/1/ | wc
1643 1643 22938
Run again...
screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache_20240527/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
28th May: after a day, screen -r reports no screen. Exactly three times more "ok" this time (59 -> 177):
[wmoore@test122-proxy ~]$ grep ok /tmp/ngff_cache_20240527/1/**/* | wc
177 795 19336
Check a random output...
[wmoore@test122-proxy ~]$ cat /tmp/ngff_cache_20240527/1/Image:9822101/stderr
wmoore@omeroreadonly-3: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Not sure what's going on here. I checked that I can ssh omeroreadonly-3 etc. to all the servers. Running again...
On completion (after the OME 2024 meeting): we are getting a few more "ok" results, but still far below full success:
[wmoore@test122-proxy ~]$ grep ok /tmp/ngff_cache_20240528/1/**/* | wc
230 946 23708
$ grep denied /tmp/ngff_cache_20240528/1/**/*
/tmp/ngff_cache_20240528/1/Dataset:11901/stderr:wmoore@omeroreadwrite: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11902/stderr:wmoore@omeroreadonly-2: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11903/stderr:wmoore@omeroreadwrite: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11904/stderr:wmoore@omeroreadonly-2: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11905/stderr:wmoore@omeroreadwrite: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240528/1/Dataset:11906/stderr:wmoore@omeroreadonly-2: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
...
$ grep denied /tmp/ngff_cache_20240528/1/**/* | wc
1436 5744 190228
Only see "Permission denied" with omeroreadwrite and omeroreadonly-2, not the others:
[wmoore@test122-proxy ~]$ grep denied /tmp/ngff_cache_20240528/1/**/* | grep "only-2" | wc
742 2968 98652
[wmoore@test122-proxy ~]$ grep denied /tmp/ngff_cache_20240528/1/**/* | grep "readwrite" | wc
694 2776 91576
On the 27th (previous run) we see "Permission denied" for all servers:
[wmoore@test122-proxy ~]$ grep denied /tmp/ngff_cache_20240527/1/**/* | grep "readwrite" | wc
294 1176 38795
[wmoore@test122-proxy ~]$ grep denied /tmp/ngff_cache_20240527/1/**/* | grep "readonly-1" | wc
163 652 21675
[wmoore@test122-proxy ~]$ grep denied /tmp/ngff_cache_20240527/1/**/* | grep "readonly-2" | wc
409 1636 54377
[wmoore@test122-proxy ~]$ grep denied /tmp/ngff_cache_20240527/1/**/* | grep "readonly-3" | wc
295 1180 39219
[wmoore@test122-proxy ~]$ grep denied /tmp/ngff_cache_20240527/1/**/* | grep "readonly-4" | wc
324 1296 43080
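A more compact way to get these per-server counts in one pass (a sketch; same greps as above, node name fragments assumed):
for s in readwrite readonly-1 readonly-2 readonly-3 readonly-4; do echo -n "$s: " && grep denied /tmp/ngff_cache_20240527/1/**/* | grep -c "$s"; done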
@sbesson Any ideas what's causing those Permission denied errors? They seem to only come from 2 servers the last time I ran this (omeroreadwrite and omeroreadonly-2); see the grep output above.
Running again now...
$ ssh -A idr-testing.openmicroscopy.org
Last login: Mon Jun 3 08:47:51 2024 from 134.36.66.49
[wmoore@test122-proxy ~]$ screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache_20240603/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
EDIT: Didn't see any "Permission denied" errors this time, but plenty of others!
[wmoore@test122-proxy ~]$ grep rror /tmp/ngff_cache_20240603/1/**/* | wc
1704 10341 231921
[wmoore@test122-proxy ~]$ grep ResourceError /tmp/ngff_cache_20240603/1/**/* | wc
1669 10014 225473
[wmoore@test122-proxy ~]$ grep FileNotFoundError /tmp/ngff_cache_20240603/1/**/* | wc
29 261 5626
But checking these in the webclient, the images either viewed fine or showed a spinner (no memo file)...?
Except for idr0064... We still haven't updated https://github.com/IDR/mkngff_upgrade_scripts/blob/main/ngff_filesets/idr0064.csv with the original scripts for updating vanilla Filesets to data at BioImage Archive. See https://github.com/IDR/idr-metadata/issues/682#issuecomment-1895755941
Another attempt (5th)...
$ ssh -A idr-testing.openmicroscopy.org -L 1080:omeroreadwrite:80
[wmoore@test122-proxy ~]$ screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache_20240604/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
EDIT: checking back a few days later, this looks better. NB: grep for "ok:" to avoid matching "invoke(":
[wmoore@test122-proxy ~]$ grep "ok:" /tmp/ngff_cache_20240604/1/**/* | wc
1829 7316 188495
But actually, most of these are Dataset images. If we exclude those, we find only 239 Plate images are ok, out of 1599:
[wmoore@test122-proxy ~]$ grep "ok:" /tmp/ngff_cache_20240604/1/**/* | grep -v Dataset | wc
239 956 24012
[wmoore@test122-proxy ~]$ grep Image ngff_ids.txt | wc
1599 1599 22362
Also lots of ResourceErrors! 1346 (out of 1599 Plates), and 461 for Dataset Images:
[wmoore@test122-proxy ~]$ grep "ResourceError" /tmp/ngff_cache_20240604/1/**/* | wc
1807 10842 245921
[wmoore@test122-proxy ~]$ grep "ResourceError" /tmp/ngff_cache_20240604/1/**/* | grep Dataset | wc
461 2766 63114
[wmoore@test122-proxy ~]$ grep "ResourceError" /tmp/ngff_cache_20240604/1/**/* | grep -v Dataset | wc
1346 8076 182807
Many images with ResourceError are viewable in the webclient. Picking an Image that isn't (idr0004, plate 171, http://localhost:1080/webclient/?show=image-698831), look for the fileset name in the logs on all servers:
for server in omeroreadwrite omeroreadonly-1 omeroreadonly-2 omeroreadonly-3 omeroreadonly-4; do echo $server && ssh $server "grep e3283a6a-d25b-41e1-8ab7-1837b89e3a6e /opt/omero/server/OMERO.server/var/log/Blitz-0.log"; done
This finds entries in the logs on omeroreadwrite, including:
2024-06-11 07:43:30,171 DEBUG [ loci.formats.Memoizer] (.Server-10) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-10/01/08-51-39.709_mkngff/e3283a6a-d25b-41e1-8ab7-1837b89e3a6e.zarr/..zattrs.bfmemo (128925 bytes)
But no Errors with this search. Also found 14 sessions errors:
[wmoore@test122-proxy ~]$ grep "rror" /tmp/ngff_cache_20240604/1/**/* | grep -v ResourceError | grep sessions
/tmp/ngff_cache_20240604/1/Image:12550005/stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/2f9ce65b-a180-4828-a64b-f2c08a8f6743')
/tmp/ngff_cache_20240604/1/Image:1486790/stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/2f4eff31-ac3c-4f91-bc7b-e8464463f3a7')
...
Found 708 errors like:
/tmp/ngff_cache_20240604/1/Image:9821901/stderr:!! 06/04/24 09:49:07.402 error: 9 communicators not destroyed during global destruction.
@sbesson - updated the previous comment above with more details of errors on the last attempt at memo file generation for NGFF data (1599 Plate Images and 44 Datasets).
Summary:
- 1807 ResourceErrors, but we don't know what caused them. Only 239 out of 1599 NGFF Plates are "ok" in the memo-generation logs, but the data seems to be OK when viewed in the webclient.
- No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/...
- communicators not destroyed
From Seb: on omeroreadonly-4 I am seeing lots of java.io.FileNotFoundException... (Transport endpoint is not connected) in the logs. And:
[sbesson@test122-omeroreadonly-4 ~]$ ls /bia-integrator-data
ls: cannot access '/bia-integrator-data': Transport endpoint is not connected
Fixed by unmounting and re-mounting goofys:
[wmoore@test122-omeroreadonly-4 ~]$ sudo umount /bia-integrator-data
[wmoore@test122-omeroreadonly-4 ~]$ sudo goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other bia-integrator-data /bia-integrator-data
[wmoore@test122-omeroreadonly-4 ~]$ ls /bia-integrator-data/S-BIAD815/51afff7c-eed4-44b4-95c7-1437d8807b97/51afff7c-eed4-44b4-95c7-1437d8807b97.zarr
0 OME
Also check this is mounted on all servers:
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "ls /bia-integrator-data/S-BIAD815/51afff7c-eed4-44b4-95c7-1437d8807b97/51afff7c-eed4-44b4-95c7-1437d8807b97.zarr"; done
omeroreadonly-1
0
OME
omeroreadonly-2
0
OME
omeroreadonly-3
0
OME
omeroreadonly-4
0
OME
omeroreadwrite
0
OME
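If any node had shown the same "Transport endpoint is not connected" error, a sketch of checking and remounting in one loop (assuming sudo works over ssh -t, and the same goofys flags as above):
for n in $(cat nodes); do echo $n && ssh -t $n "ls /bia-integrator-data > /dev/null || { sudo umount /bia-integrator-data; sudo goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other bia-integrator-data /bia-integrator-data; }"; done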
6th attempt:
screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache_20240611/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
Immediately there are a bunch of Image:XX dirs:
[wmoore@test122-proxy ~]$ ls /tmp/ngff_cache_20240611/1 | wc
324 324 4535
A bunch of sessions errors, e.g.
/tmp/ngff_cache_20240611/1/Image:3415781/stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/f11b34f2-a63c-4926-a207-ec423209f5cf')
$ grep "FileNotFoundError" /tmp/ngff_cache_20240611/1/**/* | wc
27 243 5239
And some "communicators not destroyed"
/tmp/ngff_cache_20240611/1/Image:1484119/stderr:!! 06/11/24 21:16:15.526 error: 9 communicators not destroyed during global destruction.
$ grep "rror" /tmp/ngff_cache_20240611/1/**/* | grep communicators | wc
17 187 2328
No other errors initially... Overnight, 2 other ResourceErrors:
[wmoore@test122-proxy ~]$ grep "rror" /tmp/ngff_cache_20240611/1/**/* | grep -v communicators | grep -v FileNotFoundError
/tmp/ngff_cache_20240611/1/Image:1556033/stdout:fail: Pixels:1556033 Image:1556033 11470.436125516891 exception ::omero::ResourceError
/tmp/ngff_cache_20240611/1/Image:1573071/stdout:fail: Pixels:1573071 Image:1573071 10738.076765537262 exception ::omero::ResourceError
Both of these are from idr0013 and are NOT NGFF data! Fileset:18568 and Fileset:18655 are the 2nd and 3rd from last rows in idr0013.csv. The last row is 18728, which is also not NGFF converted. But checking earlier rows in idr0013.csv (4th from the end and earlier) finds no unconverted Filesets, so it looks like just the last 3 failed?
Checking progress (12th June)... It looks like the process is not running. Only 311 "ok" and still no more errors.
[wmoore@test122-proxy ~]$ screen -r
There is no screen to be resumed.
[wmoore@test122-proxy ~]$ ls /tmp/ngff_cache_20240611/1/ | wc
1643 1643 22938
[wmoore@test122-proxy ~]$ grep "ok:" /tmp/ngff_cache_20240611/1/**/* | wc
311 1244 31235
# /tmp/ngff_cache_20240611/1/Image:696324/stdout:Previous session expired for public on localhost:4064
[wmoore@test122-proxy ~]$ grep "session expired" /tmp/ngff_cache_20240611/1/**/* | wc
70 490 7142
Previous errors in stdout:
[wmoore@test122-proxy ~]$ cat /tmp/ngff_cache_20240611/1/**/stdout | grep -v "ok:" | grep -v "session expired"
fail: Pixels:1556033 Image:1556033 11470.436125516891 exception ::omero::ResourceError
fail: Pixels:1573071 Image:1573071 10738.076765537262 exception ::omero::ResourceError
But lots more errors in stderr (with counts):
cat /tmp/ngff_cache_20240611/1/**/stderr
...
wmoore@omeroreadonly-3: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
[wmoore@test122-proxy ~]$ grep "wmoore@omeroreadonly-3: Permission denied (publickey,gssapi-keyex,gssapi-with-mic)" /tmp/ngff_cache_20240611/1/**/stderr | wc
492 1968 65421
[wmoore@test122-proxy ~]$ grep "wmoore@omeroreadwrite: Permission denied (publickey,gssapi-keyex,gssapi-with-mic)" /tmp/ngff_cache_20240611/1/**/stderr | wc
495 1980 65317
[wmoore@test122-proxy ~]$ grep "Password check failed" /tmp/ngff_cache_20240611/1/**/stderr | wc
27 243 3511
Let's try to log in and set the sessions timeout before running parallel:
[wmoore@test122-proxy ~]$ for server in $(cat nodes); do ssh $server "/opt/omero/server/OMERO.server/bin/omero login -s localhost -u public -w public && /opt/omero/server/OMERO.server/bin/omero sessions timeout 3600"; done
Created session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
Previous session expired for public on localhost:4064
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
cannot update timeout for 81a63a9a-1513-4a3a-95b0-0599b6c46d1a
Created session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
Previous session expired for public on localhost:4064
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
cannot update timeout for bda1ab04-ea1c-4f39-838b-00bffe7795e8
Created session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
Previous session expired for public on localhost:4064
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
cannot update timeout for 53ec8eb4-e947-49a2-b684-5b89ee8b02e3
Created session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
Previous session expired for public on localhost:4064
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
cannot update timeout for 2480e613-d6da-4fbd-93d7-aa1974979cbf
Created session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
Previous session expired for public on localhost:4064
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
Hmmm - oh well....
Give another try anyway...
screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache_20240613/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
After a few hours overnight: no errors except the known 2 above; no FileNotFoundError or "communicators not destroyed".
[wmoore@test122-proxy ~]$ grep "rror" /tmp/ngff_cache_20240613/1/**/*
/tmp/ngff_cache_20240613/1/Image:1556033/stdout:fail: Pixels:1556033 Image:1556033 1933.3922345638275 exception ::omero::ResourceError
/tmp/ngff_cache_20240613/1/Image:1573071/stdout:fail: Pixels:1573071 Image:1573071 1944.575980424881 exception ::omero::ResourceError
Some timeouts have changed to 60 min, approx 1/5th (none were 60 min before):
$ grep timeout /tmp/ngff_cache_20240613/1/**/*
...
/tmp/ngff_cache_20240613/1/Image:692360/stderr:Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
/tmp/ngff_cache_20240613/1/Image:693716/stderr:Reconnected to session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
/tmp/ngff_cache_20240613/1/Image:696271/stderr:Using session for public@localhost:4064. Idle timeout: 60 min. Current group: Public
/tmp/ngff_cache_20240613/1/Image:696324/stderr:Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
/tmp/ngff_cache_20240613/1/Image:696797/stderr:Using session for public@localhost:4064. Idle timeout: 60 min. Current group: Public
/tmp/ngff_cache_20240613/1/Image:9822101/stderr:Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
[wmoore@test122-proxy ~]$ grep timeout /tmp/ngff_cache_20240613/1/**/* | wc
326 3612 43591
[wmoore@test122-proxy ~]$ grep "timeout: 60" /tmp/ngff_cache_20240613/1/**/* | wc
72 792 9573
Some "ok:", similar number as before (and NONE of these are Dataset Images). Still small portion of 1599 total Plates!
[wmoore@test122-proxy ~]$ grep "ok:" /tmp/ngff_cache_20240613/1/**/* | wc
324 1296 32546
Lots of these - equally distributed between all 5 servers:
/tmp/ngff_cache_20240613/1/Image:9821951/stderr:wmoore@omeroreadonly-3: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240613/1/Image:9822001/stderr:wmoore@omeroreadonly-1: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
/tmp/ngff_cache_20240613/1/Image:9822051/stderr:wmoore@omeroreadonly-3: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
[wmoore@test122-proxy ~]$ grep "Permission denied" /tmp/ngff_cache_20240613/1/**/* | wc
1269 5076 168461
[wmoore@test122-proxy ~]$ grep "Permission denied" /tmp/ngff_cache_20240613/1/**/* | grep "readwrite" | wc
254 1016 33512
[wmoore@test122-proxy ~]$ grep "Permission denied" /tmp/ngff_cache_20240613/1/**/* | grep "readonly-1" | wc
254 1016 33774
[wmoore@test122-proxy ~]$ grep "Permission denied" /tmp/ngff_cache_20240613/1/**/* | grep "readonly-2" | wc
254 1016 33767
[wmoore@test122-proxy ~]$ grep "Permission denied" /tmp/ngff_cache_20240613/1/**/* | grep "readonly-3" | wc
253 1012 33638
[wmoore@test122-proxy ~]$ grep "Permission denied" /tmp/ngff_cache_20240613/1/**/* | grep "readonly-4" | wc
254 1016 33770
Looking at the logs for the source of these Permission denied issues, I think the relevant events are:
[sbesson@test122-omeroreadwrite ~]$ sudo grep -B 10 -A 10 fatal /var/log/secure-20240616
Jun 11 21:30:56 test122-omeroreadwrite sshd[1335505]: Accepted publickey for wmoore from 192.168.2.190 port 32888 ssh2: RSA SHA256:Nurjq5im8jid017paB3gXF7nz31bCaoY0W9UImvhwX8
Jun 11 21:30:56 test122-omeroreadwrite sshd[1335505]: pam_unix(sshd:session): session opened for user wmoore(uid=5098) by wmoore(uid=0)
Jun 11 21:46:35 test122-omeroreadwrite sshd[1335130]: Received disconnect from 192.168.2.190 port 55788:11: disconnected by user
Jun 11 21:46:35 test122-omeroreadwrite sshd[1335130]: Disconnected from user wmoore 192.168.2.190 port 55788
Jun 11 21:46:35 test122-omeroreadwrite sshd[1335127]: pam_unix(sshd:session): session closed for user wmoore
Jun 11 21:46:36 test122-omeroreadwrite sshd[1336220]: Accepted publickey for wmoore from 192.168.2.190 port 35256 ssh2: RSA SHA256:Nurjq5im8jid017paB3gXF7nz31bCaoY0W9UImvhwX8
Jun 11 21:46:36 test122-omeroreadwrite sshd[1336220]: pam_unix(sshd:session): session opened for user wmoore(uid=5098) by wmoore(uid=0)
Jun 11 22:15:16 test122-omeroreadwrite sshd[1334064]: Received disconnect from 192.168.2.190 port 37106:11: disconnected by user
Jun 11 22:15:16 test122-omeroreadwrite sshd[1334064]: Disconnected from user wmoore 192.168.2.190 port 37106
Jun 11 22:15:16 test122-omeroreadwrite sshd[1334018]: pam_unix(sshd:session): session closed for user wmoore
Jun 11 22:17:16 test122-omeroreadwrite sshd[1337471]: fatal: Timeout before authentication for 192.168.2.190 port 41974
Jun 11 22:21:59 test122-omeroreadwrite sshd[1337726]: Connection closed by authenticating user wmoore 192.168.2.190 port 52128 [preauth]
Jun 11 22:21:59 test122-omeroreadwrite sshd[1337728]: Connection closed by authenticating user wmoore 192.168.2.190 port 52134 [preauth]
Jun 11 22:21:59 test122-omeroreadwrite sshd[1337730]: Connection closed by authenticating user wmoore 192.168.2.190 port 52146 [preauth]
Jun 11 22:21:59 test122-omeroreadwrite sshd[1337732]: Connection closed by authenticating user wmoore 192.168.2.190 port 52160 [preauth]
Jun 11 22:21:59 test122-omeroreadwrite sshd[1337734]: Connection closed by authenticating user wmoore 192.168.2.190 port 52166 [preauth]
Jun 11 22:22:00 test122-omeroreadwrite sshd[1337736]: Connection closed by authenticating user wmoore 192.168.2.190 port 52174 [preauth]
Jun 11 22:22:00 test122-omeroreadwrite sshd[1337738]: Connection closed by authenticating user wmoore 192.168.2.190 port 53496 [preauth]
Jun 11 22:22:01 test122-omeroreadwrite sshd[1337740]: Connection closed by authenticating user wmoore 192.168.2.190 port 53502 [preauth]
Jun 11 22:22:01 test122-omeroreadwrite sshd[1337742]: Connection closed by authenticating user wmoore 192.168.2.190 port 53514 [preauth]
Jun 11 22:22:01 test122-omeroreadwrite sshd[1337744]: Connection closed by authenticating user wmoore 192.168.2.190 port 53518 [preauth]
--
Jun 13 22:14:54 test122-omeroreadwrite sshd[1453987]: pam_unix(sshd:session): session closed for user wmoore
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454271]: Received disconnect from 192.168.2.190 port 34842:11: disconnected by user
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454271]: Disconnected from user wmoore 192.168.2.190 port 34842
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454200]: pam_unix(sshd:session): session closed for user wmoore
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454219]: Received disconnect from 192.168.2.190 port 34816:11: disconnected by user
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454219]: Disconnected from user wmoore 192.168.2.190 port 34816
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454197]: pam_unix(sshd:session): session closed for user wmoore
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454244]: Received disconnect from 192.168.2.190 port 34832:11: disconnected by user
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454244]: Disconnected from user wmoore 192.168.2.190 port 34832
Jun 13 22:14:55 test122-omeroreadwrite sshd[1454199]: pam_unix(sshd:session): session closed for user wmoore
Jun 13 22:16:52 test122-omeroreadwrite sshd[1454272]: fatal: Timeout before authentication for 192.168.2.190 port 34850
Jun 13 22:16:52 test122-omeroreadwrite sshd[1454298]: fatal: Timeout before authentication for 192.168.2.190 port 34854
Jun 13 22:16:53 test122-omeroreadwrite sshd[1454383]: fatal: Timeout before authentication for 192.168.2.190 port 34868
Jun 13 22:16:54 test122-omeroreadwrite sshd[1454386]: fatal: Timeout before authentication for 192.168.2.190 port 34872
Jun 13 22:16:54 test122-omeroreadwrite sshd[1454388]: fatal: Timeout before authentication for 192.168.2.190 port 34884
Jun 13 22:16:54 test122-omeroreadwrite sshd[1454390]: fatal: Timeout before authentication for 192.168.2.190 port 34886
Jun 13 22:16:54 test122-omeroreadwrite sshd[1454392]: fatal: Timeout before authentication for 192.168.2.190 port 34888
Jun 13 22:16:55 test122-omeroreadwrite sshd[1454394]: fatal: Timeout before authentication for 192.168.2.190 port 34892
Jun 13 22:16:55 test122-omeroreadwrite sshd[1454397]: fatal: Timeout before authentication for 192.168.2.190 port 34896
Jun 13 22:16:55 test122-omeroreadwrite sshd[1454399]: fatal: Timeout before authentication for 192.168.2.190 port 34902
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454576]: Connection closed by authenticating user wmoore 192.168.2.190 port 43110 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454578]: Connection closed by authenticating user wmoore 192.168.2.190 port 43120 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454580]: Connection closed by authenticating user wmoore 192.168.2.190 port 43130 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454582]: Connection closed by authenticating user wmoore 192.168.2.190 port 43144 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454584]: Connection closed by authenticating user wmoore 192.168.2.190 port 43146 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454586]: Connection closed by authenticating user wmoore 192.168.2.190 port 43148 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454588]: Connection closed by authenticating user wmoore 192.168.2.190 port 43152 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454590]: Connection closed by authenticating user wmoore 192.168.2.190 port 43160 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454591]: Connection closed by authenticating user wmoore 192.168.2.190 port 43172 [preauth]
Jun 13 22:19:22 test122-omeroreadwrite sshd[1454593]: Connection closed by authenticating user wmoore 192.168.2.190 port 43178 [preauth]
But I haven't yet found a good explanation for what could cause these timeouts that break the SSH connection.
Upgrading OMEZarrReader to the 0.5.1 release (same jar-swap procedure as above):
wget https://artifacts.openmicroscopy.org/artifactory/ome.releases/ome/OMEZarrReader/0.5.1/OMEZarrReader-0.5.1.jar
rm OMERO.server/lib/client/OMEZarrReader-b76.jar && rm OMERO.server/lib/server/OMEZarrReader-b76.jar
cp OMEZarrReader-0.5.1.jar OMERO.server/lib/client/ && cp OMEZarrReader-0.5.1.jar OMERO.server/lib/server/
exit
$ sudo service omero-server restart
However, trying to view images in web (connected to omeroreadwrite:80) gave errors suggesting we're missing dependencies:
serverExceptionClass = ome.conditions.InternalException
message = Wrapped Exception: (java.lang.NoClassDefFoundError):
com/amazonaws/services/s3/model/S3ObjectInputStream
Reverting to the latest daily build...
wget https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-build/101/label=testintegration/artifact/bio-formats-build/ZarrReader/target/OMEZarrReader-0.5.2-SNAPSHOT-jar-with-dependencies.jar
mv OMEZarrReader-0.5.2-SNAPSHOT-jar-with-dependencies.jar OMEZarrReader-0.5.2_b101.jar
rm OMERO.server/lib/client/OMEZarrReader-0.5.1.jar && rm OMERO.server/lib/server/OMEZarrReader-0.5.1.jar
cp OMEZarrReader-0.5.2_b101.jar OMERO.server/lib/client/ && cp OMEZarrReader-0.5.2_b101.jar OMERO.server/lib/server/
Yesterday: To avoid the sshloginfile issues seen above, I manually split ngff_ids.txt (1643 rows) into blocks of 350 rows, one per omeroreadonly server (on omeroreadwrite I just started running with the existing ngff_plates.txt to cover the first 350+ rows); see the sed sketch after the command below.
omeroreadonly-1: rows 351-700
omeroreadonly-2: rows 701-1050
omeroreadonly-3: rows 1051-1400
omeroreadonly-4: rows 1401-1643
Then ssh'd to each server in turn and ran:
screen -dmS cache parallel --eta -a ngff_ids.txt --results /tmp/ngff_cache_20240617/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
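For reference, a sketch of producing those per-server blocks (the output filenames are illustrative; the actual split was done manually):
sed -n '351,700p' ngff_ids.txt > ngff_ids_readonly-1.txt
sed -n '701,1050p' ngff_ids.txt > ngff_ids_readonly-2.txt
sed -n '1051,1400p' ngff_ids.txt > ngff_ids_readonly-3.txt
sed -n '1401,1643p' ngff_ids.txt > ngff_ids_readonly-4.txt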
Later yesterday:
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240617/1/**/* | wc"; done
omeroreadonly-1
19 76 1908
omeroreadonly-2
20 80 2005
omeroreadonly-3
20 80 2000
omeroreadonly-4
21 84 2104
omeroreadwrite
362 1448 36445
Today:
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240617/1/**/* | wc"; done
omeroreadonly-1
43 172 4322
omeroreadonly-2
39 156 3910
omeroreadonly-3
50 200 5014
omeroreadonly-4
41 164 4105
omeroreadwrite
389 1556 39158
Found ResourceError for idr0015 plates: Fileset:21118 (not NGFF converted). This corresponds to 3 plates missing from idr0015.csv: https://github.com/IDR/idr-metadata/issues/645#issuecomment-1849888173. Need to run these 3 as described there and add them to idr0015.csv.
Created idr0015a on idr-testing as described there, then:
cd idr0015
for i in $(ls); do sed -i 's/42650434-6eaa-45e5-9542-58247a45d8bc/7eb6e556-9b77-45b6-b82b-c913cb85904e/g' $i; done
cd ../
(venv3) bash-5.1$ for r in $(cat idr0015a.csv); do
> biapath=$(echo $r | cut -d',' -f2)
> uuid=$(echo $biapath | cut -d'/' -f2)
> fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
> psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
> omero mkngff symlink /data/OMERO/ManagedRepository $fsid "/bia-integrator-data/$biapath/$uuid.zarr" --bfoptions --clientpath="https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/$biapath/$uuid.zarr"
> done
UPDATE 396
BEGIN
mkngff_fileset
----------------
6321119
(1 row)
COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/06/21-26-25.533
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-06/06/21-26-25.533_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-06/06/21-26-25.533_mkngff/d69df538-4684-4b32-8ded-d2f2af43af9f.zarr -> /bia-integrator-data/S-BIAD861/d69df538-4684-4b32-8ded-d2f2af43af9f/d69df538-4684-4b32-8ded-d2f2af43af9f.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/06/21-26-25.533
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-06/06/21-26-25.533_mkngff/d69df538-4684-4b32-8ded-d2f2af43af9f.zarr.bfoptions
UPDATE 396
BEGIN
mkngff_fileset
----------------
6321120
(1 row)
COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/10/13-37-45.953
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-06/10/13-37-45.953_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-06/10/13-37-45.953_mkngff/0cc5dbe3-444a-4ea2-a335-b51cf89c1c53.zarr -> /bia-integrator-data/S-BIAD861/0cc5dbe3-444a-4ea2-a335-b51cf89c1c53/0cc5dbe3-444a-4ea2-a335-b51cf89c1c53.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/10/13-37-45.953
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-06/10/13-37-45.953_mkngff/0cc5dbe3-444a-4ea2-a335-b51cf89c1c53.zarr.bfoptions
UPDATE 396
BEGIN
mkngff_fileset
----------------
6321121
(1 row)
COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/06/00-58-20.828
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-06/06/00-58-20.828_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-06/06/00-58-20.828_mkngff/1a29207c-d50b-48b7-a7c0-54c6252bfd9c.zarr -> /bia-integrator-data/S-BIAD861/1a29207c-d50b-48b7-a7c0-54c6252bfd9c/1a29207c-d50b-48b7-a7c0-54c6252bfd9c.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/06/00-58-20.828
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-06/06/00-58-20.828_mkngff/1a29207c-d50b-48b7-a7c0-54c6252bfd9c.zarr.bfoptions
Also did the same for idr0013: the last 3 plates of idr0013.csv were not converted (see above https://github.com/IDR/idr-metadata/issues/696#issuecomment-2161654156). These initially failed to run because the SESSION wasn't in the original sql scripts (so it didn't get correctly updated prior to running). Fixed in https://github.com/IDR/mkngff_upgrade_scripts/commit/be59fe98b57229f7f314162cd707be2551e1b483
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'rror' /tmp/ngff_cache_20240617/1/**/*"; done
omeroreadonly-1
/tmp/ngff_cache_20240617/1/Image:1964831/stdout:fail: Pixels:1964831 Image:1964831 1.9138693809509277 exception ::omero::ResourceError
/tmp/ngff_cache_20240617/1/Image:3103076/stdout:fail: Pixels:3103076 Image:3103076 16301.51702260971 exception ::omero::ResourceError
omeroreadonly-2
/tmp/ngff_cache_20240617/1/Image:3073892/stdout:fail: Pixels:3073892 Image:3073892 17484.479704618454 exception ::omero::ResourceError
omeroreadonly-3
/tmp/ngff_cache_20240617/1/Image:1962455/stdout:fail: Pixels:1962455 Image:1962455 2.024498224258423 exception ::omero::ResourceError
/tmp/ngff_cache_20240617/1/Image:2850468/stderr:!! 06/17/24 12:46:26.823 error: 4 communicators not destroyed during global destruction.
/tmp/ngff_cache_20240617/1/Image:2857814/stderr:!! 06/17/24 12:41:54.395 error: 2 communicators not destroyed during global destruction.
/tmp/ngff_cache_20240617/1/Image:3367911/stderr:!! 06/17/24 21:46:36.942 error: 2 communicators not destroyed during global destruction.
omeroreadonly-4
/tmp/ngff_cache_20240617/1/Image:1640587/stderr:!! 06/17/24 14:15:58.051 error: 6 communicators not destroyed during global destruction.
/tmp/ngff_cache_20240617/1/Image:2857533/stderr:!! 06/17/24 12:49:00.746 error: 6 communicators not destroyed during global destruction.
omeroreadwrite
/tmp/ngff_cache_20240617/1/Image:12546037/stderr:!! 06/17/24 12:17:22.525 error: 9 communicators not destroyed during global destruction.
/tmp/ngff_cache_20240617/1/Image:12550677/stderr:!! 06/17/24 12:17:53.251 error: 9 communicators not destroyed during global destruction.
...
/tmp/ngff_cache_20240617/1/Image:1556033/stdout:fail: Pixels:1556033 Image:1556033 12244.128482341766 exception ::omero::ResourceError
/tmp/ngff_cache_20240617/1/Image:1573071/stdout:fail: Pixels:1573071 Image:1573071 10876.942811727524 exception ::omero::ResourceError
Checked all these Images: all are viewable in the webclient except for the idr0015 and idr0013 Plates fixed above (viewed each to trigger memo file generation). omeroreadwrite has lots of "communicators not destroyed"; just checked the ResourceErrors.
Progress: current state...
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240617/1/**/* | wc"; done
omeroreadonly-1
89 356 8945
omeroreadonly-2
83 332 8328
omeroreadonly-3
92 368 9232
omeroreadonly-4
89 356 8919
omeroreadwrite
437 1748 43980
Checking progress in webclient. Need a better way to monitor completed memo filesets, e.g.:
(venv3) (base) [wmoore@pilot-idr0138-omeroreadwrite scripts]$ /opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'select MIN(field.image.id) FROM WellSample AS field where field.well.plate.id in (select link.child.id from ScreenPlateLink as link where link.parent.id=2851) GROUP BY field.well.plate' > idr0090_plates.txt
cut -d ',' -f2 idr0090_plates.txt | sed -e 's/^/Image:/' > idr0090_ids.txt
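The same query could be looped over several Screen IDs at once; a sketch (screen ID 2851 is the idr0090 example above; any other IDs would need to be looked up):
for sid in 2851; do /opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv "select MIN(field.image.id) FROM WellSample AS field where field.well.plate.id in (select link.child.id from ScreenPlateLink as link where link.parent.id=$sid) GROUP BY field.well.plate" > screen_${sid}_plates.txt; done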
On proxy server, make a list of all "ok" logs:
[wmoore@test122-proxy ~]$ rsync -rvP omeroreadwrite:/home/wmoore/idr0010_ids.txt ./
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240617/1/**/*" >> ok_ids.txt; done
[wmoore@test122-proxy ~]$ wc ok_ids.txt
842 3368 84631 ok_ids.txt
E.g. idr0010: check which Images are in the ok logs - 80 / 148 are ok:
$ for i in $(cat idr0010_ids.txt); do echo $i && grep $i ok_ids.txt >> idr0010_ok.txt; done
$ wc idr0010_ok.txt
80 320 8036 idr0010_ok.txt
$ wc idr0010_ids.txt
148 148 2072 idr0010_ids.txt
for idr in idr0011a idr0011b idr0011c idr0011d idr0012 idr0013a idr0013b idr0015 idr0016 idr0025 idr0033 idr0035 idr0036 idr0064; do
echo ${idr}_plates.txt && cut -d ',' -f2 ${idr}_plates.txt | sed -e 's/^/Image:/' > ${idr}_ids.txt
done
for idr in idr0011a idr0011b idr0011c idr0011d idr0012 idr0013a idr0013b idr0015 idr0016 idr0025 idr0033 idr0035 idr0036 idr0064; do
echo ${idr}_ids.txt && rsync -rvP omeroreadwrite:/home/wmoore/${idr}_ids.txt ./
done
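With those files in place, per-study completion could be reported in one loop; a sketch (assumes the ids files and ok_ids.txt above; grep -w avoids Image:123 matching inside Image:1234):
for idr in idr0010 idr0011a idr0011b idr0011c idr0011d idr0012 idr0013a idr0013b idr0015 idr0016 idr0025 idr0033 idr0035 idr0036 idr0064; do echo "$idr: $(grep -cwFf ${idr}_ids.txt ok_ids.txt) / $(wc -l < ${idr}_ids.txt) ok"; done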
Updating iviewer to 0.14.0:
for server in omeroreadwrite omeroreadonly-1 omeroreadonly-2 omeroreadonly-3 omeroreadonly-4; do ssh $server "sudo /opt/omero/web/venv3/bin/pip uninstall -y omero-iviewer && sudo /opt/omero/web/venv3/bin/pip install omero-iviewer==0.14.0 && sudo service omero-web restart"; done
Team testing on idr-testing with microservices today was "slower than expected", with a few odd image-rendering failures. I didn't stop the memo generation before testing, which probably would have helped; it's also tricky to test with incomplete memo file generation. Now I've stopped memo generation and will target the completion of individual studies...
Running individual Screens on different servers like this:
[wmoore@test122-omeroreadwrite ~]$ screen -dmS cache parallel --eta -a idr0004_ids.txt --results /tmp/ngff_cache_20240621_idr0004/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
omeroreadonly-1: idr0010 (148)
omeroreadonly-2: idr0011a (129)
omeroreadonly-3: idr0012 (68)
omeroreadonly-4: idr0015 (83)
omeroreadwrite: idr0004 (46)
After a couple of minutes we have:
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
29 116 3141
omeroreadonly-2
36 144 3941
omeroreadonly-3
21 84 2280
omeroreadonly-4
23 92 2493
omeroreadwrite
10 40 1053
But also lots of:
stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/e3906e3b-0282-482e-bd2c-c363ae0f1f79')
On omeroreadwrite, I cancelled the screen, deleted the log dir and re-ran, hoping that the existing session would avoid those errors - which seems to work. Got 19 initial "ok" in the first minute and no FileNotFoundError errors.
Error counts are now:
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'rror' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
9 81 1818
omeroreadonly-2
9 81 1827
omeroreadonly-3
11 97 2067
omeroreadonly-4
9 81 1818
omeroreadwrite
0 0 0
9 fewer errors for omeroreadwrite and 9 more oks!
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
29 116 3141
omeroreadonly-2
65 260 7119
omeroreadonly-3
22 88 2386
omeroreadonly-4
23 92 2493
omeroreadwrite
19 76 1998
Also cancelled, deleted logs and restarted the 4 readonly servers too... (12:35)
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
33 132 3576
omeroreadonly-2
123 492 13464
omeroreadonly-3
26 104 2819
omeroreadonly-4
23 92 2494
omeroreadwrite
42 168 4429
15:39...
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
53 212 5744. (out of 148)
omeroreadonly-2
127 508 13902. (out of 129 - idr0011a - DONE, see below)
omeroreadonly-3
26 104 2819. (out of 68 - idr0012)
omeroreadonly-4
30 120 3252. (out of 38)
omeroreadwrite
46 184 4849. (all DONE 46 - idr0004)
On omeroreadwrite, renamed the previous log dir so it doesn't show up in grep; started idr0025... DONE, then idr0033 (12 plates)...
For the 2 not-ok images from idr0011a: both had DatabaseBusy exceptions at 10:40.
[wmoore@test122-omeroreadonly-2 ~]$ cat /tmp/ngff_cache_20240621_idr0011a/1/Image:2852482/stderr
[wmoore@test122-omeroreadonly-2 ~]$ cat /tmp/ngff_cache_20240621_idr0011a/1/Image:2852626/stderr
Both viewed OK in webclient - idr0011A DONE. Start idr0011b (40 plates)...
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
64 256 6940 (out of 148 - idr0010)
omeroreadonly-2
19 76 2080. (out of 40 - idr0011b)
omeroreadonly-3
26 104 2819. (out of 68 - idr0012) - no progress - lots of DatabaseBusyExceptions - restarted 16:50...
omeroreadonly-4
30 120 3252 (out of 83 - idr0015) - no progress
omeroreadwrite
10 40 1083. (out of 12 - idr0033)
Overnight...
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
148 592 16041. (out of 148 - idr0010. - DONE) - start with idr0035
omeroreadonly-2
40 160 4379. (out of 40 - idr0011b - DONE) - start with idr0036
omeroreadonly-3
55 220 5958. (out of 68 - idr0012)
omeroreadonly-4
57 228 6184. (out of 83 - idr0015)
omeroreadwrite
12 48 1301 (out of 12 - idr0033 - DONE) - start with idr0090
After a few mins...
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
55 220 5968 (out of 55 - idr0035. - DONE)
omeroreadonly-2
17 68 1846. (out of 20 - idr0036)
omeroreadonly-3
55 220 5958. (out of 68 - idr0012)
omeroreadonly-4
57 228 6184. (out of 83 - idr0015)
omeroreadwrite
16 64 1779. (out of 20 - idr0090)
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240621_*/1/**/* | wc"; done
omeroreadonly-1
55 220 5968. (out of 55 - idr0035. - DONE)
omeroreadonly-2
20 80 2169. (out of 20 - idr0036 - DONE)
omeroreadonly-3
55 220 5958. (out of 68 - idr0012) - TODO: investigate!!
omeroreadonly-4
57 228 6184 (out of 83 - idr0015) - TODO: investigate!!
omeroreadwrite
20 80 2225. (out of 20 - idr0090 - DONE) - restart ngff_datasets...
screen -dmS cache parallel --eta -a ngff_dataset_ids.txt --results /tmp/ngff_cache_20240623_datasets/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
...
[wmoore@test122-omeroreadwrite ~]$ grep "ok:" /tmp/ngff_cache_20240623_datasets/1/*/** | wc
1580 6320 176656
Start ALL plates memo file generation...
[wmoore@test122-proxy ~]$ wc all_plate_ids.txt
6762 6762 95866 all_plate_ids.txt
Start all fresh... in batches of 1500 rows with:
screen -dmS cache parallel --eta -a all_plate_ids.txt --results /tmp/ngff_cache_20240623_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
# remove first 1500 rows
sed '1,1500 d' all_plate_ids.txt > all_plate_ids_1500.txt
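(split could produce all the batches in one go, e.g. split -l 1500 all_plate_ids.txt all_plate_ids_batch_, giving all_plate_ids_batch_aa, _ab, ... - a sketch; the batch filenames are illustrative.)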
Cancel all memo generation (terminate screens) in prep for testing later this morning...
NB: for omeroreadonly-3 and omeroreadonly-4, screens were still active for idr0012 and idr0015 above.
Checking idr0012 logs on omeroreadonly-3... Find the 13 images missing from the logs with:
[wmoore@test122-omeroreadonly-3 ~]$ for i in $(cat idr0012_ids.txt); do echo $i && grep $i /tmp/ngff_cache_20240621_idr0012/1/*/**; done
Manually checked each in webclient. All viewed OK except for Image:1819431.
Checking idr0015 logs on omeroreadonly-4...
for i in $(cat idr0015_ids.txt); do echo $i && grep $i /tmp/ngff_cache_20240621_idr0015/1/*/**; done
Took the Image IDs not found in the logs and put them in a file to run render test...
[wmoore@test122-omeroreadonly-4 ~]$ vi idr0015_retest.txt
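(That file could also be built non-interactively; a sketch, using the same glob as above and grep -w to avoid prefix matches:
for i in $(cat idr0015_ids.txt); do grep -qw "$i" /tmp/ngff_cache_20240621_idr0015/1/*/** || echo "$i"; done > idr0015_retest.txt
)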
[wmoore@test122-omeroreadonly-4 ~]$ for i in $(cat idr0015_retest.txt); do echo $i && /opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force $i; done
Image:1971645
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: Pixels:1971645 Image:1971645 2.5951030254364014
...
Stuck on Image:1957701... And the next image also lacks a memo file. (update: http://localhost:1080/webclient/?show=image-1957701 is good now)
Run the remaining via the usual route...
[wmoore@test122-omeroreadonly-4 ~]$ screen -dmS cache parallel --eta -a idr0015_retest.txt --results /tmp/ngff_cache_20240624_idr0015 -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
Cancelled after ~ an hour (8:09) in prep for testing...
Current status Summary: Memo files all complete for:
Testing delayed until 1:30 today, so let's restart the last few of idr0015 on omeroreadonly-4...
[wmoore@test122-omeroreadonly-4 ~]$ screen -dmS cache parallel --eta -a idr0015_retest.txt --results /tmp/ngff_cache_20240624_idr0015 -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
Also run idr0011C and idr0011D on omeroreadonly-2 (4 plates, 8 plates).
screen -dmS cache parallel --eta -a idr0011c_ids.txt --results /tmp/ngff_cache_20240624_idr0011c -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
All completed with "ok" (and checked in webclient).
Restarting with e.g.:
[wmoore@test122-omeroreadwrite ~]$ screen -dmS cache parallel --eta -a idr0013a_ids.txt --results /tmp/ngff_cache_20240624_idr0013a -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240624_*/1/**/* | wc"; done
omeroreadonly-1
28 112 3067. (idr0013B - DONE)
omeroreadonly-2
36 144 4022. (idr0016 - 413 plates)
omeroreadwrite
234 936 25623. (idr0013a - 510 plates)
idr-testing omeroreadonly-3.
Testing started at 13:30 (12:30 GMT on logs): Checking...
[wmoore@test122-omeroreadonly-4 ~]$ less /opt/omero/server/OMERO.server/var/log/Blitz-0.log.1
First ERROR after that (using grep)...
2024-06-24 12:32:12,119 ERROR [ ome.services.util.ServiceHandler] (.Server-32) Method interface ome.api.IQuery.findAllByQuery invocation took 28799
with no other errors. Then found a bunch of similar errors due to database query times, starting at:
2024-06-24 12:33:27,596 INFO [ ome.services.util.ServiceHandler] (.Server-22) Excp: org.springframework.dao.DataAccessResourceFailureException: Hibernate operation: could not execute query; SQL [select image0_.id as id60_, image...
2024-06-24 12:34:23,490 INFO [ ome.services.util.ServiceHandler] (.Server-67) Excp: org.springframework.dao.DataAccessResourceFailureException: Hibernate operation: could not execute query; SQL [select image0_.id as id60_, image0_.acquisitionDate as acquisit2_60_, image0_.archived as archived60_, image0_.description as descript4_60_, image0_.creation_id as creation10_60_, image0_.external_id as external11_60_, image0_.group_id as group12_60_, image0_.owner_id as owner13_60_, image0_.permissions as permissi5_60_, image0_.update_id as update14_60_, image0_.experiment as experiment60_, image0_.fileset as fileset60_, image0_.format as format60_, image0_.imagingEnvironment as imaging18_60_, image0_.instrument as instrument60_, image0_.name as name60_, image0_.objectiveSettings as objecti20_60_, image0_.partial as partial60_, image0_.series as series60_, image0_.stageLabel as stageLabel60_, image0_.version as version60_ from image image0_ left outer join wellsample wellsample1_ on image0_.id=wellsample1_.image and
(
1 = ? OR
1 = ? OR
(wellsample1_.group_id in (?)) OR
(wellsample1_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = wellsample1_.group_id)) OR
(wellsample1_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = wellsample1_.group_id)) OR
((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = wellsample1_.group_id))
) inner join well well2_ on wellsample1_.well=well2_.id inner join plate plate3_ on well2_.plate=plate3_.id left outer join screenplatelink screenlink4_ on plate3_.id=screenlink4_.child and
(
1 = ? OR
1 = ? OR
(screenlink4_.group_id in (?)) OR
(screenlink4_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = screenlink4_.group_id)) OR
(screenlink4_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = screenlink4_.group_id)) OR
((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = screenlink4_.group_id))
) inner join screen screen5_ on screenlink4_.parent=screen5_.id left outer join imageannotationlink annotation6_ on image0_.id=annotation6_.parent and
(
1 = ? OR
1 = ? OR
(annotation6_.group_id in (?)) OR
(annotation6_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = annotation6_.group_id)) OR
(annotation6_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = annotation6_.group_id)) OR
((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = annotation6_.group_id))
) inner join annotation annotation7_ on annotation6_.child=annotation7_.id where
(
1 = ? OR
1 = ? OR
(image0_.group_id in (?)) OR
(image0_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = image0_.group_id)) OR
(image0_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = image0_.group_id)) OR
((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = image0_.group_id))
) and screen5_.id=? and annotation7_.textValue=? order by well2_."column", well2_."row"]; ERROR: canceling statement due to statement timeout; nested exception is org.postgresql.util.PSQLException: ERROR: canceling statement due to statement timeout
2024-06-24 12:34:23,490 ERROR [ ome.services.util.ServiceHandler] (.Server-67) Method interface ome.api.IQuery.findAllByQuery invocation took 60052
2024-06-24 12:34:37,857 ERROR [ ome.services.util.ServiceHandler] (.Server-40) Method interface ome.api.IQuery.findAllByQuery invocation took 60046
2024-06-24 12:34:44,946 INFO [ org.perf4j.TimingLogger] (.Server-24) start[1719232424846] time[60100] tag[omero.call.exception]
2024-06-24 12:34:44,947 WARN [ ome.services.util.ServiceHandler] (.Server-24) Unknown exception thrown.
org.springframework.dao.DataAccessResourceFailureException: Hibernate operation: could not execute query; SQL [select image0_.id as id60_, ... same query as above ...]; ERROR: canceling statement due to statement timeout; nested exception is org.postgresql.util.PSQLException: ERROR: canceling statement due to statement timeout
at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:105)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
at org.springframework.orm.hibernate3.HibernateAccessor.convertJdbcAccessException(HibernateAccessor.java:428)
2024-06-24 12:35:47,050 INFO [ org.perf4j.TimingLogger] (.Server-56) start[1719232486582] time[60468] tag[omero.call.exception]
2024-06-24 12:35:47,051 WARN [ ome.services.util.ServiceHandler] (.Server-56) Unknown exception thrown.
2024-06-24 12:39:04,561 INFO [ org.perf4j.TimingLogger] (.Server-61) start[1719232684542] time[60019] tag[omero.call.exception]
2024-06-24 12:39:04,562 WARN [ ome.services.util.ServiceHandler] (.Server-61) Unknown exception thrown.
(both followed by the same DataAccessResourceFailureException, statement-timeout query and stack trace as above)
cc @khaledk2 @sbesson
Restart ALL memo file generation (includes idr0013A and idr0016, not yet completed)... with e.g.
screen -dmS cache parallel --eta -a all_plate_ids.txt --results /tmp/ngff_cache_20240624_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
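For reference, the moving parts of this one-liner (GNU parallel and screen flags as documented; the omero render test subcommand is used here purely to force reader initialisation, and hence memo-file creation):
# screen -dmS cache    : start a detached screen session named "cache"
# parallel --eta       : print an estimated completion time as jobs finish
# -a all_plate_ids.txt : read one "Image:<id>" argument per line
# --results <dir>      : write each job's stdout/stderr under <dir>/1/<arg>/
# -j10                 : run 10 renders concurrently
# test --force         : re-run the rendering test even if settings/thumbnails already exist
screen -dmS cache parallel --eta -a all_plate_ids.txt --results /tmp/ngff_cache_20240624_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'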
EDIT (2 days later): I realise I should have used e.g. all_plate_ids_1500.txt rather than all_plate_ids.txt on readonly 2, 3 & 4, since the original all_plate_ids.txt wasn't modified. So all servers ran the same ids!
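To avoid repeating that mistake, a minimal sketch of producing distinct per-node slices up front (hypothetical ids_nodeN.txt names; row windows matching the 6762-row all_plate_ids.txt):
# give each node its own non-overlapping window of the master id list
sed -n '1,1500 p'    all_plate_ids.txt > ids_node1.txt  # omeroreadonly-1
sed -n '1501,3000 p' all_plate_ids.txt > ids_node2.txt  # omeroreadonly-2
sed -n '3001,4500 p' all_plate_ids.txt > ids_node3.txt  # omeroreadonly-3
sed -n '4501,6762 p' all_plate_ids.txt > ids_node4.txt  # omeroreadonly-4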
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240624_*/1/**/* | wc"; done
omeroreadonly-1
454 1816 48862
omeroreadonly-2
464 1856 50313
omeroreadonly-3
426 1704 46100
omeroreadonly-4
Connection to idr-testing.openmicroscopy.org closed by remote host.
Later - still can't ssh to omeroreadonly-4
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240624_*/1/**/* | wc"; done
omeroreadonly-1
1247 4988 134637
omeroreadonly-2
1257 5028 136078
omeroreadonly-3
1219 4876 131853
...
[wmoore@test122-omeroreadwrite ~]$ grep 'ok:' /tmp/ngff_cache_20240624_*/1/**/* | wc
504 2016 55238
Briefly discussed with @khaledk2 @dominikl @jburel @pwalczysko: to reduce the number of possible collisions during the (heavy) memo file regeneration process, one option would be to stop all services that might concurrently access the database, i.e. omero-web, prometheus-omero-exporter and the micro-service. Once the cache files have been regenerated, these services can safely be restarted.
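A sketch of what that could look like from the proxy node, using the nodes file from above (service names are assumptions based on the units mentioned in this issue; omero-web may not run on every node):
# stop the extra DB clients before memo regeneration...
for n in $(cat nodes); do echo $n && ssh $n "sudo service omero-web stop; sudo service prometheus-omero-exporter stop"; done
# ...and start them again once the cache files are rebuilt
for n in $(cat nodes); do echo $n && ssh $n "sudo service prometheus-omero-exporter start; sudo service omero-web start"; done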
Cancelled all memo generation. After restarting omeroreadonly-4 (master.err was 70 GB of prometheus errors):
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240624_*/1/**/* | wc"; done
omeroreadonly-1
2624 10496 287101
omeroreadonly-2
2634 10536 288491
omeroreadonly-3
2596 10384 284295
omeroreadonly-4
1171 4684 126720
omeroreadwrite
504 2016 55238
[EDIT: all nodes ran the same ids.txt with 6762 plates - so, at ~2624/6762, less than half complete]
All of the readonly servers have the same ResourceErrors:
/tmp/ngff_cache_20240624_allplates/1/Image:1949093/stdout:fail: Pixels:1949093 Image:1949093 0.7480490207672119 exception ::omero::ResourceError
/tmp/ngff_cache_20240624_allplates/1/Image:38517/stdout:fail: Pixels:38517 Image:38517 6.995946884155273 exception ::omero::ResourceError
/tmp/ngff_cache_20240624_allplates/1/Image:40053/stdout:fail: Pixels:40053 Image:40053 8.229637384414673 exception ::omero::ResourceError
message = Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-09/14/20-06-35.749/005B30_S2R.HTD
2024-06-25 10:13:03,452 ERROR [ ome.io.bioformats.BfPixelBuffer] (.Server-11) Failed to instantiate BfPixelsWrapper with /data/OMERO/ManagedRepository/demo_2/2015-09/14/20-06-35.749/005B30_S2R.HTD
2024-06-25 10:13:03,453 ERROR [ ome.io.nio.PixelsService] (.Server-11) Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2015-09/14/20-06-35.749/005B30_S2R.HTD
java.lang.RuntimeException: java.io.FileNotFoundException: /data/OMERO/ManagedRepository/demo_2/2015-09/14/20-06-35.749/005B30_S2R_P24_s2_w4.TIF (No such file or directory)
at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:79)
at ome.io.bioformats.BfPixelBuffer.setSeries(BfPixelBuffer.java:124)
The file at /data/OMERO/ManagedRepository/demo_2/2015-09/14/20-06-35.749/005B30_S2R_P24_s2_w4.TIF is not found on idr-testing, or idr-next, or idr itself!
Image 38517 gives the ResourceError above on idr-testing and idr-next, but not on idr.openmicroscopy.org!?
Actually, for those 2 plates (5th and 7th of idr0008), there are NO ...w4.TIF images and the 4th channel (DAPI) is black (idr.openmicroscopy.org), whereas for all other idr0008 plates there are 768 images like ...w4.TIF and DAPI is viewable.
Images from these 2 plates are not viewable in idr-testing, possibly due to a change in Bio-Formats?
Seb: likely due to https://github.com/ome/bioformats/pull/3806
Following steps at https://github.com/IDR/idr-metadata/issues/691#issuecomment-2023174883 to fix idr0070... in a screen as user omero-server. Removed annotations as described.
psql -U omero -d idr -h $DBHOST -c "select fileset from image where format=360 and series=1 and (fileset < 3552156)" >> idr0070_svs_filesetIds.txt
Removed header and footer (4 rows in total), then...
$ for r in $(cat idr0070_svs_filesetIds.txt); do echo $r | tr -d '[:space:]' >> idr0070_svs_filesetIds.csv && echo "" >> idr0070_svs_filesetIds.csv; done
$ wc idr0070_svs_filesetIds.csv
400 400 3200 idr0070_svs_filesetIds.csv
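(The loop above just strips whitespace so there is exactly one fileset id per line; assuming the psql rows contain nothing but numeric ids, an equivalent one-liner would be:)
grep -oE '[0-9]+' idr0070_svs_filesetIds.txt > idr0070_svs_filesetIds.csv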
for r in $(cat idr0070_svs_filesetIds.csv); do omero delete --report Fileset:$r; done
Created idr0070-experimentA-filePaths_svs.tsv as described and import...
omero import --bulk experimentA/idr0070-experimentA-bulk.yml --file /tmp/idr0070svs.log --errs /tmp/idr0070svs.err
Done: logs checked as described
$ grep "==> Summary" /tmp/idr0070svs.err | wc
402 804 4824
Re-annotate as at https://github.com/IDR/idr-metadata/issues/691#issuecomment-2066854817
[wmoore@test122-omeroreadwrite idr0070-kerwin-hdbr]$ sudo -Es git checkout -f will/label_images_removal
Previous HEAD position was 452ae3b Fix typo
HEAD is now at 4b08e7f Fix annotations.csv for [label image] images
As omero-server
(venv3) bash-5.1$ /opt/omero/server/venv3/bin/python /uod/idr/metadata/idr-utils/scripts/annotate/check_annotations.py Project:1104 idr0070-experimentA-annotation.csv --output /tmp/errors.csv
All images are unique and have annotations.
$ omero delete --report Annotation:25135328
omero.cmd.Delete2 Annotation:25135328 ok
Steps: 6
Elapsed time: 0.515 secs.
Flags: []
Deleted objects
FileAnnotation:25135328
ProjectAnnotationLink:1667
OriginalFile:29676048
$ /opt/omero/server/OMERO.server/bin/omero metadata populate --report --batch 1000 --file idr0070-experimentA-annotation.csv Project:1104
$ /opt/omero/server/OMERO.server/bin/omero metadata populate --context bulkmap --batch 100 --cfg idr0070-experimentA-bulkmap-config.yml Project:1104
Needed to cache-bust with ssh -A idr-testing.openmicroscopy.org -L 9000:localhost:9000 and open all Datasets.
Looking good now - new images view OK and have annotations, e.g.
https://idr-testing.openmicroscopy.org/webclient/?show=dataset-10461
To check that rendering via s3 is working (not relying on the goofys mount), tested on omeroreadwrite by unmounting and checking that we can still view images... Need to stop the server to release the mount...
sudo service omero-server stop
sudo umount /bia-integrator-data
sudo service omero-server start
Able to view NGFF images (from idr0004) 👍 Re-mounted:
$ which goofys
/usr/bin/goofys
$ sudo goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other bia-integrator-data /bia-integrator-data
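A quick sanity check that the FUSE mount is back and serving (plain coreutils; nothing OMERO-specific):
$ findmnt /bia-integrator-data   # should report a fuse filesystem backed by goofys
$ ls /bia-integrator-data | head # listing should succeed over S3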
Checking available disk space on all servers ahead of today's testing... No red flags:
for n in $(cat nodes); do echo $n && ssh $n "df -h"; done
omeroreadonly-1
Filesystem Size Used Avail Use% Mounted on
devtmpfs 4.0M 0 4.0M 0% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 6.3G 772K 6.3G 1% /run
/dev/vda5 79G 9.3G 70G 12% /
/dev/vda2 936M 283M 654M 31% /boot
/dev/vda1 99M 7.1M 92M 8% /boot/efi
10.35.199.251:/fg_bioimage 572T 410T 162T 72% /nfs/bioimage
bia-idr-integration 1.0P 0 1.0P 0% /bia-idr
bia-integrator-data 1.0P 0 1.0P 0% /bia-integrator-data
192.168.2.43:/data/OMERO 1.5T 1.2T 298G 81% /data/OMERO-readonly
192.168.2.43:/data/BioFormatsCache 1.5T 1.2T 298G 81% /data/BioFormatsCache
192.168.2.43:/data/idr-metadata 1.5T 1.2T 298G 81% /data/idr-metadata
tmpfs 3.2G 4.0K 3.2G 1% /run/user/546
tmpfs 3.2G 4.0K 3.2G 1% /run/user/5098
...
Followed instructions at https://github.com/IDR/idr-metadata/issues/684#issuecomment-2076629628. Tested in webclient (triggering memo file generation) and could view the last Images from e.g. http://localhost:1080/webclient/?show=plate-2703
Following successful testing of completed NGFF studies this morning, let's resume memo file generation for ALL data... Still outstanding:
omeroreadonly-1 - all_dataset_ids.txt
omeroreadonly-2 - all_plate_ids_1500.txt (first 1500 rows removed)
omeroreadonly-3 - all_plate_ids_3000.txt (first 3000 rows removed)
omeroreadonly-4 - all_plate_ids_4500.txt (first 4500 rows removed)
omeroreadwrite - all_dataset_ids_7000.txt (first 7000 rows removed)
Running (stopping Prometheus first) e.g.
sudo service prometheus-omero-exporter stop
screen -dmS cache parallel --eta -a all_plate_ids_1500.txt --results /tmp/ngff_cache_20240626_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
Datasets... Split all_dataset_ids.txt (14909 rows) into 2...
$ wc all_dataset_ids.txt
14909 14909 199954 all_dataset_ids.txt
# remove first 7000 rows
$ sed '1,7000 d' all_dataset_ids.txt > all_dataset_ids_7000.txt
$ wc all_dataset_ids_7000.txt
7909 7909 109641 all_dataset_ids_7000.txt
screen -dmS cache parallel --eta -a all_dataset_ids_7000.txt --results /tmp/ngff_cache_20240626_alldatasets/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc"; done
omeroreadonly-1
0 0 0
omeroreadonly-2
35 140 3854
omeroreadonly-3
561 2244 62019
omeroreadonly-4
503 2012 57033
omeroreadwrite
630 2520 70872
After about 1 hour...
$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc"; done
omeroreadonly-1
1042 4168 114296
omeroreadonly-2
59 236 6501. (out of 1500+ plates)
omeroreadonly-3
596 2384 65882. (out of 1500+ plates)
omeroreadonly-4
1462 5848 165797. (out of 1500+ plates)
omeroreadwrite
47650 190600 5352329
Check directories (useful for counting Dataset:ID dirs):
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "ls /tmp/ngff_cache_20240626_*/1/ | wc"; done
omeroreadonly-1
268 268 3167
omeroreadonly-2
111 111 1554
omeroreadonly-3
653 653 9142
omeroreadonly-4
1483 1483 22245
omeroreadwrite
468 468 6085
$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ResourceError' /tmp/ngff_cache_20240626_*/1/**/* | wc"; done
omeroreadonly-1
0 0 0
omeroreadonly-2
0 0 0
omeroreadonly-3
0 0 0
omeroreadonly-4
0 0 0
omeroreadwrite
1436 8616 210222
Affecting Datasets from e.g. 9101 -> 14734 - all seem to be idr0043. E.g. viewing http://localhost:1080/webclient/img_detail/13004862/ gives:
Current status: ok:
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
2141
omeroreadonly-2
192
omeroreadonly-3
754
omeroreadonly-4
1513
omeroreadwrite
397227
ResourceError
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ResourceError' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
0
omeroreadonly-2
0
omeroreadonly-3
0
omeroreadonly-4
0
omeroreadwrite
8091
3 hours later:
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
2735 (Images out of 7k Datasets)
omeroreadonly-2
1107. (out of 1500+ plates)
omeroreadonly-3
901. (out of 1500+ plates)
omeroreadonly-4
1516. (out of 1500+ plates)
omeroreadwrite
628416. (Images out of 8k Datasets) NB: idr0043 is 12k Datasets
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ResourceError' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
0
omeroreadonly-2
1 (known - idr0015 plate)
omeroreadonly-3
0
omeroreadonly-4
0
omeroreadwrite
12436
6:25 am
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
26409
omeroreadonly-2
1311. (out of 1500+ plates)
omeroreadonly-3
1359. (out of 1500+ plates)
omeroreadonly-4
1600. (out of 1500+ plates)
omeroreadwrite
1296666
10:32 am
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
28774
omeroreadonly-2
1400
omeroreadonly-3
1359
omeroreadonly-4
1629 (out of 2262 rows in all_plate_ids_4500.txt)
omeroreadwrite
1658777
No progress on omeroreadonly-3 in the last few hours. Checking logs, the last 10 directories are empty...
[wmoore@test122-omeroreadonly-3 ~]$ ls -alh /tmp/ngff_cache_20240626_allplates/1/
...
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:36 Image:9020764
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:36 Image:9529001
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:36 Image:9538241
...while these are empty:
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:36 Image:9544982
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:37 Image:9545590
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:42 Image:9565961
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:42 Image:9568592
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:42 Image:9621401
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:42 Image:9632921
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:42 Image:9634488
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:51 Image:9636808
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 04:02 Image:9667170
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 04:02 Image:9669446
The last Image to be processed, Image:9538241, took half an hour (1542 secs):
[wmoore@test122-omeroreadonly-3 ~]$ cat /tmp/ngff_cache_20240626_allplates/1/Image\:9538241/stdout
ok: Pixels:9538241 Image:9538241 1542.7414109706879
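As an aside, since each stdout line ends with the elapsed seconds ("ok: Pixels:<id> Image:<id> <secs>"), the slowest renders can be surfaced with a numeric sort (needs bash globstar for the ** pattern):
shopt -s globstar
grep -h 'ok:' /tmp/ngff_cache_20240626_allplates/1/**/stdout | sort -k4 -g -r | head  # longest renders first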
Checking logs for one of those 10 images: https://idr.openmicroscopy.org/webclient/?show=image-9544982 finds only:
[wmoore@test122-omeroreadonly-3 ~]$ grep "2020-01/24/11-38-52.433/007015009.flex" /opt/omero/server/OMERO.server/var/log/Blitz-0.log
2024-06-27 03:55:19,390 WARN [ loci.formats.FormatHandler] (.Server-27) parsing /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-13/2020-01/24/11-38-52.433/007015009.flex
Blitz log is still active (server not dead)!
Checking logs against input ids... e.g.
[wmoore@test122-omeroreadonly-2 ~]$ for i in $(cat all_plate_ids_1500.txt); do echo $i && grep $i /tmp/ngff_cache_20240626_allplates/1/**/*; done;
Image:1376659
/tmp/ngff_cache_20240626_allplates/1/Image:1376659/stdout:ok: Pixels:1376659 Image:1376659 92.37646412849426
Image:1377043
/tmp/ngff_cache_20240626_allplates/1/Image:1377043/stdout:ok: Pixels:1377043 Image:1377043 68.95386505126953
Image:1377427
/tmp/ngff_cache_20240626_allplates/1/Image:1377427/stdout:ok: Pixels:1377427 Image:1377427 68.418936252594
Image:1377811
/tmp/ngff_cache_20240626_allplates/1/Image:1377811/stdout:ok: Pixels:1377811 Image:1377811 81.83312964439392
...
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
29777
omeroreadonly-2
1609 Done
omeroreadonly-3
1359 Still paused!
omeroreadonly-4
1636. (out of 2262 rows in all_plate_ids_4500.txt)
omeroreadwrite
1751341
Directory counts:
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "ls /tmp/ngff_cache_20240626_*/1/ | wc -l"; done
omeroreadonly-1
529. (out of 7k Datasets!)
omeroreadonly-2
2165
omeroreadonly-3
1369
omeroreadonly-4
1658
omeroreadwrite
4522. (out of 7.9k Datasets!)
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_20240626_*/1/**/* | wc -l"; done
omeroreadonly-1
30788
omeroreadonly-2
2857
omeroreadonly-3
1359. (all_plate_ids_3000.txt)
omeroreadonly-4
1667. (out of 2262 rows in all_plate_ids_4500.txt)
omeroreadwrite
1912051
omeroreadonly-4 has done 1667/2262 rows (end of all_plate_ids_4500.txt), so let's use omeroreadonly-2 to process the last 262 rows.
Remove the first 6500 rows from all_plate_ids.txt:
$ sed '1,6500 d' all_plate_ids.txt > all_plate_ids_6500.txt
$ wc all_plate_ids_6500.txt
262 262 3930 all_plate_ids_6500.txt
[wmoore@test122-omeroreadonly-2 ~]$ screen -dmS cache parallel --eta -a all_plate_ids_6500.txt --results /tmp/ngff_cache_20240627_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
omeroreadonly-3 has been stuck at the same point for many hours now. No errors. Going to stop, restart the server, and re-run, removing the first 1300 rows from all_plate_ids_3000.txt:
[wmoore@test122-omeroreadonly-3 ~]$ sudo service omero-server restart
[wmoore@test122-omeroreadonly-3 ~]$ sed '1,1300 d' all_plate_ids_3000.txt > all_plate_ids_4300.txt
[wmoore@test122-omeroreadonly-3 ~]$ screen -dmS cache parallel --eta -a all_plate_ids_4300.txt --results /tmp/ngff_cache_20240627_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
Looks like omeroreadonly-3 got stuck at the same point as before... (after 1359 ids previously; now, starting with 1300 ids removed, we got stuck at 59)!!:
[wmoore@test122-omeroreadonly-3 ~]$ grep 'ok:' /tmp/ngff_cache_20240627_allplates/1/**/* | wc -l
59
Took the next 10 Image IDs from above:
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:36 Image:9544982
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:37 Image:9545590
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:42 Image:9565961
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:42 Image:9568592
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:42 Image:9621401
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:42 Image:9632921
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:42 Image:9634488
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 03:51 Image:9636808
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 04:02 Image:9667170
drwxr-xr-x. 2 wmoore wmoore 45 Jun 27 04:02 Image:9669446
And viewed them all in the webclient (connected to omeroreadwrite) - each took about a minute to render (memo generation), e.g. http://localhost:1080/webclient/?show=image-9669446
Now let's run the same again on omeroreadonly-3 and see if we get further...
$ mv /tmp/ngff_cache_20240627_allplates/ /tmp/ngff_cache_stuck_20240627_allplates/
[wmoore@test122-omeroreadonly-3 ~]$ screen -dmS cache parallel --eta -a all_plate_ids_4300.txt --results /tmp/ngff_cache_20240627_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
[wmoore@test122-omeroreadonly-3 ~]$ grep 'ok:' /tmp/ngff_cache_20240627_allplates/1/**/* | wc -l
59
[wmoore@test122-omeroreadonly-3 ~]$ grep 'ok:' /tmp/ngff_cache_20240627_allplates/1/**/* | wc -l
79
Seems stuck again... but not quite:
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_2024062*/1/**/* | wc -l"; done
omeroreadonly-1
36688
omeroreadonly-2
18. (out of 262 ids in all_plate_ids_6500.txt)
omeroreadonly-3
82. (out of 200 ids in all_plate_ids_4300.txt)
omeroreadonly-4
1704. (out of 2000 ids: 4500->6500)
omeroreadwrite
2104834
Nearly 2 hours later...
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_2024062*/1/**/* | wc -l"; done
omeroreadonly-1
40003 - now stopped - switch to last 100 plates of all_plate_ids_6500.txt
omeroreadonly-2
20
omeroreadonly-3
88
omeroreadonly-4
1710
omeroreadwrite
2193689
Only a few more plates processed. We need ~100-300 more processed on each of omeroreadonly-2, omeroreadonly-3 and omeroreadonly-4. Not going to be done in time for tomorrow morning (12 hours).
Need to stop processing of Datasets on omeroreadonly-1 and omeroreadwrite, since these are less critical (Images are still viewable etc. without memo files), and switch to Plates on these...
Last 100 rows
[wmoore@test122-omeroreadonly-1 ~]$ wc all_plate_ids.txt
6762 6762 95866 all_plate_ids.txt
[wmoore@test122-omeroreadonly-1 ~]$ sed '1,6662 d' all_plate_ids.txt > all_plate_ids_6762.txt
[wmoore@test122-omeroreadonly-1 ~]$ wc -l all_plate_ids_6762.txt
100 all_plate_ids_6762.txt
[wmoore@test122-omeroreadonly-1 ~]$ screen -dmS cache parallel --eta -a all_plate_ids_6762.txt --results /tmp/ngff_cache_20240627_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
Rows 6350 onwards (for omeroreadwrite)...
$ wc all_plate_ids.txt
6762 6762 95866 all_plate_ids.txt
$ sed '1,6350 d' all_plate_ids.txt > all_plate_ids_6350.txt
[wmoore@test122-omeroreadwrite ~]$ screen -dmS cache parallel --eta -a all_plate_ids_6350.txt --results /tmp/ngff_cache_20240627_allplates/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_2024062*/1/**/* | wc -l"; done
omeroreadonly-1
78. (out of 100 all_plate_ids_6762.txt)
omeroreadonly-2
194. (out of 262 ids in all_plate_ids_6500.txt)
omeroreadonly-3
96. (all_plate_ids_4300.txt)
omeroreadonly-4
1716 (all_plate_ids_4500.txt)
omeroreadwrite
412. (Done! all_plate_ids_6350.txt) - faster than others! - restart... all_plate_ids_4000.txt
Restart
[wmoore@test122-omeroreadwrite ~]$ screen -dmS cache parallel --eta -a all_plate_ids_4000.txt --results /tmp/ngff_cache_20240627_plates_4000/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
Overnight: looking good... All screen processes were complete except for omeroreadonly-3. Cancelled it ahead of testing...
[wmoore@test122-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'ok:' /tmp/ngff_cache_2024062*/1/**/* | wc -l"; done
omeroreadonly-1
98
omeroreadonly-2
262
omeroreadonly-3
2455
omeroreadonly-4
2261
omeroreadwrite
3174
Yesterday, started deleting the contents of /data/OMERO/BioFormatsCache/data_to_delete created above (old memo files). Ran in a screen, deleting parts of data_to_delete/OMERO/ManagedRepository/demo_2 at a time, e.g.
[wmoore@test122-omeroreadwrite demo_2]$ sudo rm -rf Blitz-0-Ice.ThreadPool.Server-2*
Testing today across ALL data in idr-testing...
Eventually we ran into slow-down, and finally Bad Gateway in web, with errors in the omeroreadonly-4 Blitz log:
2024-06-28 10:56:04,769 WARN [ o.services.db.SelfCorrectingDataSource] (.Server-47) Registering error with list: Current size: 82
2024-06-28 10:56:04,769 ERROR [ o.services.db.SelfCorrectingDataSource] (.Server-47) Failed to acquire connection after retries=3
java.sql.SQLException: unable to get a connection from pool of a PoolingDataSource containing an XAPool of resource c094dc5d-80a5-41c5-9a07-8f9ec64a091a with 25 connection(s) (0 still available)
at bitronix.tm.resource.jdbc.PoolingDataSource.getConnection(PoolingDataSource.java:332)
at org.springframework.jdbc.datasource.DelegatingDataSource.getConnection(DelegatingDataSource.java:86)
at ome.services.db.SelfCorrectingDataSource.call(SelfCorrectingDataSource.java:124)
org.springframework.jdbc.datasource.DataSourceTransactionManager.doBegin(DataSourceTransactionManager.java:246)
Initially, some images took a long time to load, possibly due to lack of memo files (memo generation didn't complete for Dataset Images): e.g. https://idr-testing.openmicroscopy.org/webclient/img_detail/12557109/?dataset=13701 and also idr0009, which should have memo files...
Seeing lots of these blocks repeating just now (omeroreadonly-4):
2024-06-28 11:14:48,626 INFO [ loci.formats.ImageReader] (.Server-41) TiffDelegateReader initializing /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-8/2021-02/23/17-10-04.263/TP_T309_Chgreen_Ill0_Ang0,1,2.tif
2024-06-28 11:14:48,626 INFO [ loci.formats.ImageReader] (.Server-40) TiffDelegateReader initializing /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-8/2021-02/23/17-10-04.263/TP_T309_Chgreen_Ill0_Ang0,1,2.tif
2024-06-28 11:14:48,627 INFO [ loci.formats.in.MinimalTiffReader] (.Server-41) Reading IFDs
2024-06-28 11:14:48,627 INFO [ loci.formats.in.MinimalTiffReader] (.Server-40) Reading IFDs
2024-06-28 11:14:48,628 INFO [ loci.formats.in.MinimalTiffReader] (.Server-41) Populating metadata
2024-06-28 11:14:48,629 INFO [ loci.formats.in.MinimalTiffReader] (.Server-40) Populating metadata
2024-06-28 11:14:48,629 INFO [ loci.formats.in.TiffReader] (.Server-41) Checking comment style
2024-06-28 11:14:48,630 INFO [ loci.formats.in.TiffReader] (.Server-40) Checking comment style
2024-06-28 11:14:48,634 INFO [ loci.formats.in.BaseTiffReader] (.Server-41) Populating OME metadata
2024-06-28 11:14:48,637 INFO [ loci.formats.in.BaseTiffReader] (.Server-40) Populating OME metadata
Looking at memo files created during the testing window (after 9:30 BST, i.e. after 8:30 GMT) found 56 entries on omeroreadonly-4:
[wmoore@test122-omeroreadonly-4 ~]$ grep "saved memo" /opt/omero/server/OMERO.server/var/log/Blitz-0.log
...
2024-06-28 08:56:17,251 DEBUG [ loci.formats.Memoizer] (l.Server-8) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-09/23/13-06-15.559/.experiment_descriptor.xml.bfmemo (452167 bytes)
2024-06-28 08:56:39,546 DEBUG [ loci.formats.Memoizer] (l.Server-5) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-09/23/12-26-19.536/.experiment_descriptor.xml.bfmemo (531206 bytes)
2024-06-28 08:57:10,473 DEBUG [ loci.formats.Memoizer] (.Server-12) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-12/08/09-05-47.246/.experiment_descriptor.xml.bfmemo (451823 bytes)
2024-06-28 08:57:10,473 DEBUG [ loci.formats.Memoizer] (l.Server-4) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-12/08/09-05-47.246/.experiment_descriptor.xml.bfmemo (451823 bytes)
2024-06-28 08:58:28,668 DEBUG [ loci.formats.Memoizer] (.Server-20) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-09/23/11-38-36.949/.experiment_descriptor.xml.bfmemo (452167 bytes)
2024-06-28 08:59:09,932 DEBUG [ loci.formats.Memoizer] (.Server-12) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-09/23/11-14-58.128/.experiment_descriptor.xml.bfmemo (452167 bytes)
2024-06-28 08:59:23,801 DEBUG [ loci.formats.Memoizer] (.Server-11) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-11/26/07-35-04.628/.experiment_descriptor.xml.bfmemo (451613 bytes)
2024-06-28 08:59:23,981 DEBUG [ loci.formats.Memoizer] (l.Server-3) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2015-11/26/07-35-04.628/.experiment_descriptor.xml.bfmemo (451613 bytes)
2024-06-28 09:05:05,583 DEBUG [ loci.formats.Memoizer] (.Server-10) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/20-45-58.507/.img_000000001_07-YFPtriple_000.tif.bfmemo (1288707 bytes)
2024-06-28 09:05:57,129 DEBUG [ loci.formats.Memoizer] (.Server-13) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-26-47.150/.img_000000001_07-YFPtriple_000.tif.bfmemo (1286993 bytes)
2024-06-28 09:06:22,096 DEBUG [ loci.formats.Memoizer] (.Server-19) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-47-47.606/.img_000000001_07-YFPtriple_000.tif.bfmemo (1287203 bytes)
2024-06-28 09:06:34,361 DEBUG [ loci.formats.Memoizer] (l.Server-4) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-24-27.224/.img_000000001_07-YFPtriple_000.tif.bfmemo (1288135 bytes)
2024-06-28 09:06:59,275 DEBUG [ loci.formats.Memoizer] (.Server-11) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-25-15.670/.img_000000001_07-YFPtriple_000.tif.bfmemo (1286930 bytes)
2024-06-28 09:06:59,843 DEBUG [ loci.formats.Memoizer] (.Server-10) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-32-48.388/.img_000000001_07-YFPtriple_000.tif.bfmemo (1287339 bytes)
2024-06-28 09:07:02,187 DEBUG [ loci.formats.Memoizer] (l.Server-8) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-32-03.591/.img_000000001_07-YFPtriple_000.tif.bfmemo (1287303 bytes)
2024-06-28 09:07:08,730 DEBUG [ loci.formats.Memoizer] (l.Server-6) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-33-33.721/.img_000000001_07-YFPtriple_000.tif.bfmemo (1287270 bytes)
2024-06-28 09:08:12,530 DEBUG [ loci.formats.Memoizer] (.Server-12) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/05/10-24-27.224/.img_000000001_07-YFPtriple_000.tif.bfmemo (1288135 bytes)
2024-06-28 09:09:06,474 DEBUG [ loci.formats.Memoizer] (.Server-10) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-04/04/17-51-26.642/.TR1_2_W0001_P0009_T0015.tif.bfmemo (53568 bytes)
2024-06-28 09:12:53,373 DEBUG [ loci.formats.Memoizer] (.Server-11) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-19/2018-11/26/09-57-43.807/Mmu_E1_CAGTAG1.TM000036_timeFused_blending/.SPM00_TM000036_CM00_CM01_CHN01.fusedStack_yzProjection.corrected.shifted.klb.bfmemo (945972 bytes)
2024-06-28 09:13:32,400 DEBUG [ loci.formats.Memoizer] (.Server-16) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2018-11/26/10-02-40.771/Mmu_E1_CAGTAG1.TM000000_timeFused_blending/.SPM00_TM000000_CM00_CM01_CHN00.fusedStack_xyProjection.corrected.shifted.klb.bfmemo (281608 bytes)
2024-06-28 09:13:51,546 DEBUG [ loci.formats.Memoizer] (l.Server-8) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2018-11/26/10-05-14.978/Mmu_E1_GFPLifeactxmKate2nls.TM000000_timeFused_blending/.SPM00_TM000000_CM00_CM01_CHN01.fusedStack.corrected.shifted.klb.bfmemo (278165 bytes)
2024-06-28 09:14:55,666 DEBUG [ loci.formats.Memoizer] (l.Server-9) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-12-34.562/.Embryo6.pattern.bfmemo (2652729 bytes)
2024-06-28 09:15:32,742 DEBUG [ loci.formats.Memoizer] (l.Server-0) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-58-04.234/.NocMo.2_Cell.pattern.bfmemo (1248192 bytes)
2024-06-28 09:16:04,436 DEBUG [ loci.formats.Memoizer] (l.Server-6) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-35-22.064/.MoNoc.Cell10.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,590 DEBUG [ loci.formats.Memoizer] (.Server-13) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-16.341/.MoNoc.Cell01.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,612 DEBUG [ loci.formats.Memoizer] (.Server-15) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-35-55.369/.NocMo.Cell04.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,752 DEBUG [ loci.formats.Memoizer] (.Server-11) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-14.233/.MoNoc.Cell03.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,752 DEBUG [ loci.formats.Memoizer] (l.Server-4) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-07.946/.MoNoc.Cell04.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,752 DEBUG [ loci.formats.Memoizer] (l.Server-1) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-10.198/.MoNoc.Cell05.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,990 DEBUG [ loci.formats.Memoizer] (.Server-22) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-14.012/.NocMo.Cell01.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,992 DEBUG [ loci.formats.Memoizer] (l.Server-9) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-12.169/.MoNoc.Cell02.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:04,993 DEBUG [ loci.formats.Memoizer] (.Server-17) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-24.573/.MoNoc.Cell16.pattern.bfmemo (140049 bytes)
2024-06-28 09:16:05,013 DEBUG [ loci.formats.Memoizer] (.Server-21) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-35-22.064/.MoNoc.Cell10.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:05,215 DEBUG [ loci.formats.Memoizer] (.Server-25) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-12.169/.MoNoc.Cell02.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:05,814 DEBUG [ loci.formats.Memoizer] (l.Server-7) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-20.449/.NocMo.Cell02.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:05,871 DEBUG [ loci.formats.Memoizer] (.Server-10) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-05.845/.MoNoc.Cell07.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:05,871 DEBUG [ loci.formats.Memoizer] (.Server-12) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-32.902/.MoNoc.Cell12.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:05,969 DEBUG [ loci.formats.Memoizer] (l.Server-2) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-03.806/.MoNoc.Cell06.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:05,989 DEBUG [ loci.formats.Memoizer] (l.Server-0) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-37-01.809/.MoNoc.Cell09.pattern.bfmemo (140058 bytes)
2024-06-28 09:16:06,067 DEBUG [ loci.formats.Memoizer] (l.Server-8) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-28.736/.MoNoc.Cell14.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:06,068 DEBUG [ loci.formats.Memoizer] (.Server-18) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-22.555/.MoNoc.Cell15.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:06,069 DEBUG [ loci.formats.Memoizer] (.Server-16) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-26.527/.MoNoc.Cell13.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:06,070 DEBUG [ loci.formats.Memoizer] (l.Server-5) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-30.782/.MoNoc.Cell11.pattern.bfmemo (140059 bytes)
2024-06-28 09:16:06,111 DEBUG [ loci.formats.Memoizer] (l.Server-3) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-57.750/.MoNoc.Cell08.pattern.bfmemo (140058 bytes)
2024-06-28 09:16:06,127 DEBUG [ loci.formats.Memoizer] (.Server-19) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-08/23/10-36-18.392/.NocMo.Cell03.pattern.bfmemo (140059 bytes)
2024-06-28 09:20:28,739 DEBUG [ loci.formats.Memoizer] (.Server-18) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2023-08/11/14-40-46.960/.230316_2108_ctrl_2h_smc4_dep_rep2_hemi_stlc_60min_zoom5-01-46.czi #2.tif_registered.tif.bfmemo (148816 bytes)
2024-06-28 09:20:33,959 DEBUG [ loci.formats.Memoizer] (.Server-14) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-14/2023-08/11/14-21-06.678/.230320_2108_compensation_wapl_smc4_sororin_dep_stlc_rep1_hemi-02-25.tif_registered.tif.bfmemo (146968 bytes)
2024-06-28 11:09:22,655 DEBUG [ loci.formats.Memoizer] (.Server-27) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:11:26,507 DEBUG [ loci.formats.Memoizer] (.Server-23) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:11:37,988 DEBUG [ loci.formats.Memoizer] (.Server-28) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:12:04,759 DEBUG [ loci.formats.Memoizer] (.Server-30) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:13:09,255 DEBUG [ loci.formats.Memoizer] (l.Server-6) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:14:13,382 DEBUG [ loci.formats.Memoizer] (.Server-18) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:15:09,895 DEBUG [ loci.formats.Memoizer] (.Server-32) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:15:28,375 DEBUG [ loci.formats.Memoizer] (.Server-12) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
img_TL.companion.ome.bfmemo corresponds to Image:12557109 above (idr0099), which took a long time to load. Checking with
$ grep -B 1 "saved memo" /opt/omero/server/OMERO.server/var/log/Blitz-0.log
shows typical times like:
2024-06-28 11:15:28,288 DEBUG [ loci.formats.Memoizer] (.Server-12) start[1719571343967] time[1984321] tag[loci.formats.Memoizer.saveMemo]
1984321 ms is half an hour (33 mins).
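To scan a whole Blitz log for saveMemo durations, the time[...] values can be pulled out and converted to minutes (a sketch against the log format above):
grep 'loci.formats.Memoizer.saveMemo' /opt/omero/server/OMERO.server/var/log/Blitz-0.log \
  | grep -oE 'time\[[0-9]+\]' | grep -oE '[0-9]+' \
  | awk '{printf "%.1f min\n", $1/60000}' | sort -g -r | head  # longest saves first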
Similar story on omeroreadonly-3, with a bunch of long-running memo generations:
[wmoore@test122-omeroreadonly-3 ~]$ grep "loci.formats.Memoizer.saveMemo" /opt/omero/server/OMERO.server/var/log/Blitz-0.log
2024-06-28 10:38:56,130 DEBUG [ loci.formats.Memoizer] (.Server-12) start[1719571132162] time[3968] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 10:44:09,474 DEBUG [ loci.formats.Memoizer] (l.Server-0) start[1719571390887] time[58586] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 10:48:11,801 DEBUG [ loci.formats.Memoizer] (.Server-13) start[1719571681990] time[9811] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:01:09,652 DEBUG [ loci.formats.Memoizer] (.Server-34) start[1719572233552] time[236100] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:06:15,122 DEBUG [ loci.formats.Memoizer] (l.Server-3) start[1719570923843] time[1851278] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:09:42,979 DEBUG [ loci.formats.Memoizer] (l.Server-5) start[1719571104120] time[1878859] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:16:58,931 DEBUG [ loci.formats.Memoizer] (l.Server-4) start[1719572173760] time[1245171] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:18:55,829 DEBUG [ loci.formats.Memoizer] (l.Server-6) start[1719572468542] time[1067286] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:24:15,515 DEBUG [ loci.formats.Memoizer] (.Server-11) start[1719573343504] time[512010] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:24:16,433 DEBUG [ loci.formats.Memoizer] (.Server-10) start[1719573343504] time[512929] tag[loci.formats.Memoizer.saveMemo]
e.g. 1067286 ms is ~18 minutes (for the same image as above):
2024-06-28 11:18:55,829 DEBUG [ loci.formats.Memoizer] (l.Server-6) start[1719572468542] time[1067286] tag[loci.formats.Memoizer.saveMemo]
2024-06-28 11:18:56,761 DEBUG [ loci.formats.Memoizer] (l.Server-6) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
and a few other images too:
[wmoore@test122-omeroreadonly-3 ~]$ grep "saved memo" /opt/omero/server/OMERO.server/var/log/Blitz-0.log
2024-06-28 10:38:56,460 DEBUG [ loci.formats.Memoizer] (.Server-12) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-16/2024-02/28/14-27-02.476/.position_18.tif.bfmemo (38861 bytes)
2024-06-28 10:44:12,942 DEBUG [ loci.formats.Memoizer] (l.Server-0) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-94/2019-09/17/08-31-42.409/.ExpB_whole_cns_AF.ome.tif.bfmemo (411891 bytes)
2024-06-28 10:48:12,591 DEBUG [ loci.formats.Memoizer] (.Server-13) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-102/2019-09/17/08-38-35.243/.ExpC_TPH2_left_cerebellum_MIP_slice_600_to_800.tif.bfmemo (34991 bytes)
2024-06-28 11:01:13,148 DEBUG [ loci.formats.Memoizer] (.Server-34) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-109/2019-09/17/08-39-19.075/.ExpD_chicken_embryo_stitched.ome.tif.bfmemo (1363919 bytes)
2024-06-28 11:06:16,760 DEBUG [ loci.formats.Memoizer] (l.Server-3) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:09:44,114 DEBUG [ loci.formats.Memoizer] (l.Server-5) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:17:00,086 DEBUG [ loci.formats.Memoizer] (l.Server-4) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:18:56,761 DEBUG [ loci.formats.Memoizer] (l.Server-6) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/23/16-59-03.757/.img_TL.companion.ome.bfmemo (13192454 bytes)
2024-06-28 11:24:15,659 DEBUG [ loci.formats.Memoizer] (.Server-11) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-8/2021-02/23/17-10-04.263/.TP_Chgreen_Ill0_Ang0,1,2.pattern.bfmemo (11983275 bytes)
2024-06-28 11:24:16,612 DEBUG [ loci.formats.Memoizer] (.Server-10) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-8/2021-02/23/17-10-04.263/.TP_Chgreen_Ill0_Ang0,1,2.pattern.bfmemo (11983275 bytes)
Many error messages like this in the NGINX error log on different nodes:
2024/06/28 11:08:26 [error] 1183#1183: *107253 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.2.190, server: test122-omeroreadonly-2.novalocal, request: "GET /webgateway/get_thumbnails/?id=8849964&id=8849508&id=8848993&id=8848857&id=8849556&id=8849157&id=8849484&id=8849596&id=8848937&id=8850072&id=8849153&id=8848901&id=8849029&id=8848737&id=8849716&id=8850000&id=8848881&id=8850104&id=8849141&id=8850040&id=8850236&id=8849640&id=8849365&id=8849856&id=8849868&id=8848849&id=8849213&id=8849800&id=8849676&id=8849297&id=8849728&id=8848797&id=8850208&id=8849193&callback=jQuery362017553399812563741719572434117&=1719572434118 HTTP/1.0", upstream: "http://127.0.0.1:4080/webgateway/get_thumbnails/?id=8849964&id=8849508&id=8848993&id=8848857&id=8849556&id=8849157&id=8849484&id=8849596&id=8848937&id=8850072&id=8849153&id=8848901&id=8849029&id=8848737&id=8849716&id=8850000&id=8848881&id=8850104&id=8849141&id=8850040&id=8850236&id=8849640&id=8849365&id=8849856&id=8849868&id=8848849&id=8849213&id=8849800&id=8849676&id=8849297&id=8849728&id=8848797&id=8850208&id=8849193&callback=jQuery36201755339981256374_1719572434117&_=1719572434118", host: "idr-testing.openmicroscopy.org", referrer: "http://idr-testing.openmicroscopy.org/webclient/?show=project-1701"
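To gauge how widespread these timeouts were, they can be counted per node (assuming the default /var/log/nginx/error.log location):
for n in $(cat nodes); do echo $n && ssh $n "sudo grep -c 'upstream timed out' /var/log/nginx/error.log"; done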
Need to generate memo files for Dataset Images, excluding idr0043... On omeroreadwrite...
/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'select d.id from Dataset d where d.id in (select link.child.id from ProjectDatasetLink as link where link.parent.id!=501)' > datasets_not_idr0043.txt
cut -d ',' -f2 datasets_not_idr0043.txt | sed -e 's/^/Dataset:/' > datasets_not_idr0043_ids.txt
[wmoore@test122-omeroreadwrite ~]$ wc -l datasets_not_idr0043_ids.txt
2909 datasets_not_idr0043_ids.txt
[wmoore@test122-omeroreadwrite ~]$ screen -dmS cache parallel --eta -a datasets_not_idr0043_ids.txt --results /tmp/ngff_cache_20240628_ds_not_idr0043/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
After ~15 mins... most images have memos so far...
[wmoore@test122-omeroreadwrite ~]$ grep "ok:" /tmp/ngff_cache_20240628_ds_not_idr0043/1/**/*
...
/tmp/ngff_cache_20240628_ds_not_idr0043/1/Dataset:99/stdout:ok: Pixels:1886003 Image:1886003 0.22210121154785156
/tmp/ngff_cache_20240628_ds_not_idr0043/1/Dataset:99/stdout:ok: Pixels:1886004 Image:1886004 0.121246337890625
/tmp/ngff_cache_20240628_ds_not_idr0043/1/Dataset:99/stdout:ok: Pixels:1885997 Image:1885997 0.07439756393432617
/tmp/ngff_cache_20240628_ds_not_idr0043/1/Dataset:99/stdout:ok: Pixels:1885998 Image:1885998 3.472053289413452
[wmoore@test122-omeroreadwrite ~]$ grep "ok:" /tmp/ngff_cache_20240628_ds_not_idr0043/1/**/* | wc
5906 23624 674103
# count Datasets... 2909 in total...
[wmoore@test122-omeroreadwrite ~]$ ls /tmp/ngff_cache_20240628_ds_not_idr0043/1/ | wc
486 486 5783
Also need to regenerate memo files for idr0009 - reported slow (missing) during testing and seen as experiment_descriptor.xml.bfmemo in the testing "memo saved" logs above. Most of the non-NGFF memo generation happened after the idr0009 update, but not all, so these got missed...
omeroreadonly-1
...
/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'select MIN(field.image.id) FROM WellSample AS field where field.well.plate.id in (select link.child.id from ScreenPlateLink as link where link.parent.id=251) GROUP BY field.well.plate' > idr0009_plates.txt
cut -d ',' -f2 idr0009_plates.txt | sed -e 's/^/Image:/' > idr0009_plate_ids.txt
[wmoore@test122-omeroreadonly-1 ~]$ screen -dmS cache parallel --eta -a idr0009_plate_ids.txt --results /tmp/ngff_cache_20240628_idr0009/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
Steps needed on idr-next for NGFF upgrade. NB: the current checklist is for actions on idr-testing (newly redeployed on 21st May 2024).
Detailed workflow is at https://github.com/IDR/mkngff_upgrade_scripts but this is an outline; it also includes study-specific jobs:
Manual software updates (should be part of the original deployment for idr-next).
NGFF and other updates:
- $SECRET: update all sql commands with it and run them (see https://github.com/IDR/mkngff_upgrade_scripts)
- mkngff symlink on all studies, including bfoptions creation
- Plate 130-16, by going to an Image (e.g. http://localhost:1080/webclient/?show=image-3063425) and "Save To All"
- idr00XX.csv and sql files from https://github.com/IDR/idr-utils/pull/56 into separate repos, then review/merge the PR.