TLDR: the ERRORs in the logs seem to be coming from memo file generation (which was cancelled but still running during web testing), rather than from web testing itself.
Start by scanning the logs on all 5 servers for ERROR statements from 9am today.
NB: the logs are timestamped in GMT (not BST), so we need to grep for timestamps an hour earlier: '2024-04-10 08...'
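As a rough sketch of the GMT/BST adjustment above: the snippet below turns a BST wall-clock time into the hour prefix to search for, then keeps only ERROR lines with that prefix. The helper names, the fixed UTC+1 offset for BST, the assumption that the log level is the third field, and the sample lines are all illustrative and not taken from the real logs.

```python
from datetime import datetime, timedelta, timezone

# Assumption: BST is treated as a fixed UTC+1 offset and GMT as UTC.
BST = timezone(timedelta(hours=1), "BST")

def gmt_hour_prefix(bst_time):
    """Turn a BST time ('YYYY-MM-DD HH:MM') into the 'YYYY-MM-DD HH'
    prefix to search for in GMT-stamped logs."""
    local = datetime.strptime(bst_time, "%Y-%m-%d %H:%M").replace(tzinfo=BST)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%d %H")

def error_lines(lines, prefix):
    """Keep lines that start with the timestamp prefix and whose third
    whitespace-separated field (assumed to be the log level) is ERROR."""
    hits = []
    for line in lines:
        fields = line.split(None, 3)
        if line.startswith(prefix) and len(fields) >= 3 and fields[2] == "ERROR":
            hits.append(line)
    return hits

# Made-up sample lines, only to exercise the filter (not real log output).
sample = [
    "2024-04-10 07:59:59,000 ERROR [example.Logger] too early",
    "2024-04-10 08:41:10,171 INFO  [example.Logger] not an error",
    "2024-04-10 08:41:10,171 ERROR [example.Logger] matches",
]
prefix = gmt_hour_prefix("2024-04-10 09:00")
print(prefix)  # 2024-04-10 08
for line in error_lines(sample, prefix):
    print(line)
```

Filtering on the timestamp prefix first keeps the search to the hour of interest, which is the same idea as grepping for '2024-04-10 08'.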
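Since idr-testing web tends to use omeroreadonly-3 and omeroreadonly-4, let's check omeroreadonly-3. The ERROR output there is from idr0011 - ScreenB:
less /data/OMERO/ManagedRepository/demo_2/2016-07/28/18-54-45.119_mkngff/a78bd2cc-f574-47d9-ae83-e3df322efdda.zarr/OME/METADATA.ome.xml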
...
Name="/uod/idr/filesets/idr0011-thorpe-Dad4/20150826-peter_thorpe/T34 x TS/Plate1-TS/Plate1-TS-Red-B"
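Instead of paging through METADATA.ome.xml by eye, the Name attributes could be pulled out with a small script. This is only a sketch: it is a plain-text scan (much like grep), not a real XML parse, and the fragment below is hypothetical, with element names and IDs invented for illustration.

```python
import re

# Matches Name="..." attributes; \b avoids attributes such as FileName="...".
NAME_ATTR = re.compile(r'\bName="([^"]*)"')

def name_attributes(xml_text):
    """Return the value of every Name="..." attribute found in the text."""
    return NAME_ATTR.findall(xml_text)

# Hypothetical fragment: element names and IDs are assumptions,
# only the Name value is taken from the notes.
fragment = (
    '<Plate ID="Plate:0" Name="LT0068_43">'
    '<Well ID="Well:0" FileName="x"/>'
    '</Plate>'
)
print(name_attributes(fragment))  # ['LT0068_43']
```

To use it on a real file, the file contents would be read into a string first. XML entities in the values are not unescaped by this approach.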
Using the same approach we can see...
On omeroreadonly-2, the last error is at 2024-04-10 08:21:45,104 (before we started web testing), coming from the memo file saved for idr0013: plate LT0064_25.
On omeroreadwrite, the error at 2024-04-10 08:41:10,171 GMT (09:41 BST, during testing) is "memo file saved" from idr0013: plate LT0065_05.
2024-04-10 08:44:52,401 is from idr0013: plate LT0065_06.
2024-04-10 08:57:04,335 is from idr0013: plate LT0066_02.
These omeroreadwrite errors all happened during testing, which used a different server (omeroreadonly) but the same database.
Checking an hour later, we see that memo file generation was still ongoing...
Most of the ERRORs in that hour are coming from omeroreadonly-3.
The first of these is associated with creating a PixelBuffer, which waited on the completion of memo file generation.
This is from idr0011:
less /data/OMERO/ManagedRepository/demo_2/2016-07/28/20-51-59.292_mkngff/f63bc331-42b0-4e55-abd3-abf4de843026.zarr/OME/METADATA.ome.xml
...
Name="/uod/idr/filesets/idr0011-thorpe-Dad4/20150826-peter_thorpe/T34 x TS/Plate3-TS/Plate3-TS-Blue-B"
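It looks like all the other ERRORs from 09:13:40 were other services that were held up by that one completing.
Errors at 09:40:41 were for the same plate from idr0011.
NB: we have a repeated generation of the memo file for this Fileset, saved at the same location but with a difference of 14 bytes in size.
Other ERRORs are also associated with memo file generation.
This one is for idr0013, plate Name="LT0068_43", from:
less /data/OMERO/ManagedRepository/demo_2/2016-05/04/03-27-54.535_mkngff/7493b87d-b6f6-48ce-a19d-3741eb11a57f.zarr/OME/METADATA.ome.xml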
Since we weren't looking at idr0013 during testing, this is likely coming from the parallel memo file generation that was only cancelled as testing started around 9:35.
This is confirmed by the memo generation logs (times here are GMT, i.e. 1 hour behind BST). At 8:32 (9:32 BST) we see "Killed by signal 15." in stderr.
But the memo file generation continued on the server until 09:43:23 (all during testing time).
8133856 ms (memo saved time) is about 135 minutes, corresponding to a run from 7:27 to 9:43 (GMT).
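The arithmetic behind that start time can be checked with a few lines. This is just a sanity check of the figures above; the helper name is made up, and the date is assumed to be the same 2024-04-10 as the other log entries.

```python
from datetime import datetime, timedelta

def memo_start(saved_at, elapsed_ms):
    """Work back from the 'memo saved' time and its elapsed milliseconds
    to the time the memo generation must have started."""
    end = datetime.strptime(saved_at, "%Y-%m-%d %H:%M:%S")
    return end - timedelta(milliseconds=elapsed_ms)

elapsed_ms = 8133856
minutes = elapsed_ms / 60000  # milliseconds -> minutes

# Assumption: 09:43:23 is on 2024-04-10, like the other timestamps.
start = memo_start("2024-04-10 09:43:23", elapsed_ms)

print(int(minutes))             # 135
print(start.strftime("%H:%M"))  # 07:27
```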