IDR / idr-metadata

Curated metadata for all studies published in the Image Data Resource
https://idr.openmicroscopy.org
14 stars 24 forks source link

Bio-Formats cache regeneration walkthrough #686

Open will-moore opened 5 months ago

will-moore commented 5 months ago

Following https://github.com/IDR/deployment/blob/master/docs/operating-procedures.md#bio-formats-cache-regeneration with step by step notes...

The instructions there distribute the work across all 5 proxy servers..

Followed instructions exactly as written on idr-testing until:

for i in $(cat nodes); do ssh $i 'pip install --user omero-cli-render'; done
bash: pip: command not found
bash: pip: command not found

We can probably skip this step if we use the one shared python environment at /opt/omero/server/venv3 for subsequent steps...

[wmoore@test120-omeroreadwrite ~]$ sudo rm -rf /data/BioFormatsCache/*

Checking we have same jars on all servers... e.g. ZarrReader...

for i in $(cat nodes); do ssh $i 'md5sum /opt/omero/server/OMERO.server/lib/server/OMEZarr*'; done
ce6e7e67626f8c5cc59085b0282ba9cc  /opt/omero/server/OMERO.server/lib/server/OMEZarrReader.jar
ce6e7e67626f8c5cc59085b0282ba9cc  /opt/omero/server/OMERO.server/lib/server/OMEZarrReader.jar
ce6e7e67626f8c5cc59085b0282ba9cc  /opt/omero/server/OMERO.server/lib/server/OMEZarrReader.jar
ce6e7e67626f8c5cc59085b0282ba9cc  /opt/omero/server/OMERO.server/lib/server/OMEZarrReader.jar
ce6e7e67626f8c5cc59085b0282ba9cc  /opt/omero/server/OMERO.server/lib/server/OMEZarrReader.jar

Deletion has been running for an hour and a half now:

[wmoore@test120-omeroreadwrite ~]$ sudo rm -rf /data/BioFormatsCache/*
will-moore commented 5 months ago

As suggested, I tried instead to move data initially, then delete later. Left this running in a screen... But it seems this was actually copying data from one mounted location to another?

[wmoore@test120-omeroreadwrite ~]$ sudo mkdir /BFCacheToDelete2
[wmoore@test120-omeroreadwrite ~]$ sudo mv /data/BioFormatsCache/data/OMERO/ManagedRepository/demo_2 /BFCacheToDelete2
...
mv: cannot create directory ‘/BFCacheToDelete2/demo_2/Blitz-0-Ice.ThreadPool.Server-165’: No space left on device
mv: cannot create directory ‘/BFCacheToDelete2/demo_2/Blitz-0-Ice.ThreadPool.Server-199’: No space left on device
mv: cannot create directory ‘/BFCacheToDelete2/demo_2/Blitz-0-Ice.ThreadPool.Server-179’: No space left on device
mv: cannot create directory ‘/BFCacheToDelete2/demo_2/Blitz-0-Ice.ThreadPool.Server-202’: No space left on device
mv: cannot create directory ‘/BFCacheToDelete2/demo_2/Blitz-0-Ice.ThreadPool.Server-180’: No space left on device
mv: cannot create directory ‘/BFCacheToDelete2/demo_2/Blitz-0-Ice.ThreadPool.Server-200’: No space left on device
mv: cannot create directory ‘/BFCacheToDelete2/demo_2/Blitz-0-Ice.ThreadPool.Server-206’: No space left on device

Restarted $ sudo rm -rf /data/BioFormatsCache/* in a screen...

Meanwhile cleanup, $ sudo rm -rf /BFCacheToDelete2

will-moore commented 5 months ago

Seems that parallel wasn't installed on idr-testing...

[wmoore@test120-proxy ~]$ which parallel
/usr/bin/which: no parallel in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/wmoore/.local/bin:/home/wmoore/bin)
[wmoore@test120-proxy ~]$ sudo yum install https://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/p/parallel-20160222-1.el7.noarch.rpm
...
[wmoore@test120-proxy ~]$ which parallel
/usr/local/bin/parallel

Try again..

screen -dmS cache parallel --eta --sshloginfile nodes -a ids.txt --results /tmp/cache/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

screen -r

Computers / CPU threads / Max jobs to run
1:omeroreadonly-1 / 1 / 10
2:omeroreadonly-2 / 1 / 10
3:omeroreadonly-3 / 1 / 10
4:omeroreadonly-4 / 1 / 10
5:omeroreadwrite / 1 / 10

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 21698s Left: 21691 AVG: 2.88s  omeroreadonly-1:10/2/20%/29.0s  omeroreadonly-2:10/5/25%/11.6s  omeroreadonly-3:10/0/17%/0.0s  omeroreadonly-4:10/1/18%/58.0s  omeroreadwrite:10/0/17%/0.0s
will-moore commented 5 months ago

Seems pretty unhappy... Lots of DatabaseBusyException...

[wmoore@test120-proxy ~]$ grep Error /tmp/cache/**/**/* | wc
     91     712   14456

[wmoore@test120-proxy ~]$ grep Error /tmp/cache/**/**/*
...
/tmp/cache/1/Image:608472/stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/5e2e34a1-4b10-49e8-a07d-132d164df892')
/tmp/cache/1/Image:608856/stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/5e2e34a1-4b10-49e8-a07d-132d164df892')
/tmp/cache/1/Image:610008/stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/5e2e34a1-4b10-49e8-a07d-132d164df892')
/tmp/cache/1/Image:63089/stderr:FileExistsError: [Errno 17] File exists: path('/home/wmoore/omero/tmp')
/tmp/cache/1/Image:63089/stderr:FileExistsError: [Errno 17] File exists: path('/tmp/omero/tmp')
/tmp/cache/1/Image:63089/stderr:Error loading: /opt/omero/server/venv3/lib64/python3.6/site-packages/omero/plugins/_metadata_deprecated.py
/tmp/cache/1/Image:667768/stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/05ee3d02-7fd0-4ec7-ad69-70692dc7bb7e')
/tmp/cache/1/Image:68465/stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/577d172f-af4a-43b6-a5af-2f2ce42beef1')
/tmp/cache/1/Image:69233/stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/577d172f-af4a-43b6-a5af-2f2ce42beef1')
/tmp/cache/1/Image:693144/stderr:FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/97488570-6c46-4eec-8874-12ee239091ff')
...

[wmoore@test120-proxy ~]$ cat /tmp/cache/1/Image:98291/stderrr
cat: /tmp/cache/1/Image:98291/stderrr: No such file or directory
[wmoore@test120-proxy ~]$ cat /tmp/cache/1/Image:98291/stderr
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/plugins/sessions.py", line 622, in attach
    rv = store.attach(server, name, uuid, set_current=set_current)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 349, in attach
    set_current=set_current)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 385, in create
    ec = sf.getAdminService().getEventContext()
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_api_IAdmin_ice.py", line 2655, in getEventContext
    return _M_omero.api.IAdmin._op_getEventContext.invoke(self, ((), _ctx))
omero.DatabaseBusyException: exception ::omero::DatabaseBusyException
{
    serverStackTrace = ome.conditions.DatabaseBusyException: cannot create transaction

[wmoore@test120-proxy ~]$  grep DatabaseBusyException /tmp/cache/**/**/* | wc
    907   10422  135925
will-moore commented 5 months ago
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 46524s Left: 20435 AVG: 2.30s  omeroreadonly-1:10/284/22%/10.4s  omeroreadonly-2:10/146/11%/20.2s  omeroreadonly-3:10/63/5%/46.7s  omeroreadonly-4:10/141/11%/20.9s  omeroreadwrite:10/630/48%/4.7s 
will-moore commented 5 months ago

Cancelled:

[wmoore@test120-proxy ~]$ screen -X -S 2497.cache quit

Restarting...

for server in omeroreadwrite omeroreadonly-1 omeroreadonly-2 omeroreadonly-3 omeroreadonly-4; do ssh $server "sudo service omero-server restart"; done
will-moore commented 5 months ago

Re-running...

[wmoore@test120-proxy ~]$ rm -rf /tmp/cache/
[wmoore@test120-proxy ~]$ screen -dmS cache parallel --eta --sshloginfile nodes -a ids.txt --results /tmp/cache/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force
'
will-moore commented 5 months ago
[wmoore@test120-proxy ~]$ screen -r 3951.cache
...
Computers / CPU threads / Max jobs to run
1:omeroreadonly-1 / 1 / 10
2:omeroreadonly-2 / 1 / 10
3:omeroreadonly-3 / 1 / 10
4:omeroreadonly-4 / 1 / 10
5:omeroreadwrite / 1 / 10

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 13701s Left: 16084 AVG: 0.88s  omeroreadonly-1:10/458/8%/10.8s  omeroreadonly-2:10/401/7%/12.4s  omeroreadonly-3:10/322/5%/15.4s  omeroreadonly-4:10/440/7%/11.3s  omeroreadwrite:10/3994/70%/1.2s 
will-moore commented 5 months ago
[wmoore@test120-proxy ~]$ cat /tmp/cache/1/Image:10842059/stderr
usage: /opt/omero/server/venv3/bin/omero render test [-h] [--force] [--thumb]
                                                     object
/opt/omero/server/venv3/bin/omero render test: error: too few arguments
/tmp/par-job-5089_BCnlw: line 3: Image:10842059: command not found

Ooops - added newline in command above. Cancel and start again...

[wmoore@test120-proxy ~]$ rm -rf /tmp/cache/
[wmoore@test120-proxy ~]$ screen -dmS cache parallel --eta --sshloginfile nodes -a ids.txt --results /tmp/cache/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
will-moore commented 5 months ago

Looks like running OK now...

[wmoore@test120-proxy ~]$ grep ok /tmp/cache/**/**/*
...
/tmp/cache/1/Image:62321/stdout:ok: 62321 70.78979206085205 
/tmp/cache/1/Image:63089/stdout:ok: 63089 202.19777274131775 
/tmp/cache/1/Image:63857/stdout:ok: 63857 2.0913946628570557 
/tmp/cache/1/Image:64625/stdout:ok: 64625 1.5413122177124023 
/tmp/cache/1/Image:65393/stdout:ok: 65393 1.3958213329315186 

[wmoore@test120-proxy ~]$ grep ok /tmp/cache/**/**/* | wc
     50     150    3077

[wmoore@test120-proxy ~]$ grep Error /tmp/cache/**/**/*
/tmp/cache/1/Image:38517/stdout:fail: 38517 14.241586923599243 exception ::omero::ResourceError
/tmp/cache/1/Image:40053/stdout:fail: 40053 13.972737312316895 exception ::omero::ResourceError
will-moore commented 5 months ago

As discussed in IDR meeting now, decided to stop memo generation now and test with ~2000 filesets that have been processed so far.

[wmoore@test120-proxy ~]$ screen -X -S 14861.cache quit
will-moore commented 5 months ago

Exception on omeroreadonly-4

2024-02-15 09:53:44,486 INFO  [                 org.perf4j.TimingLogger] (.Server-31) start[1707990819611] time[4874] tag[omero.call.success.ome.logic.QueryImpl.findAllByQuery]
2024-02-15 09:53:44,486 INFO  [        ome.services.util.ServiceHandler] (.Server-31)  Rslt:    (ome.model.core.Image:Id_10496304)
2024-02-15 09:53:44,486 INFO  [    ome.security.basic.BasicEventContext] (.Server-26)  cctx:    group=-1
2024-02-15 09:53:44,492 INFO  [        ome.services.util.ServiceHandler] (.Server-38)  Meth:    interface ome.api.IQuery.findAllByQuery
2024-02-15 09:53:44,492 INFO  [        ome.services.util.ServiceHandler] (.Server-38)  Args:    [select i from Image as i left outer join i.wellSamples as ws join ws.well as well join well.plate as, PARAMS:F(o0l1):tag_text=Study Example Image id
=2501 ]
2024-02-15 09:53:44,495 INFO  [         ome.security.basic.EventHandler] (.Server-26)  Auth:    user=52,group=-1,event=null(User),sess=b3bff02f-85c2-4591-bbf0-5b73d2f11367
2024-02-15 09:53:44,647 INFO  [                 org.perf4j.TimingLogger] (.Server-36) start[1707990819845] time[4801] tag[omero.call.exception]
2024-02-15 09:53:44,647 WARN  [        ome.services.util.ServiceHandler] (.Server-36) Unknown exception thrown.

org.springframework.dao.DataAccessResourceFailureException: Hibernate operation: could not execute query; SQL [select image0_.id as id60_, image0_.acquisitionDate as acquisit2_60_, image0_.archived as archived60_, image0_.description as descript
4_60_, image0_.creation_id as creation10_60_, image0_.external_id as external11_60_, image0_.group_id as group12_60_, image0_.owner_id as owner13_60_, image0_.permissions as permissi5_60_, image0_.update_id as update14_60_, image0_.experiment as
 experiment60_, image0_.fileset as fileset60_, image0_.format as format60_, image0_.imagingEnvironment as imaging18_60_, image0_.instrument as instrument60_, image0_.name as name60_, image0_.objectiveSettings as objecti20_60_, image0_.partial as
 partial60_, image0_.series as series60_, image0_.stageLabel as stageLabel60_, image0_.version as version60_ from image image0_ left outer join wellsample wellsample1_ on image0_.id=wellsample1_.image and 
( 
  1 = ? OR 
  1 = ? OR 
  (wellsample1_.group_id in (?)) OR 
  (wellsample1_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = wellsample1_.group_id)) OR 
  (wellsample1_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = wellsample1_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = wellsample1_.group_id)) 
) inner join well well2_ on wellsample1_.well=well2_.id inner join plate plate3_ on well2_.plate=plate3_.id left outer join screenplatelink screenlink4_ on plate3_.id=screenlink4_.child and 
( 
  1 = ? OR 
  1 = ? OR 
  (screenlink4_.group_id in (?)) OR 
  (screenlink4_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = screenlink4_.group_id)) OR 
  (screenlink4_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = screenlink4_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = screenlink4_.group_id)) 
) inner join screen screen5_ on screenlink4_.parent=screen5_.id left outer join imageannotationlink annotation6_ on image0_.id=annotation6_.parent and 
( 
  1 = ? OR 
  1 = ? OR 
  (annotation6_.group_id in (?)) OR 
  (annotation6_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = annotation6_.group_id)) OR 
  (annotation6_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = annotation6_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = annotation6_.group_id)) 
) inner join annotation annotation7_ on annotation6_.child=annotation7_.id where 
( 
  1 = ? OR 
  1 = ? OR 
  (image0_.group_id in (?)) OR 
  (image0_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = image0_.group_id)) OR 
  (image0_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = image0_.group_id)) OR 
:
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = image0_.group_id)) 
) and screen5_.id=? and annotation7_.textValue=? order by well2_."column", well2_."row"]; ERROR: remaining connection slots are reserved for non-replication superuser connections
  Where: parallel worker; nested exception is org.postgresql.util.PSQLException: ERROR: remaining connection slots are reserved for non-replication superuser connections
  Where: parallel worker
        at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:251)
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)
        at org.springframework.orm.hibernate3.HibernateAccessor.convertJdbcAccessException(HibernateAccessor.java:428)
        at org.springframework.orm.hibernate3.HibernateAccessor.convertHibernateAccessException(HibernateAccessor.java:414)
        at org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:416)
        at org.springframework.orm.hibernate3.HibernateTemplate.execute(HibernateTemplate.java:342)
        at ome.logic.QueryImpl.execute(QueryImpl.java:176)
        at ome.logic.QueryImpl.findAllByQuery(QueryImpl.java:419)
        at jdk.internal.reflect.GeneratedMethodAccessor597.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
        at ome.security.basic.EventHandler.invoke(EventHandler.java:154)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.orm.hibernate3.HibernateInterceptor.invoke(HibernateInterceptor.java:119)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:99)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:282)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at ome.tools.hibernate.ProxyCleanupFilter$Interceptor.invoke(ProxyCleanupFilter.java:249)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at ome.services.util.ServiceHandler.invoke(ServiceHandler.java:121)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
        at com.sun.proxy.$Proxy111.findAllByQuery(Unknown Source)
        at jdk.internal.reflect.GeneratedMethodAccessor597.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
        at ome.security.basic.BasicSecurityWiring.invoke(BasicSecurityWiring.java:93)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at ome.services.blitz.fire.AopContextInitializer.invoke(AopContextInitializer.java:43)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
        at com.sun.proxy.$Proxy111.findAllByQuery(Unknown Source)
        at jdk.internal.reflect.GeneratedMethodAccessor729.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at ome.services.blitz.util.IceMethodInvoker.invoke(IceMethodInvoker.java:172)
:
        at ome.services.throttling.Callback.run(Callback.java:56)
        at ome.services.throttling.InThreadThrottlingStrategy.callInvokerOnRawArgs(InThreadThrottlingStrategy.java:56)
        at ome.services.blitz.impl.AbstractAmdServant.callInvokerOnRawArgs(AbstractAmdServant.java:140)
        at ome.services.blitz.impl.QueryI.findAllByQuery_async(QueryI.java:64)
        at jdk.internal.reflect.GeneratedMethodAccessor728.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
        at omero.cmd.CallContext.invoke(CallContext.java:85)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
        at com.sun.proxy.$Proxy112.findAllByQuery_async(Unknown Source)
        at omero.api._IQueryTie.findAllByQuery_async(_IQueryTie.java:90)
        at omero.api._IQueryDisp.___findAllByQuery(_IQueryDisp.java:594)
        at omero.api._IQueryDisp.__dispatch(_IQueryDisp.java:733)
        at IceInternal.Incoming.invoke(Incoming.java:221)
        at Ice.ConnectionI.invokeAll(ConnectionI.java:2536)
        at Ice.ConnectionI.dispatch(ConnectionI.java:1145)
        at Ice.ConnectionI.message(ConnectionI.java:1056)
        at IceInternal.ThreadPool.run(ThreadPool.java:395)
        at IceInternal.ThreadPool.access$300(ThreadPool.java:12)
        at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:832)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.postgresql.util.PSQLException: ERROR: remaining connection slots are reserved for non-replication superuser connections
  Where: parallel worker
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2422)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2167)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:306)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:155)
        at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
        at bitronix.tm.resource.jdbc.proxy.PreparedStatementJavaProxy.executeQuery(PreparedStatementJavaProxy.java:102)
        at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:208)
        at org.hibernate.loader.Loader.getResultSet(Loader.java:1953)
        at org.hibernate.loader.Loader.doQuery(Loader.java:802)
        at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274)
        at org.hibernate.loader.Loader.doList(Loader.java:2542)
        at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2276)
        at org.hibernate.loader.Loader.list(Loader.java:2271)
        at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:459)
        at org.hibernate.hql.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:365)
        at org.hibernate.engine.query.HQLQueryPlan.performList(HQLQueryPlan.java:196)
        at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1268)
        at org.hibernate.impl.QueryImpl.list(QueryImpl.java:102)
        at ome.services.query.Query.doInHibernate(Query.java:253)
:
        at org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:411)
        ... 64 common frames omitted
2024-02-15 09:53:44,648 INFO  [        ome.services.util.ServiceHandler] (.Server-36)  Excp:    org.springframework.dao.DataAccessResourceFailureException: Hibernate operation: could not execute query; SQL [select image0_.id as id60_, image0_.acquisitionDate as acquisit2_60_, image0_.archived as archived60_, image0_.description as descript4_60_, image0_.creation_id as creation10_60_, image0_.external_id as external11_60_, image0_.group_id as group12_60_, image0_.owner_id as owner13_60_, image0_.permissions as permissi5_60_, image0_.update_id as update14_60_, image0_.experiment as experiment60_, image0_.fileset as fileset60_, image0_.format as format60_, image0_.imagingEnvironment as imaging18_60_, image0_.instrument as instrument60_, image0_.name as name60_, image0_.objectiveSettings as objecti20_60_, image0_.partial as partial60_, image0_.series as series60_, image0_.stageLabel as stageLabel60_, image0_.version as version60_ from image image0_ left outer join wellsample wellsample1_ on image0_.id=wellsample1_.image and 
( 
  1 = ? OR 
  1 = ? OR 
  (wellsample1_.group_id in (?)) OR 
  (wellsample1_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = wellsample1_.group_id)) OR 
  (wellsample1_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = wellsample1_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = wellsample1_.group_id)) 
) inner join well well2_ on wellsample1_.well=well2_.id inner join plate plate3_ on well2_.plate=plate3_.id left outer join screenplatelink screenlink4_ on plate3_.id=screenlink4_.child and 
( 
  1 = ? OR 
  1 = ? OR 
  (screenlink4_.group_id in (?)) OR 
  (screenlink4_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = screenlink4_.group_id)) OR 
  (screenlink4_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = screenlink4_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = screenlink4_.group_id)) 
) inner join screen screen5_ on screenlink4_.parent=screen5_.id left outer join imageannotationlink annotation6_ on image0_.id=annotation6_.parent and 
( 
  1 = ? OR 
  1 = ? OR 
  (annotation6_.group_id in (?)) OR 
  (annotation6_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = annotation6_.group_id)) OR 
  (annotation6_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = annotation6_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = annotation6_.group_id)) 
) inner join annotation annotation7_ on annotation6_.child=annotation7_.id where 
( 
  1 = ? OR 
  1 = ? OR 
  (image0_.group_id in (?)) OR 
  (image0_.owner_id = ? AND (select (__g.permissions & 1024) = 1024 from experimentergroup __g where __g.id = image0_.group_id)) OR 
  (image0_.group_id in (?,?) AND (select (__g.permissions & 64) = 64 from experimentergroup __g where __g.id = image0_.group_id)) OR 
  ((select (__g.permissions & 4) = 4 from experimentergroup __g where __g.id = image0_.group_id)) 
) and screen5_.id=? and annotation7_.textValue=? order by well2_."column", well2_."row"]; ERROR: remaining connection slots are reserved for non-replication superuser connections
  Where: parallel worker; nested exception is org.postgresql.util.PSQLException: ERROR: remaining connection slots are reserved for non-replication superuser connections
  Where: parallel worker
2024-02-15 09:53:44,648 INFO  [    ome.security.basic.BasicEventContext] (.Server-33)  cctx:    group=-1
2024-02-15 09:53:44,657 INFO  [         ome.security.basic.EventHandler] (.Server-33)  Auth:    user=52,group=-1,event=null(User),sess=b3bff02f-85c2-4591-bbf0-5b73d2f11367
2024-02-15 09:53:44,659 INFO  [o.services.sessions.SessionContext$Count] (.Server-23) -Reference count: b3bff02f-85c2-4591-bbf0-5b73d2f11367=15
will-moore commented 5 months ago

Started again...

screen -dmS cache parallel --eta --sshloginfile nodes -a ids.txt --results /tmp/cache/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

Status after about 5 mins...

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 4107s Left: 20296 AVG: 0.20s  omeroreadonly-1:10/38/3%/8.3s  omeroreadonly-2:10/267/19%/1.2s  omeroreadonly-3:10/280/19%/1.1s  omeroreadonly-4:10/221/15%/1.4s  omeroreadwrite:10/597/41%/0.5s 
will-moore commented 4 months ago

16 hours later... apart from computer:1, percentages seem lower today than yesterday

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 440639s Left: 19178 AVG: 23.45s  1:10/257/10%/230.1s  2:10/471/18%/125.6s  3:10/469/18%/126.1s  4:10/482/19%/122.7s  5:10/842/33%/70.2s 
will-moore commented 4 months ago

Summary of current progress...

[wmoore@test120-proxy ~]$ ls -alh /tmp/cache/1/ | grep "Feb 14" | wc
   1984   17856  121856
[wmoore@test120-proxy ~]$ ls -alh /tmp/cache/1/ | grep "Feb 15" | wc
    487    4383   30194
[wmoore@test120-proxy ~]$ ls -alh /tmp/cache/1/ | grep "Feb 16" | wc
    102     918    6312
[wmoore@test120-proxy ~]$ ls -alh /tmp/cache/1/ | wc
   2574   23159  158373
[wmoore@test120-proxy ~]$ grep ok /tmp/cache/**/**/* | wc
   2586    7902  170484
will-moore commented 4 months ago

Stopped for more testing

[wmoore@test120-proxy ~]$ screen -X -S 10106.cache quit
will-moore commented 4 months ago

Testing successful! Starting again...

screen -dmS cache parallel --eta --sshloginfile nodes -a ids.txt --results /tmp/cache/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

After 4 minutes...

screen -r

Computer:jobs running/jobs completed/%of started jobs
ETA: 2235s Left: 19477 AVG: 0.12s  omeroreadonly-1:10/505/22%/0.6s  omeroreadonly-2:10/480/21%/0.6s  omeroreadonly-3:10/530/23%/0.5s  omeroreadonly-4:10/454/20%/0.6s  omeroreadwrite:10/253/11%/1.1s 
will-moore commented 4 months ago

Stopped memo generation during morning IDR meeting. Found in testing that on idr-testing : omeroreadonly-1 the goofys mount had failed, which likely explains the fact that several memo files have failed to generate.

Re-mounted it and then restarted again...

screen -dmS cache parallel --eta --sshloginfile nodes -a ids.txt --results /tmp/cache/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
will-moore commented 4 months ago

On idr0138-pilot, we want to create memos, particularly for idr0090

This lists all Plates for idr0090...

/opt/omero/server/OMERO.server/bin/omero hql --ids-only --style csv 'select link.child.id from ScreenPlateLink as link where link.parent.id=2851'

Can use that in a query to get Images

(venv3) (base) [wmoore@pilot-idr0138-omeroreadwrite scripts]$ /opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'select MIN(field.image.id) FROM WellSample AS field where field.well.plate.id in (select link.child.id from ScreenPlateLink as link where link.parent.id=2851) GROUP BY field.well.plate'
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
#,Col1
0,12544757
1,12550005
2,12551317
3,12546037
4,12550677
5,12553269
6,12554229
7,12547509
8,12554709
9,12541269
10,12553749
11,12545749
12,12549141
13,12552053
14,12554997
15,12539701
16,12552789
17,12543765
18,12548245
19,12542037
20,12546773
21,12543029
(venv3) (base) [wmoore@pilot-idr0138-omeroreadwrite scripts]$ /opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'select MIN(field.image.id) FROM WellSample AS field where field.well.plate.id in (select link.child.id from ScreenPlateLink as link where link.parent.id=2851) GROUP BY field.well.plate' > idr0090_plates.txt
cut -d ',' -f2 idr0090_plates.txt | sed -e 's/^/Image:/' > idr0090_ids.txt

Edit to remove first line Col

screen -S bfcache
(base) (venv3) [wmoore@pilot-idr0138-omeroreadwrite ~]$ source /opt/omero/server/venv3/bin/activate

(venv3) (base) (venv3) [wmoore@pilot-idr0138-omeroreadwrite ~]$ for i in $(cat idr0090_ids.txt); do echo $i && omero render -s localhost -u public -w public test $i --force; done;
will-moore commented 4 months ago

On idr-testing, seemed that memo failed for all Plates in idr0090...

Using same idr0090_ids.txt from idr0138 above...

[wmoore@test120-proxy ~]$ for i in $(cat idr0090_ids.txt); do echo $i && grep Error /tmp/cache/1/$i/* ; done
...
Image:12548245
/tmp/cache/1/Image:12548245/stdout:fail: 12548245 0.4982106685638428 exception ::omero::ResourceError
Image:12542037
/tmp/cache/1/Image:12542037/stdout:fail: 12542037 8.83667540550232 exception ::omero::ResourceError
Image:12546773
/tmp/cache/1/Image:12546773/stdout:fail: 12546773 0.15928888320922852 exception ::omero::ResourceError
Image:12543029
/tmp/cache/1/Image:12543029/stdout:fail: 12543029 6.815727233886719 exception ::omero::ResourceError

But I don't have a good way of working out when the goofys mount failed on idr:testing omeroreadonly-1.

will-moore commented 4 months ago

Decided to focus just on idr0090 memo file generation on idr-testing too... Cancelled run above, and restarted just with idr0090...

[wmoore@test120-proxy ~]$ screen -dmS cache parallel --eta --sshloginfile nodes -a idr0090_ids.txt --results /tmp/cache/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
will-moore commented 4 months ago
(venv3) (base) (venv3) [wmoore@pilot-idr0138-omeroreadwrite ~]$ for i in $(cat idr0090_ids.txt); do echo $i && omero render -s localhost -u public -w public test $i --force; done;
Image:12544757
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
Image:12550005
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: 12550005 5086.527655124664
Image:12551317
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: 12551317 4007.054927587509
Image:12546037
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: 12546037 4222.491681337357
Image:12550677
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: 12550677 3997.5826559066772
Image:12553269
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: 12553269 2755.3615374565125
Image:12554229
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: 12554229 2705.048905849457
Image:12547509
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: 12547509 4176.419909954071
Image:12554709
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: 12554709 2855.7143104076385
Image:12541269
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: 12541269 3.1789915561676025
Image:12553749
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: 12553749 2566.5211091041565
Image:12545749
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: 12545749 1536.8917303085327
Image:12549141
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: 12549141 4395.9468557834625
Image:12552053
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ok: 12552053 4067.1386001110077
Image:12554997
Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
will-moore commented 4 months ago

Seb's query for getting ONLY NGFF data for memo file regeneration, using the OME-NGFF tag... https://openmicroscopy.slack.com/archives/C0K5WAD8A/p1699433365146299

/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'SELECT MIN(field.image.id) FROM Screen as s JOIN s.plateLinks as plateLinks JOIN s.annotationLinks as annLinks join annLinks.child as ann join plateLinks.child as p join p.wells as w join w.wellSamples as field where  ann.id=38280644 GROUP BY field.well.plate' > ngff_plates.txt

/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'SELECT d.id FROM Project as p JOIN p.datasetLinks as datasetLinks JOIN p.annotationLinks as annLinks join annLinks.child as ann join datasetLinks.child as d where  ann.id=38280644' > ngff_datasets.txt

$ cut -d ',' -f2 ngff_plates.txt | sed -e 's/^/Image:/' > ngff_ids.txt
$ cut -d ',' -f2 ngff_datasets.txt | sed -e 's/^/Dataset:/' >> ngff_ids.txt 
will-moore commented 4 months ago
screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/ngff_cache/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
will-moore commented 4 months ago
[wmoore@test120-proxy ~]$ grep ok: /tmp/ngff_cache/1/**/* | wc
     29      87    1991
[wmoore@test120-proxy ~]$ ls /tmp/ngff_cache/1 | wc
    132     132    1802
[wmoore@test120-proxy ~]$ grep "Password check failed" /tmp/ngff_cache/1/**/* | wc
     24     240    3341

Seeing a bunch of these... "Password check failed" e.g.

[wmoore@test120-proxy ~]$ less /tmp/ngff_cache/1/Image:1484503/stderr
Image:1484503/stderr ``` Traceback (most recent call last): File "/opt/omero/server/venv3/lib/python3.6/site-packages/omero/plugins/sessions.py", line 622, in attach rv = store.attach(server, name, uuid, set_current=set_current) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 349, in attach set_current=set_current) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 380, in create sf = client.createSession(name, pasw) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/clients.py", line 640, in createSession prx = rtr.createSession(username, password, ctx) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/Glacier2_Router_ice.py", line 258, in createSession return _M_Glacier2.Router._op_createSession.invoke(self, ((userId, password), _ctx)) Glacier2.PermissionDeniedException: exception ::Glacier2::PermissionDeniedException { reason = Password check failed for '76f463be-21a1-48c6-a7af-e38222e21671': [] } During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/opt/omero/server/venv3/bin/omero", line 8, in sys.exit(main()) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/main.py", line 126, in main rv = omero.cli.argv() File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1787, in argv cli.invoke(args[1:]) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1225, in invoke stop = self.onecmd(line, previous_args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1302, in onecmd self.execute(line, previous_args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1384, in execute args.func(args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_cli_render.py", line 351, in _wrapper self.client = self.ctx.conn(*args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1602, in conn self.controls["sessions"].login(args) File "/opt/omero/server/venv3/lib/python3.6/site-packages/omero/plugins/sessions.py", line 517, in login check_group=True) File "/opt/omero/server/venv3/lib/python3.6/site-packages/omero/plugins/sessions.py", line 615, in check_and_attach return self.attach(store, server, name, uuid, props, exists) File "/opt/omero/server/venv3/lib/python3.6/site-packages/omero/plugins/sessions.py", line 627, in attach store.clear(server, name, uuid) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 443, in clear self.walk(f, host, name, sess) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 335, in walk func(h, n, s) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 441, in f s.remove() File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_ext/path.py", line 1313, in remove os.remove(self) FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/76f463be-21a1-48c6-a7af-e38222e21671') /tmp/ngff_cache/1/Image:1484503/stderr (END) ```
will-moore commented 4 months ago

Current status...

[wmoore@test120-proxy ~]$ grep ok: /tmp/ngff_cache/1/**/* | wc
    229     687   15984
[wmoore@test120-proxy ~]$ 
[wmoore@test120-proxy ~]$ ls -alh /tmp/ngff_cache/1/ | wc
    377    3386   22877
[wmoore@test120-proxy ~]$ ls -alh /tmp/ngff_cache/1/ | wc
    377    3386   22877
[wmoore@test120-proxy ~]$ 
[wmoore@test120-proxy ~]$ ls -alh /tmp/ngff_cache/1/ | grep "Feb 20 22"
...
drwxrwxr-x.   2 wmoore wmoore  45 Feb 20 22:33 Image:1605339
drwxrwxr-x.   2 wmoore wmoore  45 Feb 20 22:33 Image:1605719
drwxrwxr-x.   2 wmoore wmoore  45 Feb 20 22:33 Image:1606099
drwxrwxr-x.   2 wmoore wmoore  45 Feb 20 22:33 Image:1606479
drwxrwxr-x.   2 wmoore wmoore  45 Feb 20 22:33 Image:1606859
drwxrwxr-x.   2 wmoore wmoore  45 Feb 20 22:33 Image:1607220
drwxrwxr-x.   2 wmoore wmoore  45 Feb 20 22:33 Image:1607600
drwxrwxr-x.   2 wmoore wmoore  45 Feb 20 22:33 Image:1607980

Cancelled this to focus on idr0090 for testing...

screen -dmS cache parallel --eta --sshloginfile nodes -a idr0090_ids.txt --results /tmp/ngff_cache/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
will-moore commented 4 months ago

Left idr0090 memo generation running overnight, but it only resulted in 9 out of 22 Plates with "ok".

Ran again this morning...

screen -dmS cache parallel --eta --sshloginfile nodes -a idr0090_ids.txt --results /tmp/memo_idr0090_20240222/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

11/22 are "ok" but the other 11 have some Exception:

[wmoore@test120-proxy ~]$ grep ok: /tmp/memo_idr0090_20240222/1/**/*
/tmp/memo_idr0090_20240222/1/Image:12539701/stdout:ok: 12539701 2.2314953804016113 
/tmp/memo_idr0090_20240222/1/Image:12543029/stdout:ok: 12543029 5772.457081079483 
/tmp/memo_idr0090_20240222/1/Image:12544757/stdout:ok: 12544757 1.5645971298217773 
/tmp/memo_idr0090_20240222/1/Image:12545749/stdout:ok: 12545749 2015.8153846263885 
/tmp/memo_idr0090_20240222/1/Image:12546773/stdout:ok: 12546773 2.3995842933654785 
/tmp/memo_idr0090_20240222/1/Image:12548245/stdout:ok: 12548245 2.1758549213409424 
/tmp/memo_idr0090_20240222/1/Image:12550005/stdout:ok: 12550005 5618.203018188477 
/tmp/memo_idr0090_20240222/1/Image:12552789/stdout:ok: 12552789 3435.908084154129 
/tmp/memo_idr0090_20240222/1/Image:12553269/stdout:ok: 12553269 2.3640246391296387 
/tmp/memo_idr0090_20240222/1/Image:12553749/stdout:ok: 12553749 2.3888068199157715 
/tmp/memo_idr0090_20240222/1/Image:12554229/stdout:ok: 12554229 3453.4601786136627 

[wmoore@test120-proxy ~]$ grep dException: /tmp/memo_idr0090_20240222/1/**/*
/tmp/memo_idr0090_20240222/1/Image:12541269/stderr:Glacier2.PermissionDeniedException: exception ::Glacier2::PermissionDeniedException
/tmp/memo_idr0090_20240222/1/Image:12542037/stderr:Glacier2.PermissionDeniedException: exception ::Glacier2::PermissionDeniedException
/tmp/memo_idr0090_20240222/1/Image:12543765/stderr:    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
/tmp/memo_idr0090_20240222/1/Image:12546037/stderr:Glacier2.PermissionDeniedException: exception ::Glacier2::PermissionDeniedException
/tmp/memo_idr0090_20240222/1/Image:12547509/stderr:    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
/tmp/memo_idr0090_20240222/1/Image:12549141/stderr:    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
/tmp/memo_idr0090_20240222/1/Image:12550677/stderr:    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
/tmp/memo_idr0090_20240222/1/Image:12551317/stderr:    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
/tmp/memo_idr0090_20240222/1/Image:12552053/stderr:Glacier2.PermissionDeniedException: exception ::Glacier2::PermissionDeniedException
/tmp/memo_idr0090_20240222/1/Image:12554709/stderr:Glacier2.PermissionDeniedException: exception ::Glacier2::PermissionDeniedException
/tmp/memo_idr0090_20240222/1/Image:12554997/stderr:Glacier2.PermissionDeniedException: exception ::Glacier2::PermissionDeniedException
will-moore commented 4 months ago
screen -dmS cache parallel --eta --sshloginfile nodes -a idr0090_ids.txt --results /tmp/memo_idr0090_20240222a/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
will-moore commented 4 months ago

Manually viewing in web to trigger memo files:

will-moore commented 4 months ago

To test memo file generation on some data that takes less time than idr0090, lets get some IDs for idr0011-ScreenB (40 Plates)

/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'select MIN(field.image.id) FROM WellSample AS field where field.well.plate.id in (select link.child.id from ScreenPlateLink as link where link.parent.id=1551) GROUP BY field.well.plate' > idr0011B_plates.txt

cut -d ',' -f2 idr0011B_plates.txt | sed -e 's/^/Image:/' > idr0011B_ids.txt
screen -dmS cache parallel --eta --sshloginfile nodes -a idr0011B_ids.txt --results /tmp/memo_idr0011B_20240222/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

PermissionDeniedException:

less /tmp/memo_idr0011B_20240222/1/Image:2857249/stderr ``` Traceback (most recent call last): File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/plugins/sessions.py", line 622, in attach rv = store.attach(server, name, uuid, set_current=set_current) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 349, in attach set_current=set_current) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 380, in create sf = client.createSession(name, pasw) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/clients.py", line 640, in createSession prx = rtr.createSession(username, password, ctx) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/Glacier2_Router_ice.py", line 258, in createSession return _M_Glacier2.Router._op_createSession.invoke(self, ((userId, password), _ctx)) Glacier2.PermissionDeniedException: exception ::Glacier2::PermissionDeniedException { reason = Password check failed for 'db3735ab-9eec-4058-95b2-3b3427f1c911': [] } During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/opt/omero/server/venv3/bin/omero", line 8, in sys.exit(main()) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/main.py", line 126, in main rv = omero.cli.argv() File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1787, in argv cli.invoke(args[1:]) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1225, in invoke stop = self.onecmd(line, previous_args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1302, in onecmd self.execute(line, previous_args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1384, in execute args.func(args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_cli_render.py", line 351, in _wrapper self.client = self.ctx.conn(*args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1602, in conn self.controls["sessions"].login(args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/plugins/sessions.py", line 517, in login check_group=True) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/plugins/sessions.py", line 615, in check_and_attach return self.attach(store, server, name, uuid, props, exists) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/plugins/sessions.py", line 627, in attach store.clear(server, name, uuid) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 443, in clear self.walk(f, host, name, sess) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 335, in walk func(h, n, s) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 441, in f s.remove() File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_ext/path.py", line 1313, in remove os.remove(self) FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/db3735ab-9eec-4058-95b2-3b3427f1c911') ```

EDIT - completed... 32 / 40 ok, 8 Exceptions...

[wmoore@test120-proxy ~]$ grep ok: /tmp/memo_idr0011B_20240222/1/**/* | wc
     32      96    2618

[wmoore@test120-proxy ~]$ grep dExc /tmp/memo_idr0011B_20240222/1/**/*
/tmp/memo_idr0011B_20240222/1/Image:2856051/stderr:    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
/tmp/memo_idr0011B_20240222/1/Image:2856191/stderr:    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
/tmp/memo_idr0011B_20240222/1/Image:2856755/stderr:    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
/tmp/memo_idr0011B_20240222/1/Image:2856795/stderr:    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
/tmp/memo_idr0011B_20240222/1/Image:2856967/stderr:    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
/tmp/memo_idr0011B_20240222/1/Image:2857249/stderr:Glacier2.PermissionDeniedException: exception ::Glacier2::PermissionDeniedException
/tmp/memo_idr0011B_20240222/1/Image:2857670/stderr:    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
/tmp/memo_idr0011B_20240222/1/Image:2857705/stderr:    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
will-moore commented 4 months ago

Try running the same omero render command used in parallel above, but run it manually on each of the servers, using Image ID that failed above from idr0011B

ssh into each in turn...

[wmoore@test120-omeroreadwrite ~]$ /opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test Image:2857249 --force
ok: 2857249 253.97674584388733

[wmoore@test120-omeroreadonly-1 ~]$ /opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test Image:2857249 --force
ok: 2857249 5.177480936050415 

First one took longer, then all the readonly servers much faster. All OK. No Errors.

will-moore commented 4 months ago

Try running again to see if the same images get errors..

screen -dmS cache parallel --eta --sshloginfile nodes -a idr0011B_ids.txt --results /tmp/memo_idr0011B_20240222a/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

This time all worked!

[wmoore@test120-proxy ~]$ grep ok: /tmp/memo_idr0011B_20240222a/1/**/* | wc
     40     120    3325

So, no clues as to why we got exceptions the first time, but shows that re-running can help.

will-moore commented 4 months ago

Let's get on and run memo generation on idr0010 (148 Plates)

/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'select MIN(field.image.id) FROM WellSample AS field where field.well.plate.id in (select link.child.id from ScreenPlateLink as link where link.parent.id=1351) GROUP BY field.well.plate' > idr0010_plates.txt

cut -d ',' -f2 idr0010_plates.txt | sed -e 's/^/Image:/' > idr0010_ids.txt
screen -dmS cache parallel --eta --sshloginfile nodes -a idr0010_ids.txt --results /tmp/memo_idr0010_20240222/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
[wmoore@test120-proxy ~]$ grep ok: /tmp/memo_idr0010_20240222/1/**/* | wc
    123     369    9981
[wmoore@test120-proxy ~]$ grep Error /tmp/memo_idr0010_20240222/1/**/*  | wc
     25     225    4900

All 25 failures are the same e.g:

Image:1921251/stderr ``` [wmoore@test120-proxy ~]$ less /tmp/memo_idr0010_20240222/1/Image:1921251/stderr rv = store.attach(server, name, uuid, set_current=set_current) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 349, in attach set_current=set_current) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 380, in create sf = client.createSession(name, pasw) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/clients.py", line 657, in createSession raise wrapped # We only retry concurrency issues. File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/clients.py", line 640, in createSession prx = rtr.createSession(username, password, ctx) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/Glacier2_Router_ice.py", line 258, in createSession return _M_Glacier2.Router._op_createSession.invoke(self, ((userId, password), _ctx)) omero.WrappedCreateSessionException: exception ::omero::WrappedCreateSessionException { reason = Can't find default group for 9ff6863a-ffff-44b8-a02e-ae0ea7324fe3 concurrency = False backOff = -1 type = ome.conditions.ApiUsageException } During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/opt/omero/server/venv3/bin/omero", line 8, in sys.exit(main()) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/main.py", line 126, in main rv = omero.cli.argv() File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1787, in argv cli.invoke(args[1:]) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1225, in invoke stop = self.onecmd(line, previous_args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1302, in onecmd self.execute(line, previous_args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1384, in execute args.func(args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_cli_render.py", line 351, in _wrapper self.client = self.ctx.conn(*args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1602, in conn self.controls["sessions"].login(args) File "/opt/omero/server/venv3/lib/python3.6/site-packages/omero/plugins/sessions.py", line 517, in login check_group=True) File "/opt/omero/server/venv3/lib/python3.6/site-packages/omero/plugins/sessions.py", line 615, in check_and_attach return self.attach(store, server, name, uuid, props, exists) File "/opt/omero/server/venv3/lib/python3.6/site-packages/omero/plugins/sessions.py", line 627, in attach store.clear(server, name, uuid) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 443, in clear self.walk(f, host, name, sess) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 335, in walk func(h, n, s) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 441, in f s.remove() File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_ext/path.py", line 1313, in remove os.remove(self) FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/9ff6863a-ffff-44b8-a02e-ae0ea7324fe3') ```
will-moore commented 4 months ago

Re ran for idr0010...

screen -dmS cache parallel --eta --sshloginfile nodes -a idr0010_ids.txt --results /tmp/memo_idr0010_20240222a/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

This time we see 2 different Exceptions:

[wmoore@test120-proxy ~]$ grep Exception: /tmp/memo_idr0010_20240222a/1/**/*
/tmp/memo_idr0010_20240222a/1/Image:1922019/stderr:Glacier2.PermissionDeniedException: exception ::Glacier2::PermissionDeniedException
/tmp/memo_idr0010_20240222a/1/Image:1943685/stderr:omero.WrappedCreateSessionException: exception ::omero::WrappedCreateSessionException
/tmp/memo_idr0010_20240222a/1/Image:3053868/stderr:Glacier2.PermissionDeniedException: exception ::Glacier2::PermissionDeniedException
/tmp/memo_idr0010_20240222a/1/Image:3055788/stderr:omero.WrappedCreateSessionException: exception ::omero::WrappedCreateSessionException
/tmp/memo_idr0010_20240222a/1/Image:3057708/stderr:omero.WrappedCreateSessionException: exception ::omero::WrappedCreateSessionException
/tmp/memo_idr0010_20240222a/1/Image:3059628/stderr:Glacier2.PermissionDeniedException: exception ::Glacier2::PermissionDeniedException
/tmp/memo_idr0010_20240222a/1/Image:3061548/stderr:omero.WrappedCreateSessionException: exception ::omero::WrappedCreateSessionException
/tmp/memo_idr0010_20240222a/1/Image:3063468/stderr:omero.WrappedCreateSessionException: exception ::omero::WrappedCreateSessionException
/tmp/memo_idr0010_20240222a/1/Image:3065437/stderr:omero.WrappedCreateSessionException: exception ::omero::WrappedCreateSessionException

The WrappedCreateSessionException we saw in the previous run.

The PermissionDeniedException looks like:

PermissionDeniedException ``` [wmoore@test120-proxy ~]$ less /tmp/memo_idr0010_20240222a/1/Image:1922019/stderr Traceback (most recent call last): File "/opt/omero/server/venv3/lib/python3.6/site-packages/omero/plugins/sessions.py", line 622, in attach rv = store.attach(server, name, uuid, set_current=set_current) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 349, in attach set_current=set_current) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 380, in create sf = client.createSession(name, pasw) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/clients.py", line 640, in createSession prx = rtr.createSession(username, password, ctx) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/Glacier2_Router_ice.py", line 258, in createSession return _M_Glacier2.Router._op_createSession.invoke(self, ((userId, password), _ctx)) Glacier2.PermissionDeniedException: exception ::Glacier2::PermissionDeniedException { reason = Password check failed for '70cf79b9-800b-414a-ad59-3919899ba7a6': [user=public, created=2024-02-22 11:25:23.516, closed=2024-02-22 14:37:38.68] } During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/opt/omero/server/venv3/bin/omero", line 8, in sys.exit(main()) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/main.py", line 126, in main rv = omero.cli.argv() File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1787, in argv cli.invoke(args[1:]) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1225, in invoke stop = self.onecmd(line, previous_args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1302, in onecmd self.execute(line, previous_args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1384, in execute args.func(args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_cli_render.py", line 351, in _wrapper self.client = self.ctx.conn(*args) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1602, in conn self.controls["sessions"].login(args) File "/opt/omero/server/venv3/lib/python3.6/site-packages/omero/plugins/sessions.py", line 517, in login check_group=True) File "/opt/omero/server/venv3/lib/python3.6/site-packages/omero/plugins/sessions.py", line 615, in check_and_attach return self.attach(store, server, name, uuid, props, exists) File "/opt/omero/server/venv3/lib/python3.6/site-packages/omero/plugins/sessions.py", line 627, in attach store.clear(server, name, uuid) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 443, in clear self.walk(f, host, name, sess) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 335, in walk func(h, n, s) File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 441, in f s.remove() File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_ext/path.py", line 1313, in remove os.remove(self) FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/70cf79b9-800b-414a-ad59-3919899ba7a6') ```
will-moore commented 4 months ago

Try idr0012...

/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'select MIN(field.image.id) FROM WellSample AS field where field.well.plate.id in (select link.child.id from ScreenPlateLink as link where link.parent.id=1202) GROUP BY field.well.plate' > idr0012_plates.txt

cut -d ',' -f2 idr0012_plates.txt | sed -e 's/^/Image:/' > idr0012_ids.txt
screen -dmS cache parallel --eta --sshloginfile nodes -a idr0012_ids.txt --results /tmp/memo_idr0012_20240222/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
will-moore commented 4 months ago

Logs show idr0012 looks complete (23:10) so I cancelled the screen (still running?).

Started idr0016...

screen -dmS cache parallel --eta --sshloginfile nodes -a idr0016_ids.txt --results /tmp/memo_idr0016_20240222/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'

EDIT: > 36 hours later 7:50 on 24th. Still running... (413 plates in total):

[wmoore@test120-proxy ~]$ grep ok: /tmp/memo_idr0016_20240222/1/**/* | wc
    267     801   21760
[wmoore@test120-proxy ~]$ grep Error /tmp/memo_idr0016_20240222/1/**/* | wc
      5      25     575
will-moore commented 4 months ago

Nearly done with idr0016:

[wmoore@test120-proxy ~]$ grep ok: /tmp/memo_idr0016_20240222/1/**/* | wc
    353    1059   28778

To keep things running over rest of weekend, cancelled that and ran all of ngff again...

screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/memo_ngff_20240225/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
will-moore commented 4 months ago

Since all 5 idr-testing servers are getting constant DatabaseBusyExceptions, I cancelled memo generation. But the issue is persisting - 15 mins.

Going to restart all servers...

for server in omeroreadwrite omeroreadonly-1 omeroreadonly-2 omeroreadonly-3 omeroreadonly-4; do ssh $server "sudo service omero-server restart"; done
will-moore commented 4 months ago

Try again with all ngff...

screen -dmS cache parallel --eta --sshloginfile nodes -a ngff_ids.txt --results /tmp/memo_ngff_20240225/ -j10 '/opt/omero/server/OMERO.server/bin/omero render -s localhost -u public -w public test --force'
will-moore commented 2 months ago
/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'select d.id from Dataset d > datasets.txt where d.id in (select link.child.id from ProjectDatasetLink as link where link.parent.id=301)' > idr0026_datasets.txt

/opt/omero/server/OMERO.server/bin/omero hql --limit -1 --ids-only --style csv 'select MIN(field.image.id) FROM WellSample AS field where field.well.plate.id in (select link.child.id from ScreenPlateLink as link where link.parent.id=1101) GROUP BY field.well.plate' > idr0013A_plates.txt