Closed twhyntie closed 8 years ago
@drmarkwslater Can you help Tom?
@twhyntie One obvious question coming to mind is if the file system you are using is "unusual" in some sense.
@egede Thanks - this is a good question :-) What counts as "unusual" for Ganga?
@alahiff knows about the cluster (I don't, I've just been given access as a test "GridPP new user").
The home directory is mounted from an NFS server. @twhyntie what happens if you use /scratch/twhyntie instead? (this is on the local disk)
At least that Repository_runtime::checkDiskQuota message is suspicious. I don't see why it would cause a hang though. It's broken on my local system in the same way but I don't see a hang.
@twhyntie Could you try starting up Ganga, letting it hang for 10-15 seconds and then looking in /home/tier1/twhyntie/gangadir to see if there's a file in there called thread_trace.html? That will help to show us what's causing the hang.
@milliams Thanks - thread_trace.html is there as described. Here are the last two <h2> entries:
<h2>MainThread</h2>
<pre> File "/home/tier1/twhyntie/Ganga/install/6.1.19/bin/ganga", line 65, in <module>
Ganga.Runtime._prog.bootstrap(Ganga.Runtime._prog.interactive)
File "/home/tier1/twhyntie/Ganga/install/6.1.19/python/Ganga/Runtime/bootstrap.py", line 940, in bootstrap
startUpRegistries()
File "/home/tier1/twhyntie/Ganga/install/6.1.19/python/Ganga/Runtime/Repository_runtime.py", line 254, in startUpRegistries
for n, k, d in bootstrap():
File "/home/tier1/twhyntie/Ganga/install/6.1.19/python/Ganga/Runtime/Repository_runtime.py", line 134, in bootstrap
registry.startup()
File "/home/tier1/twhyntie/Ganga/install/6.1.19/python/Ganga/GPIDev/Lib/Registry/PrepRegistry.py", line 28, in startup
super(PrepRegistry, self).startup()
File "/home/tier1/twhyntie/Ganga/install/6.1.19/python/Ganga/Core/GangaRepository/Registry.py", line 168, in decorated
return f(self, *args, **kwargs)
File "/home/tier1/twhyntie/Ganga/install/6.1.19/python/Ganga/Core/GangaRepository/Registry.py", line 931, in startup
self.metadata.startup()
File "/home/tier1/twhyntie/Ganga/install/6.1.19/python/Ganga/Core/GangaRepository/Registry.py", line 168, in decorated
return f(self, *args, **kwargs)
File "/home/tier1/twhyntie/Ganga/install/6.1.19/python/Ganga/Core/GangaRepository/Registry.py", line 937, in startup
self.repository.startup()
File "/home/tier1/twhyntie/Ganga/install/6.1.19/python/Ganga/Core/GangaRepository/GangaRepositoryXML.py", line 215, in startup
self.sessionlock.startup()
File "/home/tier1/twhyntie/Ganga/install/6.1.19/python/Ganga/Core/GangaRepository/SessionLock.py", line 304, in decorated
return f(self, *args, **kwargs)
File "/home/tier1/twhyntie/Ganga/install/6.1.19/python/Ganga/Core/GangaRepository/SessionLock.py", line 391, in startup
self.global_lock_acquire()
File "/home/tier1/twhyntie/Ganga/install/6.1.19/python/Ganga/Core/GangaRepository/SessionLock.py", line 563, in global_lock_acquire
self.delay_lock_mod(self.lockfd, fcntl.LOCK_EX)
File "/home/tier1/twhyntie/Ganga/install/6.1.19/python/Ganga/Core/GangaRepository/SessionLock.py", line 519, in delay_lock_mod
fcntl.lockf(lockfd, lock_mod)
</pre>
<h2>GANGA_Update_Thread_Ganga_Worker_0</h2>
<pre> File "/usr/lib64/python2.6/threading.py", line 504, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/tier1/twhyntie/Ganga/install/6.1.19/python/Ganga/Core/GangaThread/WorkerThreads/WorkerThreadPool.py", line 78, in __worker_thread
item = self.__queue.get(True, 0.05)
File "/usr/lib64/python2.6/Queue.py", line 177, in get
self.not_empty.wait(remaining)
File "/usr/lib64/python2.6/threading.py", line 258, in wait
_sleep(delay)
</pre>
<small>2016-05-11T09:34:49.911054</small>
@alahiff As in set $HOME to /scratch/twhyntie/ and run the Ganga install script in there?
@twhyntie In your first message I saw /home/tier1/twhyntie/gangadir/repository/twhyntie/LocalXML so I'd assumed you'd done something which would cause this path to be used. Does Ganga always use the home directory by default?
The only 'strange' thing about /home/tier1/twhyntie is NFS, so that's why I thought it might be worthwhile trying /scratch/twhyntie
@alahiff Yup, that worked for both the local install and CVMFS :-)
$ export HOME=/scratch/twhyntie
$ . /cvmfs/ganga.cern.ch/runGanga.sh
[...]
[10:45:37]
Ganga In [1]:
Ganga seems to use $HOME as the default install directory (right?). So running from an NFS-mounted $HOME directory apparently causes a problem for Ganga - is this likely to be a wider issue for other Ganga users/clusters?
Anyway, happy to close this now - and thanks everyone for the help! Cheers, @twhyntie
Ganga does indeed use the home directory by default (though it can be overridden in the config). It should work fine on NFS as that is how I often run it but it's worth testing other options. Ganga seemed to be struggling with file locks which were causing the problems. Is this something you've seen before @alahiff?
Ganga should be absolutely fine on an NFS system (we haven't had a report like this before anyway!). The trace does seem to show that it's hanging when attempting to lock a file through Python's fcntl library.
As a test, can you try from within plain python:
import os, fcntl
open("/home/tier1/twhyntie/gangadir/mylock", "w").close()
fd = os.open("/home/tier1/twhyntie/gangadir/mylock", os.O_RDWR)
fcntl.lockf(fd, fcntl.LOCK_EX)
fcntl.lockf(fd, fcntl.LOCK_UN)
This is basically all Ganga is doing at this point and so maybe there's an issue with file locking on this NFS share.
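For anyone wanting to check a directory before pointing gangadir at it, the test above can be wrapped up as a small probe. This is only a sketch, not part of Ganga: the function name probe_lock_support and the probe file name are made up for illustration. It uses LOCK_NB so the call fails rather than waits if another process holds the lock, though that may not prevent a hang if the NFS lock service itself is unresponsive.

```python
import errno
import fcntl
import os


def probe_lock_support(directory):
    """Try to take and release a POSIX lock on a scratch file in `directory`.

    Returns 'ok', 'no-locks' (ENOLCK) or 'busy' (lock held elsewhere).
    Hypothetical helper for illustration; not a Ganga function.
    """
    path = os.path.join(directory, "lock_probe.tmp")  # hypothetical file name
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # Non-blocking exclusive lock, the same call Ganga makes minus LOCK_NB.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except (IOError, OSError) as exc:
            if exc.errno == errno.ENOLCK:
                return "no-locks"
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return "busy"
            raise
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return "ok"
    finally:
        # Always clean up the scratch file, whatever the outcome.
        os.close(fd)
        os.remove(path)
```

Calling probe_lock_support("/scratch/twhyntie") and probe_lock_support("/home/tier1/twhyntie") side by side would show whether the two filesystems behave differently.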
@milliams Am I correct in thinking there was a PR addressing multithreading issues in SessionLock in the past? I think this is thread-safe.
Could there be some stale file lock not being cleaned up on NFS? (apologies, I don't know how NFS handles this off the top of my head)
@drmarkwslater Thanks - have tested (with $HOME=/home/tier1/twhyntie) and get the following error:
Traceback (most recent call last):
File "testformark.py", line 8, in <module>
fcntl.lockf(fd, fcntl.LOCK_EX)
IOError: [Errno 37] No locks available
So, yes, looks like a non-Ganga issue (and not a show-stopper for running Ganga on the RAL T1 cluster with @alahiff's workaround).
@twhyntie I would recommend changing the location of the gangadir in the .gangarc configuration instead of changing $HOME (which could cause other problems). Just uncomment the gangadir entry in the file and set it to whatever other value you like.
Right - that's the problem then 😄 Now having said that, allowing Ganga to run on a filesystem without file locks seems like a sensible idea to me. I've created an issue for it: #465
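One possible shape for that kind of fallback, purely as a sketch: try the lock, and if the filesystem reports ENOLCK (the "No locks available" error above), log a warning and carry on unlocked. The function acquire_lock_or_degrade and its lock parameter are hypothetical names for illustration; this is not Ganga's SessionLock code and not necessarily what #465 will do.

```python
import errno
import fcntl
import logging

logger = logging.getLogger("lock_fallback_sketch")  # hypothetical logger name


def acquire_lock_or_degrade(fd, lock=fcntl.lockf):
    """Take an exclusive lock on `fd`, or continue unlocked if locks are unsupported.

    Returns True if the lock is held, False if the filesystem reported ENOLCK.
    Any other error is re-raised. `lock` is injectable so the ENOLCK path
    can be exercised without an NFS share.
    """
    try:
        lock(fd, fcntl.LOCK_EX)
        return True
    except (IOError, OSError) as exc:
        if exc.errno != errno.ENOLCK:
            raise
        logger.warning(
            "File locking unavailable on this filesystem; "
            "running without protection against concurrent sessions"
        )
        return False
```

The trade-off is that running unlocked gives up protection against two sessions writing to the same repository, so a warning rather than silent fallback seems the safer choice.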
@egede Sure, but is there an option to set gangadir when running /cvmfs/ganga.cern.ch/runGanga.sh or ganga-install? Apologies if I've missed them...
@twhyntie the runGanga.sh script will pick up everything in your .gangarc so that's definitely the best option as @egede suggests (it actually does very little other than run Ganga 😄 ). You can also specify it at the command line:
/cvmfs/ganga.cern.ch/runGanga.sh -o[Configuration]gangadir=/scratch/twhyntie/gangadir
@drmarkwslater OK, thanks - as you don't get .gangarc until you do the install,
. /cvmfs/ganga.cern.ch/runGanga.sh -o[Configuration]gangadir=/scratch/twhyntie/gangadir
is what is needed, right?
This is a good point that I hadn't thought about, as you would have to start Ganga to even generate the .gangarc. I'll add this to the tutorial on readthedocs! But yes, what you have there should work without issue.
Hello,
I'm trying to run Ganga on the RAL Tier-1 cluster. I've tried both running from CVMFS and installing via the install script ganga-install and both hang. Using the --debug flag it gets this far:
I've tried CVMFS running on two other clusters (QMUL and RALPP) with success, so I'm guessing there's something I need to do on the cluster side - but is there anything obvious that jumps out at you from these messages?
Many thanks in advance for any help/advice, @twhyntie