Closed azazellochg closed 2 years ago
Can you check whether other protocols that use MPI to parallelize steps (STEPS_PARALLEL) also fail on these systems (e.g., movie alignment or CTF estimation)?
@delarosatrevin both ctffind4 and unblur seem to work with both threads and MPI, locally and on the cluster
Hmm... that's even weirder
Now it also crashes with many threads or MPIs, locally or on the cluster. The weird thing is that it works with 4 threads, but as soon as I use 6-8 or more, it fails in the middle of the run. The process just dies, with no error or anything. I will try to debug it further when I find time.
could be related to:
From your error log, it seems that there is a bug in this Xmipp protocol at this line:
00095: File "/beebylab/software/scipion/1.2/scipion/pyworkflow/em/packages/xmipp3/protocol_extract_particles_movies.py", line 434, in _filterMovie
00096: micrograph = micSet[movieId]
Here the micSet (internally a .sqlite database) is accessed from multiple threads. I think the fix is to remove this query in its current form. In the meantime, as David suggested, you could try running with only 1 processor, but I'm not sure how long it will take.
I'm sorry for this issue. Best, Jose Miguel
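For illustration only, here is a minimal sketch of one way to avoid querying a SQLite-backed set from worker threads: read all the rows once in the main thread, then hand the workers a plain dictionary. This is not the Scipion or Xmipp code. The table layout and the names `load_micrographs`, `filter_movie`, and `run_demo` are hypothetical, and the real `micSet` object is assumed to behave differently in its details.

```python
import sqlite3
import threading


def load_micrographs(conn):
    """Read every (id, filename) row once, in the calling thread."""
    rows = conn.execute("SELECT id, filename FROM micrographs").fetchall()
    return {mic_id: filename for mic_id, filename in rows}


def filter_movie(movie_id, mic_by_id, results, lock):
    # Worker threads only read the plain dict; they never touch SQLite.
    micrograph = mic_by_id[movie_id]
    with lock:
        results[movie_id] = micrograph


def run_demo(n_movies=8):
    # Hypothetical in-memory table standing in for the real .sqlite set.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE micrographs (id INTEGER PRIMARY KEY, filename TEXT)"
    )
    conn.executemany(
        "INSERT INTO micrographs VALUES (?, ?)",
        [(i, "mic_%03d.mrc" % i) for i in range(1, n_movies + 1)],
    )

    # Query the database once, before any thread is started.
    mic_by_id = load_micrographs(conn)
    conn.close()

    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=filter_movie,
                         args=(i, mic_by_id, results, lock))
        for i in range(1, n_movies + 1)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_demo(8)` starts eight threads that each look up one entry without opening a database connection. Whether preloading is practical for a real set depends on its size; guarding a single shared connection with a lock would be another option.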
Is there any part of the code that tries to grab an item from the input set? I'm not sure the error is the same: the one here seems more related to MPI, and there is no SQLite error there... anyway, just a guess.
Outdated. The ExtractMovieParticles protocol will be deprecated.
Scipion 1.1. Beta-gal mrcs movies from the Relion tutorial. After a few minutes the protocol is still shown as running, but it is not (there is a core dump file core.***** in the projects folder). It crashed both on the cluster and on a local machine. When running with threads, everything is fine. I wonder whether anyone else has encountered this, or whether it is related to a specific MPI installation?
Log file: