I2PC / scipion

Scipion is an image processing framework to obtain 3D models of macromolecular complexes using Electron Microscopy (3DEM)
http://scipion.i2pc.es

'relion - particle extracting' crashes with 1M particles #2058

Open JuhaHuiskonen opened 4 years ago

JuhaHuiskonen commented 4 years ago

When extracting over 1M particles using 'relion - particle extracting' I get the following error:

03487: Sqlite query: INSERT INTO MDTable_3( "rlnCoordinateX", "rlnCoordinateY", "rlnImageName", "rlnMicrographName", "rlnMagnification", "rlnVoltage", "rlnDefocusU", "rlnDefocusV", "rlnDefocusAngle", "rlnSphericalAberration", "rlnBfactor", "rlnCtfScalefactor", "rlnPhaseShift", "rlnAmplitudeContrast", "rlnOriginX", "rlnOriginY", "rlnDetectorPixelSize") SELECT "rlnCoordinateX", "rlnCoordinateY", "rlnImageName", "rlnMicrographName", "rlnMagnification", "rlnVoltage", "rlnDefocusU", "rlnDefocusV", "rlnDefocusAngle", "rlnSphericalAberration", "rlnBfactor", "rlnCtfScalefactor", "rlnPhaseShift", "rlnAmplitudeContrast", "rlnOriginX", "rlnOriginY", "rlnDetectorPixelSize" FROM MDTable_2

If I make a subset of just 200 coordinates, the protocol finishes fine. Is there a maximum limit of particles Scipion can handle?

pconesa commented 4 years ago

Hi @JuhaHuiskonen. No, 200K (I guess you missed the K) is not the maximum limit. I've seen projects using more than that. It is true that beyond 500K things get very slow, which can become annoying.

This issue must be something else. Could you please post more log lines?

JuhaHuiskonen commented 4 years ago

I used just 200 (not 200K) to check that the project itself and the inputs were fine. I can try with more to see where it fails.

Here are more log lines from the failed run with 1M particles:

03433: srun which relion_preprocess_mpi --i micrographs_00001-03460.star --part_star micrographs_00001-03460_particles.star --coord_dir "." --coord_suffix .coords.star --part_dir "." --extract --extract_size 400 --set_angpix 4.240000 --bg_radius 47 --invert_contrast --norm --scale 100 --white_dust 5.000 --black_dust 5.000
03434: === RELION MPI setup ===
03435: + Number of MPI processes = 20
03436: + Master (0) runs on host = r05c20.bullx
03437: + Slave 1 runs on host = r05c20.bullx
03438: + Slave 2 runs on host = r05c20.bullx
03439: + Slave 3 runs on host = r05c20.bullx
03440: + Slave 4 runs on host = r05c20.bullx
03441: + Slave 5 runs on host = r05c20.bullx
03442: + Slave 6 runs on host = r05c20.bullx
03443: + Slave 7 runs on host = r05c20.bullx
03444: + Slave 8 runs on host = r05c20.bullx
03445: + Slave 9 runs on host = r05c20.bullx
03446: + Slave 10 runs on host = r05c20.bullx
03447: + Slave 11 runs on host = r05c20.bullx
03448: + Slave 12 runs on host = r05c20.bullx
03449: + Slave 13 runs on host = r05c20.bullx
03450: + Slave 14 runs on host = r05c20.bullx
03451: + Slave 15 runs on host = r05c20.bullx
03452: + Slave 16 runs on host = r05c20.bullx
03453: + Slave 17 runs on host = r05c20.bullx
03454: + Slave 18 runs on host = r05c20.bullx
03455: + Slave 19 runs on host = r05c20.bullx
03456: =================
03457: + Setting pixel size in output STAR file to 4.24 Angstroms
03458: WARNING: You manually changed the pixel size by the --setangpix option. You can no longer use Bayesian Polishing on the resulting particles.
03459: Extracting particles from the micrographs ...
03460: 12.68/12.68 min ............................................................~~(,,">
03461: Joining metadata of all particles from 3351 micrographs in one STAR file...
03462: Written out STAR file with 1057682 particles in micrographs_00001-03460_particles.star
03463: The new pixel size of the extracted particles are 16.96 Angstrom/pixel.
03464: Done preprocessing!
03465: FINISHED: extractMicrographListStep, step 1
03466: 2019-10-22 16:12:28.089902
03467: Traceback (most recent call last):
03468: File "/projappl/project_2001566/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 186, in run
03469: self._run()
03470: File "/projappl/project_2001566/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 1289, in _run
03471: self._runSteps(startIndex)
03472: File "/projappl/project_2001566/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 1161, in _runSteps
03473: self._stepsCheckSecs)
03474: File "/projappl/project_2001566/apps/scipion/2.0/pyworkflow/protocol/executor.py", line 133, in runSteps
03475: stepsCheckCallback()
03476: File "/projappl/project_2001566/apps/scipion/2.0/pyworkflow/em/protocol/protocol_particles.py", line 320, in _stepsCheck
03477: self._checkNewOutput()
03478: File "/projappl/project_2001566/apps/scipion/2.0/pyworkflow/em/protocol/protocol_particles.py", line 527, in _checkNewOutput
03479: self._updateOutputPartSet(newDone, streamMode)
03480: File "/projappl/project_2001566/apps/scipion/2.0/pyworkflow/em/protocol/protocol_particles.py", line 581, in _updateOutputPartSet
03481: self.readPartsFromMics(micList, outputParts)
03482: File "/projappl/project_2001566/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_extract_particles.py", line 305, in readPartsFromMics
03483: sortByLabel=md.RLN_MICROGRAPH_NAME):
03484: File "/projappl/project_2001566/apps/scipion/2.0/pyworkflow/em/metadata/utils.py", line 97, in iterRows
03485: md.sort(sortByLabel)
03486: XmippError: Error code: 21 message: no such table: MDTable_2
03487: Sqlite query: INSERT INTO MDTable_3( "rlnCoordinateX", "rlnCoordinateY", "rlnImageName", "rlnMicrographName", "rlnMagnification", "rlnVoltage", "rlnDefocusU", "rlnDefocusV", "rlnDefocusAngle", "rlnSphericalAberration", "rlnBfactor", "rlnCtfScalefactor", "rlnPhaseShift", "rlnAmplitudeContrast", "rlnOriginX", "rlnOriginY", "rlnDetectorPixelSize") SELECT "rlnCoordinateX", "rlnCoordinateY", "rlnImageName", "rlnMicrographName", "rlnMagnification", "rlnVoltage", "rlnDefocusU", "rlnDefocusV", "rlnDefocusAngle", "rlnSphericalAberration", "rlnBfactor", "rlnCtfScalefactor", "rlnPhaseShift", "rlnAmplitudeContrast", "rlnOriginX", "rlnOriginY", "rlnDetectorPixelSize" FROM MDTable_2
03488: ------------------- PROTOCOL FAILED (DONE 1/2)

pconesa commented 4 years ago

Sorry @JuhaHuiskonen, I now realize I misread your message.

I've seen sets of almost 8M elements, but they were clearly impractical. 1M particles should work, but you'll wait a long time for some steps to finish or for sets to be displayed. Our users here (I've just asked) say that it works but takes "TOO LONG". I'd say 1M, as things stand, challenges Scipion and clearly degrades its usability.

We have planned to invest time in this for the next release (we have always planned to)... but I believe this time it has to happen.

JuhaHuiskonen commented 4 years ago

@pconesa OK, we will wait for the update and in the meantime split the set into smaller chunks.

delarosatrevin commented 4 years ago

From the error log it seems like a bug in the Xmipp metadata class, when trying to execute the line:

md.sort(sortByLabel)  # while iterating through the star file rows

I have created an issue in the scipion-em-relion repo. We might consider replacing the use of Xmipp's metadata (we will do it anyway for handling the new Relion 3.1 star files).
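For illustration only, here is a minimal sketch of reading STAR rows and sorting them by micrograph name in plain Python, without the Xmipp metadata class or SQLite. It is not the scipion-em-relion implementation. The `read_star_rows` helper and the sample file content are hypothetical, and the parser is deliberately simplified: it reads only the first `loop_` table and assumes whitespace-separated values with no quoted strings.

```python
import io
from operator import itemgetter


def read_star_rows(handle):
    """Yield one dict per data row of the first loop_ table in a STAR file.

    Hypothetical, simplified parser: values are split on whitespace and
    everything is returned as strings.
    """
    labels, in_loop, seen_row = [], False, False
    for raw in handle:
        line = raw.strip()
        if not in_loop:
            # Skip everything (e.g. the data_ line) until the table starts.
            in_loop = line == "loop_"
            continue
        if not line:
            if seen_row:
                break  # a blank line after data rows closes the table
            continue
        if line.startswith("#"):
            continue
        if line.startswith("_") and not seen_row:
            # Header such as "_rlnCoordinateX #1": keep only the label name.
            labels.append(line.split()[0].lstrip("_"))
            continue
        seen_row = True
        yield dict(zip(labels, line.split()))


# Made-up sample data using label names that appear in the log above.
sample = io.StringIO(
    "data_\n\nloop_\n"
    "_rlnCoordinateX #1\n_rlnCoordinateY #2\n_rlnMicrographName #3\n"
    "100.0 200.0 mic_002.mrc\n"
    " 50.0  75.0 mic_001.mrc\n"
    "300.0  10.0 mic_002.mrc\n"
)

# sorted() is stable, so rows from the same micrograph keep their file order.
rows = sorted(read_star_rows(sample), key=itemgetter("rlnMicrographName"))
```

This keeps every row in memory, so for a set of about 1M particles the memory cost would need to be checked before relying on an approach like this.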

@pconesa I don't know if you want to close this one or keep it as a reminder of this problem.

pconesa commented 4 years ago

Leave it... I'll address it with the others when improving performance.

JuhaHuiskonen commented 4 years ago

I was wondering if there will be a quick fix to md.sort(sortByLabel), or should we wait for the Relion 3.1 protocols?

delarosatrevin commented 4 years ago

Hi @JuhaHuiskonen, I don't know when the md.sort issue will be addressed in Xmipp, and I don't have time to look into it myself. In the first week of Nov, I plan to start looking into Relion 3.1 and using another implementation for handling star files. So, I could start by implementing the particle extraction protocol for you to try, if you already have Relion 3.1 installed. The good thing with Relion 3.1 in Scipion is that you will not be stuck with this version and will be able to easily swap back to 3.0. I'm sorry that you are stuck with this issue right now.

I'm wondering if this issue happened in streaming mode or not. Could you try to re-launch this protocol and try a batchSize=20, for example? In that way, I think the generated star files are parsed in smaller chunks and not the whole set.
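To make the batching idea concrete, here is a rough sketch of splitting a list of micrographs into groups of 20 so that each group could be handled on its own. This is a generic illustration, not how Scipion uses batchSize internally (the thread does not show that). The `iter_batches` helper and the micrograph names are made up.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical micrograph names, 45 in total.
mics = ["mic_%05d.mrc" % i for i in range(1, 46)]
batches = list(iter_batches(mics, 20))
# 45 items with a batch size of 20 give three batches of 20, 20 and 5.
```

With smaller batches, each parsing step only sees the particles of a few micrographs, which may keep the intermediate tables small.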

JuhaHuiskonen commented 4 years ago

This helped us with errors related to large projects and SQL operations:

SQLITE_TMPDIR=/path/to/large/scratch/disk/
export SQLITE_TMPDIR
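The same setting can be applied from Python when a job is started from a script. The sketch below assumes the launcher passes its environment on to the process that uses SQLite. On Unix, SQLite reads SQLITE_TMPDIR to decide where to put temporary files, which presumably is why a large scratch disk helps with big sorts. The `sqlite_env` helper is hypothetical, and the launch command is left out on purpose.

```python
import os
import shutil
import tempfile


def sqlite_env(tmp_dir, min_free_bytes=0):
    """Return a copy of the environment with SQLITE_TMPDIR set to tmp_dir."""
    if not os.path.isdir(tmp_dir):
        raise NotADirectoryError(tmp_dir)
    free = shutil.disk_usage(tmp_dir).free
    if free < min_free_bytes:
        raise OSError("only %d bytes free in %s" % (free, tmp_dir))
    env = dict(os.environ)
    env["SQLITE_TMPDIR"] = tmp_dir
    return env


# Stand-in for /path/to/large/scratch/disk/ so the sketch runs anywhere.
scratch = tempfile.mkdtemp()
env = sqlite_env(scratch)
# Pass env=env to subprocess.run(...) when starting the job.
```

Checking free space up front is an optional extra and was not part of the workaround reported above.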

delarosatrevin commented 4 years ago

Thanks Juha! I think we will keep this issue open as a reminder to check for more robust solutions.