StochasticAnalytics / emClarity

GNU Lesser General Public License v3.0

ERROR when cluster #160

Closed MXXXZ closed 8 months ago

MXXXZ commented 9 months ago

Hi Ben! I encountered the following problem when performing clustering:

```
Error using BH_clusterPub (line 409)
Caught error, saving workspace for evaluation.

Error in emClarity (line 483)
```

and this is what the .mat file looks like (screenshot: 01)

and logs

```
particleMass: 0.4000
Ali_mType: 'cylinder'
Cls_mType: 'cylinder'
Peak_mType: 'cylinder'
particleRadius: [40 40 140]
Ali_mRadius: [33 35 130]
Ali_mCenter: [0 0 0]
Cls_mRadius: [35 35 140]
Ali_samplingRate: 4
Cls_samplingRate: 4
Raw_classes_odd: [3x1 double]
Raw_classes_eve: [3x1 double]
symmetry: 'C3'
Cls_className: 8
Cls_classes_odd: [2x8 double]
Cls_classes_eve: [2x8 double]
flgCones: 1
Tmp_samplingRate: 6
Tmp_angleSearch: [180 5 180 5]
Tmp_threshold: 2000
Raw_angleSearch: [120 18 120 18]
fscGoldSplitOnTomos: 0
Raw_className: 0
Fsc_bfactor: [8 4 2]
flgClassify: 1
tomoCprDefocusRefine: 0
tomoCprDefocusRange: 5.0000e-07
tomoCprDefocusStep: 2.0000e-08
Cls_mCenter: [0 0 0]
pcaScaleSpace: [4 8 16]
Pca_randSubset: 3500
Pca_maxEigs: 24
Pca_coeffs: [3x10 double]
Pca_clusters: [8 12]
Pca_nReplicates: 800
PcaGpuPull: 1000
```

```
featureVector =
  2x1 cell array
    {3x10 double}
    {0x0  double}

ans = 3 4 5 6 7 8 9 10 11 12 3 4 5 6 7 8 9 10 11 12 3 4 5 6 7 8 9 10 11 12

kDistMeasure = 'sqeuclidean'
coeffsUNTRIMMED =
  1x3 cell array
    {24x4305 single} {24x4305 single} {24x4305 single}

Starting parallel pool (parpool) using the 'emC_tmp_2819729826' profile ...
Connected to the parallel pool (number of workers: 24).
nScaleSpace = 3
ans = 3 4 5 6 7 8 9 10 11 12 3 4 5 6 7 8 9 10 11 12 3 4 5 6 7 8 9 10 11 12
nFeatures = 0 0 0
nAdded = 0
fV = 3 4 5 6 7 8 9 10 11 12
nAdded = 10
fV = 3 4 5 6 7 8 9 10 11 12
nAdded = 20
fV = 3 4 5 6 7 8 9 10 11 12
nAdded = 30
```

I hope I have provided enough information! Thank you very much for your help!

bHimes commented 9 months ago

Please attach your logFile/emClarity.log

MXXXZ commented 9 months ago

emClarity.log Thank you for your reply! Here is the log file.

bHimes commented 9 months ago

> emClarity.log Thank you for your reply! Here is the log file.

Hmmm, thanks for the log. I can't find the commit hash that is referenced in your logfile - did you compile this or is it from a distribution I released?

Is there a file named ClusterLine391Err.mat output?

Also, it looks like there is a warning from MATLAB that your GPUs are newer than the driver version the code was compiled with, which will slow down most runs unless you have a large enough cache defined for the CUDA PTX. What cards are you running on?

MXXXZ commented 9 months ago

Yes, I got two related files: https://github.com/MXXXZ/emClarity-issues-160 (sorry for using a link because of the file size)

Compiling it is a bit of a challenge for me, so I didn't compile it; maybe I can try later. I'm running it on 4x Quadro RTX 8000 or 4x NVIDIA GeForce RTX 3090 cards. I share the clusters with my colleagues, so I may switch between them. As you said, I think the runs are slowing down; I hope I can solve the follow-up problems after I get familiar with all this 😵‍💫

Thanks again for your help!

bHimes commented 9 months ago

@MXXXZ taking a look at it now. For a solution to the runtime/slow startup, please see this updated entry on the wiki.
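For readers hitting the same slow startup: the delay comes from the driver JIT-compiling PTX for a GPU newer than the one the code was built against. A minimal sketch of enlarging the standard CUDA JIT cache via environment variables (the variable names are standard CUDA; the path and size here are illustrative, not taken from the wiki entry):

```shell
# Grow the CUDA JIT (PTX) compilation cache so kernels recompiled for a
# newer GPU are reused between runs instead of rebuilt every time.
# Path and size are illustrative; adjust for your system.
export CUDA_CACHE_PATH="$HOME/.nv/ComputeCache"   # where compiled kernels are stored
export CUDA_CACHE_MAXSIZE=4294967296              # 4 GiB cache
```

These would typically go in the shell profile of whichever account launches emClarity, so the cache survives between sessions.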

bHimes commented 9 months ago

@MXXXZ

Diagnosis

So either:

- there is some bug in the function that checks for zeros (very unlikely), or
- you have a corrupt memory cell (not so likely, but I just had a very perplexing issue where a few bad cells in a stick of RAM only occasionally caused problems; I had to replace it), or
- you maybe modified your cycle009...pcaFull.mat?

To try: re-running "cluster" will keep reproducing the same error, so you could just re-do pca and see if it goes away. If it does, that might indicate a memory (or disk) error, which you would want to troubleshoot with your IT pro.
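The "check for zeros" idea above can be illustrated outside MATLAB. A hypothetical Python/NumPy sketch (the function name and the list-of-arrays stand-in for the coeffsUNTRIMMED cell array are mine, not emClarity code) of validating per-scale-space feature matrices before handing them to clustering:

```python
import numpy as np

def drop_empty_scale_spaces(coeffs):
    """Keep only scale-space coefficient matrices that contain usable
    features, i.e. are non-empty and not all zeros.

    coeffs: list of (nEigs x nParticles) arrays, one per scale space,
    mimicking a MATLAB cell array. Hypothetical helper for illustration.
    """
    kept = []
    for i, c in enumerate(coeffs):
        if c is None or c.size == 0:
            print(f"scale space {i}: empty, skipping")
            continue
        if not np.any(c):
            print(f"scale space {i}: all zeros, skipping")
            continue
        kept.append(c)
    return kept
```

In the log above, nFeatures = 0 0 0 and the {0x0 double} cell suggest exactly this kind of degenerate input reaching the clustering step.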

I did notice that you are using a smaller mask radius than your particle radius. This is okay for classification, but for alignment it should always be larger.

So, I would suggest:

1) Go back to the beginning of cycle 9.
2) Make your mask radii larger than your particle radius.
3) (Possibly enable flgPcaShapeMask=1 if you want a more focused classification.)
4) Re-run "emClarity avg param9.m 9 RawAlignment"
5) Re-run "emClarity pca param9.m 9 0"
6) Re-run "emClarity cluster param9.m 9"
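For step 2, the change to the parameter file would look roughly like this. The radii below are only an example padded slightly beyond the particleRadius from the log above; keep the exact key=value syntax and value separators of your existing param9.m:

```
% illustrative values only -- masks now larger than the particle radius
particleRadius=40,40,140
Ali_mRadius=44,44,150
Cls_mRadius=44,44,150
flgPcaShapeMask=1
```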

MXXXZ commented 9 months ago

I have enabled the CUDA cache and it has significantly shortened the runtime. Thank you very much, this has been very helpful!😘

MXXXZ commented 9 months ago

I have resolved the problem by adjusting mask parameters and rerunning the cycle. I've also observed that using an incorrect mask parameter in another dataset can generate the same error, which leads me to infer that a discrepancy between the mask and particle parameters might result in erroneous 0 values. Thanks for your patient guidance!