KudryashevLab / TomoBEAR

TomoBEAR is a configurable and customizable modular pipeline for streamlined processing of cryo-electron tomographic data for subtomogram averaging.
https://github.com/KudryashevLab/TomoBEAR/wiki

[BUG?] pipeline crashes at parfor loop #23

Closed rickhooy closed 1 year ago

rickhooy commented 1 year ago

Describe the bug

While running the Ribosome tutorial and my own test dataset, MATLAB errors out at a parfor loop. The pipeline crashes at a different step in each project (see Screenshots).

Test Project JSON file

{
    "general": {
        "project_name": "test",
        "project_description": "Practice test",
        "data_path": "/data/tb/Frames/*.tif",
        "processing_path": "/data/tb/processing",
        "expected_symmetrie": "C1",
        "template_matching_binning": 8,
        "gold_bead_size_in_nm": 10,
        "reconstruction_thickness": 3000,
        "rotation_tilt_axis": 79,
        "aligned_stack_binning": 2,
        "pre_aligned_stack_binning": 2,
        "binnings": [2, 4, 8]
    },
    "MetaData": {
    },
    "SortFiles": {
    },
    "MotionCor2": {
    },
    "CreateStacks": {
    },
    "DynamoTiltSeriesAlignment": {
    },
    "DynamoCleanStacks": {
    },
    "BatchRunTomo": {
        "skip_steps": [4],
        "ending_step": 6
    },
    "StopPipeline": {
    },
    "BatchRunTomo": {
        "starting_step": 8,
        "ending_step": 8
    },
    "GCTFCtfphaseflipCTFCorrection": {
    },
    "BatchRunTomo": {
        "starting_step": 10,
        "ending_step": 13
    },
    "BinStacks": {
    },
    "Reconstruct": {
        "reconstruct": "binned"
    },
    "DynamoImportTomograms": {
    },
    "EMDTemplateGeneration": {
        "template_emd_number": "4015"
    },
    "DynamoTemplateMatching": {
        "sampling": 15
    },
    "TemplateMatchingPostProcessing": {
        "parallel_execution": false,
        "cc_std": 2.5,
        "crop_particles": true,
        "as_boxes": true,
        "box_size": 1.5
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_symmetrie": false,
        "use_noise_classes": true
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": false,
        "selected_classes": [1]
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": false,
        "selected_classes": [1]
    },
    "BinStacks": {
        "use_ctf_corrected_aligned_stack": false,
        "binnings": [2, 4, 8]
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": true,
        "selected_classes": [1],
        "binning": 8
    },
    "StopPipeline": {
    },
    "DynamoAlignmentProject": {
        "classes": 1,
        "iterations": 1,
        "use_noise_classes": false,
        "swap_particles": false,
        "use_symmetrie": true,
        "cone_flip": 1,
        "selected_classes": [1,2,3,4],
        "box_size": 0.666,
        "binning": 4,
        "threshold": 0.9
    },
    "DynamoAlignmentProject": {
        "classes": 1,
        "iterations": 1,
        "use_noise_classes": false,
        "swap_particles": false,
        "use_symmetrie": true,
        "selected_classes": [1,2],
        "binning": 2,
        "threshold": 1
    }
}

Error message

Error using BinStacks/process
Too many input arguments.

Error in iteration (line 53)
instantiated_class = instantiated_class.process();

Error in LocalPipeline/execution_parallel (line 560)
parfor j = 1:length(indices)

Error in LocalPipeline/execute (line 400)
[dynamic_configuration_out, tomogram_status{i - 1}] = obj.("execution_" + execution_method)(merged_configuration, obj.pipeline_definition{i}, previous_tomogram_status);

Error in runPipeline (line 150)
pipeline.execute(starting_tomogram, ending_tomogram, step, gpu);

Error in runTomoBear (line 34)
runPipeline(compute_environment, configuration_path, default_configuration_path, starting_tomogram, ending_tomogram, step, gpu);

560 parfor j = 1:length(indices)

To Reproduce

Steps to reproduce the behavior:

  1. Use a defaults.json modified for the in-house workstation, together with the unmodified tutorial project JSON.
  2. Run runTomoBear in MATLAB: runTomoBear("local","/data/tomobearTutorial/ribosome_empiar_10064_dynamo.json") or runTomoBear("local","/data/tb/test.json")

Expected behavior

A quick search of the MathWorks forums did not immediately reveal the reason for the error.

Screenshots

Step 26: BinStacks from the Ribosome tutorial (screenshot: tomoBear_parfor_error_ribosome)

Step 6: DynamoCleanStacks from the test dataset (screenshot: tomoBear_parfor_error)

Additional context

This could just be a default or project variable that needs to change. Please advise.

ArtsemiY commented 1 year ago

Hi @rickhooy! Please add the parameter "execution_method": "sequential" to the configuration sections of the problematic steps ("BinStacks" and "DynamoCleanStacks") for the corresponding datasets, e.g.:

"BinStacks": {
       "execution_method": "sequential"
}

and post the new error messages.

P.S. Parallel routines have no debug mode, so the only way to get the real error message is to switch to sequential processing.
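For anyone who would rather script this change than edit the file by hand, here is a minimal Python sketch. It is not part of TomoBEAR; the helper names (`Pairs`, `load_steps`, `force_sequential`, `dump_steps`) are made up for illustration. The one real subtlety is that the project file repeats keys such as "BinStacks" and "BatchRunTomo", so a plain `json.loads` would silently keep only the last occurrence. The sketch therefore reads the file as an ordered list of (step, parameters) pairs, assuming every top-level value is a JSON object with no nested objects inside it.

```python
import json


class Pairs(list):
    """Marker type: a JSON object decoded as an ordered list of (key, value) pairs."""


def load_steps(text):
    """Parse a project JSON string into [(step_name, params_dict), ...], keeping repeated steps."""
    top_level = json.loads(text, object_pairs_hook=Pairs)
    # Assumption: every top-level value is a flat JSON object.
    return [(name, dict(params)) for name, params in top_level]


def force_sequential(steps, step_names):
    """Return a copy of steps with "execution_method": "sequential" set on every named step."""
    return [
        (name, {**params, "execution_method": "sequential"}) if name in step_names
        else (name, params)
        for name, params in steps
    ]


def dump_steps(steps):
    """Serialize the pairs back to JSON text, one step per line, repeated keys included."""
    body = ",\n".join(
        f"    {json.dumps(name)}: {json.dumps(params)}" for name, params in steps
    )
    return "{\n" + body + "\n}\n"
```

Note that `force_sequential` changes every occurrence of a named step, not only the one that crashed, so check the result before running the pipeline with it.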

rickhooy commented 1 year ago

Hi Artsemi, thanks for the troubleshooting assistance!

I figured it out, at least for the test project. For some reason I had two 'x_CreateStacks_1' directories in my output folder. When DynamoCleanStacks looked for tilt series stacks, it found two locations, one of which did not contain the tilt series. I somehow created the second CreateStacks output directory without realizing it, probably while playing around with the project.json config file. I deleted the erroneous CreateStacks folder, and now the pipeline proceeds as expected. It's possibly a similar issue with the Ribosome tutorial project. I'll look into it next.

Full details below, if interested:

Executed runTomoBear from the MATLAB terminal after changing the DynamoCleanStacks step to 'sequential':

(screenshot: stack_filepath_wrong)

INFO: Skipping pipeline step (DynamoTiltSeriesAlignment) due to availability of a SUCCESS file!
INFO: Executing pipeline step 6: DynamoCleanStacks for tomogram 1...
Index exceeds the number of array elements. Index must not exceed 0.

Error in DynamoCleanStacks/process (line 38)
[path, name, extension] = fileparts(stack_file(1).name);

Error in iteration (line 53)
instantiated_class = instantiated_class.process();

Error in LocalPipeline/execution_sequential (line 682)
[dynamic_configurations{j}, status{j}] = iteration(merged_configuration, pipeline_definition, tomogram_names{j}, previous_tomogram_status(j));

Error in LocalPipeline/execute (line 400)
[dynamic_configuration_out, tomogram_status{i - 1}] = obj.("execution_" + execution_method)(merged_configuration, obj.pipeline_definition{i}, previous_tomogram_status);

Error in runPipeline (line 150)
pipeline.execute(starting_tomogram, ending_tomogram, step, gpu);

Error in runTomoBear (line 34)
runPipeline(compute_environment, configuration_path, default_configuration_path, starting_tomogram, ending_tomogram, step, gpu);

38 [path, name, extension] = fileparts(stack_file(1).name);
K>>

tilt_stack_path =

"/data/tb/processing/output/1_CreateStacks_1/tomogram_001/*.st"

% wrong; CreateStacks should be step 4...

pipeline output directory:

drwxrwxr-x 3 hooy hooy 4.0K Jul 18 16:39 1_CreateStacks_1
drwxrwxr-x 2 hooy hooy 4.0K Jul 18 16:44 1_MetaData_1
drwxrwxr-x 3 hooy hooy 4.0K Jul 18 16:44 2_SortFiles_1
drwxrwxr-x 3 hooy hooy 4.0K Jul 21 10:01 3_MotionCor2_1
drwxrwxr-x 3 hooy hooy 4.0K Jul 21 10:01 4_CreateStacks_1
drwxrwxr-x 3 hooy hooy 4.0K Jul 21 10:06 5_DynamoTiltSeriesAlignment_1
drwxrwxr-x 3 hooy hooy 4.0K Jul 21 10:06 6_DynamoCleanStacks_1

Ah, for some reason there are two CreateStacks directories...

Line 32 of DynamoCleanStacks: create_stacks_input_path = dir(obj.configuration.processing_path + string(filesep) + obj.configuration.output_folder + string(filesep)...

create_stacks_input_path.name

ans =

'1_CreateStacks_1'

ans =

'4_CreateStacks_1'

No stacks in '1_CreateStacks_1', hence error and pipeline crash.
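To make the failure mode concrete, here is a small Python sketch of the same situation (not TomoBEAR code). A wildcard lookup matches both CreateStacks folders, and indexing the first stack file without checking for an empty result is what raises the "Index exceeds the number of array elements" error. The wildcard pattern, the function name `find_stacks`, and the stack file name are hypothetical; the real pattern on line 32 is truncated above and is not reproduced here.

```python
from fnmatch import fnmatch


def find_stacks(folder_contents, step_name="CreateStacks"):
    """Return (matching_folders, folders_with_stacks) for one step name.

    folder_contents maps an output folder name to the file names found in
    its tomogram subfolder. The "*_<step>_*" pattern is an assumption made
    for illustration only.
    """
    matches = sorted(
        name for name in folder_contents if fnmatch(name, f"*_{step_name}_*")
    )
    with_stacks = {}
    for name in matches:
        stacks = [f for f in folder_contents[name] if f.endswith(".st")]
        if stacks:  # skip folders without stacks instead of indexing stacks[0] blindly
            with_stacks[name] = stacks
    if not with_stacks:
        raise FileNotFoundError(f"no .st stacks found in any of {matches}")
    return matches, with_stacks


# Hypothetical contents mirroring the listing above.
contents = {
    "1_CreateStacks_1": [],
    "1_MetaData_1": [],
    "4_CreateStacks_1": ["tomogram_001.st"],
}
matches, usable = find_stacks(contents)
```

With this input, `matches` contains both CreateStacks folders while `usable` contains only `4_CreateStacks_1`, which is the distinction the crash was hiding.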

ArtsemiY commented 1 year ago

Hey @rickhooy , thanks for sharing, glad that you resolved that one as well!

P.S. Step folders may be repeated several times; that is acceptable. The only thing to keep in mind, as you already figured out, is that the step folder structure must correspond to the input JSON file in terms of the sequence and names of the steps.

P.P.S. An additional troubleshooting hint: keep the structure of the scratch/ folder consistent with the input JSON file as well (if you experience any issues, just delete any folders there that were created by errors or left over from previous runs).
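A rough way to check this correspondence automatically is sketched below in Python. It is not part of TomoBEAR, and `find_unexpected_folders` is a made-up helper name. It assumes the folder naming seen in this thread, `<step index>_<StepName>_<number>`, with the index counting steps in JSON order after "general"; the meaning of the trailing number is not stated here, so the sketch ignores it.

```python
import re

# Assumed naming, inferred from the listing in this thread: "4_CreateStacks_1".
FOLDER_PATTERN = re.compile(r"^(\d+)_(.+)_\d+$")


def find_unexpected_folders(step_names, folder_names):
    """Return step folders whose index and name disagree with the JSON step order.

    step_names: step names in JSON order, excluding "general".
    folder_names: entries of the output (or scratch) folder.
    """
    unexpected = []
    for folder in sorted(folder_names):
        match = FOLDER_PATTERN.match(folder)
        if match is None:
            continue  # not a step folder under the assumed naming; ignore it
        index, name = int(match.group(1)), match.group(2)
        in_range = 1 <= index <= len(step_names)
        if not in_range or step_names[index - 1] != name:
            unexpected.append(folder)
    return unexpected


steps = ["MetaData", "SortFiles", "MotionCor2", "CreateStacks",
         "DynamoTiltSeriesAlignment", "DynamoCleanStacks"]
folders = ["1_CreateStacks_1", "1_MetaData_1", "2_SortFiles_1", "3_MotionCor2_1",
           "4_CreateStacks_1", "5_DynamoTiltSeriesAlignment_1", "6_DynamoCleanStacks_1"]
stale = find_unexpected_folders(steps, folders)
```

Because the check is index-based, a step name that legitimately appears several times in the JSON file is accepted at each of its positions. In practice `folders` could come from `os.listdir` on the output folder, and anything reported should be inspected before it is deleted.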

I guess the issue can be closed. Please reopen if needed.