CovertLab / wcEcoli

Whole Cell Model of E. coli

AnalysisParcaTask fizzling when running sims with COMPRESS_OUTPUT=1 #1419

Open rjuenemann opened 11 months ago

rjuenemann commented 11 months ago

I ran the following on Sherlock on the ng-trl-eff-shift-variant-only branch in preparation for PR #1415:

DESC="sherlock_internal_shift_metadata_test" VARIANT="new_gene_expression_and_translation_efficiency_internal_shift" FIRST_VARIANT_INDEX=2 LAST_VARIANT_INDEX=2 N_GENS=4 NEW_GENES="gfp" PLOTS=ACTIVE COMPRESS_OUTPUT=1 RAISE_ON_TIME_LIMIT=1 WC_ANALYZE_FAST=1 python runscripts/fireworks/fw_queue.py

I noticed the AnalysisParcaTask fizzled:

lpad get_fws
WARNING (aesara.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
[
    {
        "fw_id": 1,
        "created_on": "2023-12-05T20:43:09.522596",
        "updated_on": "2023-12-06T01:51:41.448942",
        "state": "COMPLETED",
        "name": "AnalysisSingleTask__Var_2__Seed_0__Gen_3__Cell_0"
    },
    {
        "fw_id": 2,
        "created_on": "2023-12-05T20:43:09.522475",
        "updated_on": "2023-12-06T02:02:38.083444",
        "state": "COMPLETED",
        "name": "ScriptTask_compression_simulation__Seed_0__Gen_3__Cell_0"
    },
    {
        "fw_id": 3,
        "created_on": "2023-12-05T20:43:09.522372",
        "updated_on": "2023-12-06T01:39:09.968905",
        "state": "COMPLETED",
        "name": "SimulationTask__Var_02__Seed_0__Gen_3__Cell_0"
    },
    {
        "fw_id": 4,
        "created_on": "2023-12-05T20:43:09.522171",
        "updated_on": "2023-12-06T00:14:16.978444",
        "state": "COMPLETED",
        "name": "AnalysisSingleTask__Var_2__Seed_0__Gen_2__Cell_0"
    },
    {
        "fw_id": 5,
        "created_on": "2023-12-05T20:43:09.522048",
        "updated_on": "2023-12-06T01:58:06.264906",
        "state": "COMPLETED",
        "name": "ScriptTask_compression_simulation__Seed_0__Gen_2__Cell_0"
    },
    {
        "fw_id": 6,
        "created_on": "2023-12-05T20:43:09.521947",
        "updated_on": "2023-12-06T00:05:26.230629",
        "state": "COMPLETED",
        "name": "SimulationTask__Var_02__Seed_0__Gen_2__Cell_0"
    },
    {
        "fw_id": 7,
        "created_on": "2023-12-05T20:43:09.521706",
        "updated_on": "2023-12-05T23:11:12.747786",
        "state": "COMPLETED",
        "name": "AnalysisSingleTask__Var_2__Seed_0__Gen_1__Cell_0"
    },
    {
        "fw_id": 8,
        "created_on": "2023-12-05T20:43:09.521582",
        "updated_on": "2023-12-06T01:57:57.157442",
        "state": "COMPLETED",
        "name": "ScriptTask_compression_simulation__Seed_0__Gen_1__Cell_0"
    },
    {
        "fw_id": 9,
        "created_on": "2023-12-05T20:43:09.521478",
        "updated_on": "2023-12-05T22:58:25.945713",
        "state": "COMPLETED",
        "name": "SimulationTask__Var_02__Seed_0__Gen_1__Cell_0"
    },
    {
        "fw_id": 10,
        "created_on": "2023-12-05T20:43:09.521253",
        "updated_on": "2023-12-05T22:29:31.901319",
        "state": "COMPLETED",
        "name": "AnalysisSingleTask__Var_2__Seed_0__Gen_0__Cell_0"
    },
    {
        "fw_id": 11,
        "created_on": "2023-12-05T20:43:09.521113",
        "updated_on": "2023-12-06T01:57:51.547185",
        "state": "COMPLETED",
        "name": "ScriptTask_compression_simulation__Seed_0__Gen_0__Cell_0"
    },
    {
        "fw_id": 12,
        "created_on": "2023-12-05T20:43:09.521013",
        "updated_on": "2023-12-05T22:16:52.316777",
        "state": "COMPLETED",
        "name": "SimulationTask__Var_02__Seed_0__Gen_0__Cell_0"
    },
    {
        "fw_id": 13,
        "created_on": "2023-12-05T20:43:09.520819",
        "updated_on": "2023-12-06T01:46:25.366586",
        "state": "COMPLETED",
        "name": "AnalysisMultiGenTask__Var_02__Seed_000000"
    },
    {
        "fw_id": 14,
        "created_on": "2023-12-05T20:43:09.520692",
        "updated_on": "2023-12-06T01:46:18.284736",
        "state": "COMPLETED",
        "name": "AnalysisCohortTask__Var_02"
    },
    {
        "fw_id": 15,
        "created_on": "2023-12-05T20:43:09.520588",
        "updated_on": "2023-12-06T01:58:28.094396",
        "state": "COMPLETED",
        "name": "ScriptTask_compression_variant_KB"
    },
    {
        "fw_id": 16,
        "created_on": "2023-12-05T20:43:09.520495",
        "updated_on": "2023-12-05T21:42:58.831782",
        "state": "COMPLETED",
        "name": "VariantSimDataTask__new_gene_expression_and_translation_efficiency_internal_shift_000002"
    },
    {
        "fw_id": 17,
        "created_on": "2023-12-05T20:43:09.520357",
        "updated_on": "2023-12-06T01:45:05.873231",
        "state": "COMPLETED",
        "name": "AnalysisVariantTask"
    },
    {
        "fw_id": 18,
        "created_on": "2023-12-05T20:43:09.520232",
        "updated_on": "2023-12-05T21:44:22.102542",
        "state": "FIZZLED",
        "name": "AnalysisParcaTask"
    },
    {
        "fw_id": 19,
        "created_on": "2023-12-05T20:43:09.520117",
        "updated_on": "2023-12-05T20:43:09.520120",
        "name": "ScriptTask_compression_validation_data",
        "state": "WAITING"
    },
    {
        "fw_id": 20,
        "created_on": "2023-12-05T20:43:09.520013",
        "updated_on": "2023-12-05T21:05:50.347589",
        "state": "COMPLETED",
        "name": "InitValidationData"
    },
    {
        "fw_id": 21,
        "created_on": "2023-12-05T20:43:09.519919",
        "updated_on": "2023-12-05T21:15:37.365250",
        "state": "COMPLETED",
        "name": "ScriptTask_compression_validation_data_raw"
    },
    {
        "fw_id": 22,
        "created_on": "2023-12-05T20:43:09.519828",
        "updated_on": "2023-12-05T20:52:09.182471",
        "state": "COMPLETED",
        "name": "InitValidationDataRaw"
    },
    {
        "fw_id": 23,
        "created_on": "2023-12-05T20:43:09.519744",
        "updated_on": "2023-12-05T20:43:09.519746",
        "name": "ScriptTask_compression_sim_data",
        "state": "WAITING"
    },
    {
        "fw_id": 24,
        "created_on": "2023-12-05T20:43:09.519648",
        "updated_on": "2023-12-05T21:42:46.238209",
        "state": "COMPLETED",
        "name": "ScriptTask_compression_raw_data"
    },
    {
        "fw_id": 25,
        "created_on": "2023-12-05T20:43:09.519537",
        "updated_on": "2023-12-05T21:31:30.875954",
        "state": "COMPLETED",
        "name": "CalculateSimData"
    },
    {
        "fw_id": 26,
        "created_on": "2023-12-05T20:43:09.519369",
        "updated_on": "2023-12-05T20:52:14.081648",
        "state": "COMPLETED",
        "name": "InitRawData"
    }
]

with the error

Traceback (most recent call last):
  File "/home/users/rjuene/wcEcoli/wholecell/fireworks/firetasks/analysisBase.py", line 236, in run_plot
    plot_class.main(*args, cpus=1, analysis_paths=analysis_paths)
  File "/home/users/rjuene/wcEcoli/models/ecoli/analysis/analysisPlot.py", line 166, in main
    instance.plot(inputDir, plotOutDir, plotOutFileName, simDataFile,
  File "/home/users/rjuene/wcEcoli/models/ecoli/analysis/analysisPlot.py", line 156, in plot
    do_plot()
  File "/home/users/rjuene/wcEcoli/models/ecoli/analysis/analysisPlot.py", line 143, in do_plot
    self.do_plot(inputDir, plotOutDir, plotOutFileName, simDataFile,
  File "/home/users/rjuene/wcEcoli/models/ecoli/analysis/parca/fold_changes.py", line 20, in do_plot
    with open(os.path.join(input_dir, constants.SERIALIZED_RAW_DATA), 'rb') as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/home/users/rjuene/wcEcoli/out/20231205.124309__sherlock_internal_shift_metadata_test/kb/rawData.cPickle'

@ggsun and I suspect the issue is that the Parca output was compressed before the AnalysisParcaTask could run, creating the error in finding the needed file. Indeed, rawData.cPickle.bz2 is found in out/kb, but rawData.cPickle is not.
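A quick way to confirm this state in an output directory is to check which form of the file exists. The following is a minimal sketch, not code from wcEcoli: the helper names `pickle_state` and `make_demo_kb` are hypothetical, and the demo builds a throwaway directory rather than reading a real `out/.../kb`.

```python
import bz2
import os
import tempfile


def pickle_state(kb_dir, name="rawData.cPickle"):
    """Report whether the pickle exists as-is, only as .bz2, or not at all."""
    plain = os.path.join(kb_dir, name)
    if os.path.exists(plain):
        return "uncompressed"
    if os.path.exists(plain + ".bz2"):
        return "compressed"
    return "missing"


def make_demo_kb():
    """Build a temp directory that mimics kb/ after the compression task ran."""
    kb_dir = tempfile.mkdtemp()
    with bz2.open(os.path.join(kb_dir, "rawData.cPickle.bz2"), "wb") as f:
        f.write(b"placeholder bytes, not a real pickle")
    return kb_dir


demo_dir = make_demo_kb()
print(pickle_state(demo_dir))  # prints "compressed"
```

Pointing `pickle_state` at the real kb directory from the traceback should report "compressed" if the hypothesis holds.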

1fish2 commented 11 months ago

Good hypothesis! Indeed, fw_queue creates the task ScriptTask_compression_raw_data depending only on the completion of InitRawData and later adds a link for it to also depend on the InitValidationData task. The compression task runs bzip2. In a local manual run, bzip2 replaced the 15MB rawData.cPickle with a 3.5MB rawData.cPickle.bz2.

(lpad get_fws has an option --display_format {all,more,less,ids,count,reservations}. Picking more or all would probably show the task dependency links, which would confirm whether they match this description.)

So (going by the variables in fw_queue rather than the task names) fw_parca_analysis should be another "parent" (dependency, prerequisite) of the fw_raw_data_compression task.

The code is almost there.

https://github.com/CovertLab/wcEcoli/blob/576d3b387dc82e958467c7a612a444ee1a74382d/runscripts/fireworks/fw_queue.py#L632-L633

^^^ This makes fw_parca_analysis a parent of fw_sim_data_1_compression and fw_validation_data_compression. That is, those two compression tasks don't run until fw_parca_analysis completes.

Just add fw_raw_data_compression as another arg to add_links().
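To illustrate why the extra link closes the race, here is a toy model of the dependency bookkeeping. It is a sketch under assumptions, not the fw_queue.py or FireWorks implementation: `ToyWorkflow`, `can_run`, and the task names `fw_init_raw_data` and `fw_init_validation_data` are hypothetical, and the `add_links(parent, *children)` signature is assumed from the description above.

```python
from collections import defaultdict


class ToyWorkflow:
    """Tracks which parent tasks must complete before each child may run."""

    def __init__(self):
        self.parents = defaultdict(set)

    def add_links(self, parent, *children):
        # Assumed signature: one parent followed by any number of children.
        for child in children:
            self.parents[child].add(parent)

    def can_run(self, task, completed):
        # A task is runnable once all of its parents are in the completed set.
        return self.parents[task] <= set(completed)


wf = ToyWorkflow()
# Existing links as described in this thread (parent names are hypothetical).
wf.add_links("fw_init_raw_data", "fw_raw_data_compression")
wf.add_links("fw_init_validation_data", "fw_raw_data_compression")

done = {"fw_init_raw_data", "fw_init_validation_data"}
# Without the fix, compression may start before the Parca analysis runs.
before_fix = wf.can_run("fw_raw_data_compression", done)

# Proposed fix: pass fw_raw_data_compression as one more child.
wf.add_links(
    "fw_parca_analysis",
    "fw_sim_data_1_compression",
    "fw_validation_data_compression",
    "fw_raw_data_compression",
)
after_fix = wf.can_run("fw_raw_data_compression", done)
after_analysis = wf.can_run("fw_raw_data_compression", done | {"fw_parca_analysis"})

print(before_fix, after_fix, after_analysis)  # prints "True False True"
```

With the added link, the raw data compression stays blocked until the analysis task has read rawData.cPickle.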

If this symptom is currently reproducible (the compression task could happen to run late enough sometimes to avoid the symptom), it's a good time to test the fix.

This raises other questions: