bradbell / at_cascade

Cascading Dismod_at Analysis From Parent To Child Regions
https://at-cascade.readthedocs.io

csv.predict: pre_user_csv.py: Cannot find a file #13

Open bradbell opened 9 months ago

bradbell commented 9 months ago

at_cascade-2024.1.30: Sometimes one of the prediction jobs does not appear to create its output files, and pre_user_csv then crashes when it tries to use them. I have seen this twice and decided to report it the second time.

In the case below, there is a Begin line for 124_Central_Latin_Am.female but no End line for this job.

Predict: n_predict = 7, n_spawn = 3
Begin: 20:08:57: predict 1_Earth.both
Begin: 20:08:57: predict 103_Latin_Am_Caribbean.female
Begin: 20:08:57: predict 103_Latin_Am_Caribbean.male
Begin: 20:08:57: predict 124_Central_Latin_Am.female
End:   01:46:31: predict 103_Latin_Am_Caribbean.female 1/7
Begin: 01:46:31: predict 124_Central_Latin_Am.male
End:   01:46:37: predict 103_Latin_Am_Caribbean.male 2/7
Begin: 01:46:37: predict 130_Mexico.female
End:   01:46:50: predict 1_Earth.both 3/7
Begin: 01:46:50: predict 130_Mexico.male
End:   02:31:41: predict 124_Central_Latin_Am.male 4/7
End:   02:32:51: predict 130_Mexico.male 5/7
End:   02:33:08: predict 130_Mexico.female 6/7
Traceback (most recent call last):
  File "/home/bradbell/trash/./run_cascade.py", line 18, in <module>
    at_cascade.csv.predict(fit_dir, sim_dir, start_job_name, max_node_depth)
  File "/home/bradbell/.local/lib/python3.12/site-packages/at_cascade/csv/predict.py", line 446, in predict
    at_cascade.csv.pre_parallel(
  File "/home/bradbell/.local/lib/python3.12/site-packages/at_cascade/csv/pre_parallel.py", line 285, in pre_parallel
    at_cascade.csv.pre_user_csv(
  File "/home/bradbell/.local/lib/python3.12/site-packages/at_cascade/csv/pre_user_csv.py", line 157, in pre_user_csv
    assert os.path.isfile(file_name)
AssertionError
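
For longer logs, a job that began but never ended can be picked out mechanically. The following is a minimal sketch, assuming the log lines have exactly the Begin/End layout shown above; `unfinished_jobs` is a hypothetical helper written for this report, not part of at_cascade.

```python
import re

# Matches lines such as
#   "Begin: 20:08:57: predict 1_Earth.both"
#   "End:   01:46:31: predict 103_Latin_Am_Caribbean.female 1/7"
# The layout is assumed from the log in this report.
LINE_RE = re.compile(r'^(Begin|End):\s+\d\d:\d\d:\d\d: predict (\S+)')

def unfinished_jobs(log_text):
    """Return, in order of appearance, jobs with a Begin line but no End line."""
    begun = []      # job names in the order their Begin lines appear
    ended = set()   # job names that have an End line
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip the Predict header, traceback lines, and so on
        kind, job = match.groups()
        if kind == 'Begin':
            begun.append(job)
        else:
            ended.add(job)
    return [job for job in begun if job not in ended]
```

Applied to the log above, this reports only 124_Central_Latin_Am.female.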

bradbell commented 9 months ago

I re-ran the call to csv.predict for the case above and the error did not reproduce. To be specific, I got

Predict: n_predict = 7, n_spawn = 3
Begin: 04:34:28: predict 1_Earth.both
Begin: 04:34:28: predict 103_Latin_Am_Caribbean.female
Begin: 04:34:28: predict 103_Latin_Am_Caribbean.male
Begin: 04:34:28: predict 124_Central_Latin_Am.female
End:   05:33:06: predict 103_Latin_Am_Caribbean.female 1/7
Begin: 05:33:06: predict 124_Central_Latin_Am.male
End:   05:33:07: predict 103_Latin_Am_Caribbean.male 2/7
Begin: 05:33:07: predict 130_Mexico.female
End:   05:33:22: predict 1_Earth.both 3/7
Begin: 05:33:22: predict 130_Mexico.male
End:   05:33:28: predict 124_Central_Latin_Am.female 4/7
End:   06:19:57: predict 124_Central_Latin_Am.male 5/7
End:   06:21:07: predict 130_Mexico.female 6/7
End:   06:21:11: predict 130_Mexico.male 7/7
remove: diabetes_fpg_cv20_5cv_asymp_N_pre_1_Earth.both shared memory

bradbell commented 9 months ago

The following error reporting and recovery was added to at_cascade to reduce the effect of this problem and to help determine when and why it happens: https://github.com/bradbell/at_cascade/commit/8a42a42aebd47e0b3592f365e2f5aa44acd8cc7e

This commit changes the assert above to

 print( f'csv.predict: Cannot find {file_name}' )

If you see this message, please report it below. In this case, the predictions for the corresponding job will be missing.
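
The general pattern of reporting a missing file and continuing, instead of asserting, might look like the sketch below. This is an illustration only, not the code in the commit: `existing_files` is a hypothetical helper, and only the message text is taken from the change described above.

```python
import os

def existing_files(file_names):
    """Return the files that exist; print a message for each one that is missing."""
    found = []
    for file_name in file_names:
        if os.path.isfile(file_name):
            found.append(file_name)
        else:
            # Report and skip instead of raising AssertionError, so the
            # results for the other jobs can still be processed.
            print(f'csv.predict: Cannot find {file_name}')
    return found
```

The caller then works only with the returned list, which is why the predictions for a job whose file is missing do not appear in the final output.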