caracal-pipeline / crystalball

Distributed prediction of visibilities from a sky model
GNU General Public License v2.0

Separator is not found, and chunk exceed the limit #52

Status: Closed (Kincaidr closed 1 year ago)

Kincaidr commented 2 years ago

I get this error when running dd predict:

# 2022-05-26 10:02:19 | INFO     | wsclean:import_from_wsclean | 999: POINT 5.59369447529452e-06 Jy
# 2022-05-26 10:02:19 | INFO     | wsclean:import_from_wsclean | Total flux of 1000 selected components is 0.319994 Jy
# 2022-05-26 10:02:19 | INFO     | budget:get_budget | --------------------------------------------------
# 2022-05-26 10:02:19 | INFO     | budget:get_budget | Budgeting
# 2022-05-26 10:02:19 | INFO     | budget:get_budget | --------------------------------------------------
# 2022-05-26 10:02:19 | INFO     | budget:get_budget | system RAM = 503.81 GB
# 2022-05-26 10:02:19 | INFO     | budget:get_budget | nr of logical CPUs = 56
# 2022-05-26 10:02:19 | INFO     | budget:get_budget | nr sources = 1000
# 2022-05-26 10:02:19 | INFO     | budget:get_budget | nr rows    = 7120638
# 2022-05-26 10:02:19 | INFO     | budget:get_budget | nr chans   = 1024
# 2022-05-26 10:02:19 | INFO     | budget:get_budget | nr corrs   = 4
# 2022-05-26 10:02:19 | INFO     | budget:get_budget | sources per chunk = 235 (auto settings)
# 2022-05-26 10:02:19 | INFO     | budget:get_budget | rows per chunk    = 23583 (auto settings)
# 2022-05-26 10:02:19 | INFO     | budget:get_budget | expected memory usage = 50.38 GB
# 2022-05-26 10:02:22 | INFO     | crystalball:_predict | Field J2009-2026 DDID 0 rows 7120638 chans 1024 corrs 4
# Successful read/write open of default-locked table J2009_2026-corr.ms: 32 columns, 7120638 rows
2022-05-26 13:03:29 STIMELA.boom.dd_predict ERROR: /home/kincaid/.local/bin/crystalball threw exception: Separator is not found, and chunk exceed the limit after 3:01:20'

full log file:

log-boom.dd_predict.txt

bennahugo commented 2 years ago

I want to disentangle what is going on. Searching for this error online, it looks like a pipe buffer overflow, most likely stemming from the stimela logging. Could you run this step outside of stimela to check whether that is the case?
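For reference, the message text matches the one Python's asyncio `StreamReader.readline()` raises when a single "line" grows past the reader's buffer limit before a newline arrives. A progress bar that redraws with carriage returns and never prints a newline can produce exactly that kind of input. The sketch below is a minimal reproduction under the assumption that the wrapper reads the child's output line by line through asyncio streams; `read_one_line` and the 64-byte limit are illustrative and not taken from stimela.

```python
import asyncio


async def read_one_line(payload: bytes, limit: int) -> str:
    """Feed `payload` to a StreamReader and try to read one line from it.

    Hypothetical helper for illustration only; it stands in for a wrapper
    that consumes a child process's stdout line by line.
    """
    reader = asyncio.StreamReader(limit=limit)
    reader.feed_data(payload)
    reader.feed_eof()
    try:
        line = await reader.readline()
        return "ok: %d bytes" % len(line)
    except ValueError as exc:
        # readline() converts the internal LimitOverrunError to ValueError.
        return "error: %s" % exc


if __name__ == "__main__":
    # A normal log line ends with a newline and fits in the buffer.
    print(asyncio.run(read_one_line(b"INFO | Budgeting\n", limit=64)))
    # A progress bar redrawn with "\r" never emits "\n", so the "line"
    # keeps growing until it exceeds the limit.
    progress = b"\r[####      ] | 40% Complete" * 100
    print(asyncio.run(read_one_line(progress, limit=64)))
```

If this is indeed the mechanism, running the step outside the wrapper would avoid the error, since nothing is then trying to split the output into lines.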

Kincaidr commented 2 years ago

It works outside of stimela:

2022-06-02 23:40:24 | INFO     | wsclean:import_from_wsclean | 2226: POINT -0.000640304616590827 Jy
2022-06-02 23:40:24 | INFO     | wsclean:import_from_wsclean | 2227: POINT -0.000951130605929632 Jy
2022-06-02 23:40:24 | INFO     | wsclean:import_from_wsclean | Total flux of 2228 selected components is 0.303093 Jy
Successful read/write open of default-locked table J2009_2026-corr.ms: 32 columns, 7120638 rows
2022-06-02 23:40:25 | INFO     | budget:get_budget | --------------------------------------------------
2022-06-02 23:40:25 | INFO     | budget:get_budget | Budgeting
2022-06-02 23:40:25 | INFO     | budget:get_budget | --------------------------------------------------
2022-06-02 23:40:25 | INFO     | budget:get_budget | system RAM = 503.81 GB
2022-06-02 23:40:25 | INFO     | budget:get_budget | nr of logical CPUs = 56
2022-06-02 23:40:25 | INFO     | budget:get_budget | nr sources = 2228
2022-06-02 23:40:25 | INFO     | budget:get_budget | nr rows    = 7120638
2022-06-02 23:40:25 | INFO     | budget:get_budget | nr chans   = 1024
2022-06-02 23:40:25 | INFO     | budget:get_budget | nr corrs   = 4
2022-06-02 23:40:25 | INFO     | budget:get_budget | sources per chunk = 235 (auto settings)
2022-06-02 23:40:25 | INFO     | budget:get_budget | rows per chunk    = 23583 (auto settings)
2022-06-02 23:40:25 | INFO     | budget:get_budget | expected memory usage = 50.38 GB
2022-06-02 23:40:28 | INFO     | crystalball:_predict | Field J2009-2026 DDID 0 rows 7120638 chans 1024 corrs 4
[######################################### ] | 99% Complete (Estimate) | 12h 1m / ~12h 1m
(crystalbal) kincaid@ike:/net/ike/vault-ike/kincaid/jovial_run$

o-smirnov commented 1 year ago

@Kincaidr has the problem been spotted since https://github.com/caracal-pipeline/stimela/issues/137? I think the fix is in master now.

o-smirnov commented 1 year ago

@sjperkins it would be good to add a --boring option to suppress the progress bar; in a pipeline context it produces too much junk on stdout. Or is there a way to ask it for a simplified bar that simply prints a percentage to the console every now and then?
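As a rough sketch of what such a simplified reporter could look like: the class below prints one newline-terminated percentage line each time progress crosses a step boundary, instead of redrawing a bar in place. `PercentReporter`, its parameters, and the step size are hypothetical names for illustration; this is not an existing crystalball or dask API, and `--boring` is only the option proposed above.

```python
import sys


class PercentReporter:
    """Print a plain percentage line every `step` percent (illustrative only)."""

    def __init__(self, total, step=10, stream=sys.stdout):
        if total <= 0:
            raise ValueError("total must be positive")
        self.total = total
        self.step = step
        self.stream = stream
        self._next_mark = step  # first percentage at which a line is printed

    def update(self, done):
        """Report progress given the number of completed work items."""
        percent = min(100, 100 * done // self.total)
        if percent >= self._next_mark:
            # One complete line per report: no carriage returns, so a
            # line-oriented log reader always sees a separator.
            self.stream.write(f"{percent}% complete\n")
            self.stream.flush()
            self._next_mark = (percent // self.step + 1) * self.step


if __name__ == "__main__":
    reporter = PercentReporter(total=200, step=25)
    for done in range(1, 201):
        reporter.update(done)
```

Because every report ends in a newline, the output stays small and line-delimited, which is friendlier to a pipeline that captures stdout into a log.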

Kincaidr commented 1 year ago

Yes, this problem was solved a while back.