JeffersonLab / halld_recon

Reconstruction for the GlueX Detector

crash in monitoring for PrimeX #116

Closed aaust closed 5 years ago

aaust commented 5 years ago

The latest monitoring runs all crashed with the same error:

JANA >>Reading CCAL profile data from /group/halld/www/halldweb/html/resources//CCAL/profile_data/ccal_profile_data_v1.dat ...
At line 27 of file libraries/CCAL/island.F (unit = 22, file = '�')
Fortran runtime error: File '/group/halld/www/halldweb/html/resources//CCAL/profile_data/ccal_profile_data_v1.dat^' does not exist

I reproduced the error with the file /cache/halld/RunPeriod-2019-01/rawdata/Run061260/hd_rawdata_061260_000.evio

sdobbs commented 5 years ago

Could you please point to a log file? This should be loaded as a resource...

aaust commented 5 years ago

/work/halld2/data_monitoring/RunPeriod-2019-01/mon_ver01/log/061260/stdout.061260_000.out
/work/halld2/data_monitoring/RunPeriod-2019-01/mon_ver01/log/061260/stderr.061260_000.err

sdobbs commented 5 years ago

yeah this is a memory error - a fix exists, I'll get it
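
For context, the trailing `^` in the reported file name is the kind of symptom that shows up when a C++ string is handed to Fortran in a fixed-length character buffer whose unused bytes were never set. The thread does not show the actual cause or the fix, so the following is only a minimal sketch of the general technique, and `pad_for_fortran` is a hypothetical helper name, not something from halld_recon:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper (not from halld_recon): copy `path` into a
// fixed-length buffer and fill the remainder with blanks.
// Fortran CHARACTER arguments have a fixed length and no NUL terminator,
// so bytes left uninitialized after the path become part of the file name.
// Returns false (and leaves `buf` untouched) if the path does not fit.
bool pad_for_fortran(const std::string& path, char* buf, std::size_t len) {
    if (path.size() > len) {
        return false;
    }
    std::fill(buf, buf + len, ' ');            // blank-pad the whole buffer
    std::copy(path.begin(), path.end(), buf);  // then overwrite with the path
    return true;
}
```

Fortran's `OPEN` ignores trailing blanks in `FILE=`, so a blank-padded buffer resolves to the intended path, while leftover garbage bytes do not.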

sdobbs commented 5 years ago

should be fixed by #118

aaust commented 5 years ago

sorry, it isn't

sdobbs commented 5 years ago

#119 should fix it. There are a bunch of warning messages which need to be limited, but we can run at least.
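
One simple way to limit repeated warnings is to cap how many get printed. This is only a sketch under assumptions: `LimitedWarning` is a hypothetical class, not the mechanism used in halld_recon, and it is not thread-safe as written (a multi-threaded job would need an atomic counter or a lock):

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical helper (not from halld_recon): prints at most `max_count`
// warnings to stderr, then stays silent while still counting them.
class LimitedWarning {
public:
    explicit LimitedWarning(std::size_t max_count) : max_(max_count) {}

    // Returns true if the message was printed, false if it was suppressed.
    bool warn(const std::string& msg) {
        ++seen_;
        if (seen_ > max_) {
            return false;
        }
        std::cerr << "WRN " << msg << '\n';
        if (seen_ == max_) {
            std::cerr << "WRN (further warnings suppressed)\n";
        }
        return true;
    }

    // Total number of warnings requested, printed or not.
    std::size_t seen() const { return seen_; }

private:
    std::size_t max_;
    std::size_t seen_ = 0;
};
```

A caller would keep one `LimitedWarning` per message type and could report `seen()` at the end of the job, so the total is not lost when output is suppressed.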

aaust commented 5 years ago

There is a massive amount of warning messages, and then it still crashes

sdobbs commented 5 years ago

OK, I commented out the worst warning and cleared things up. This works for some other files, so I would go ahead and use this for now.

Keep the issue open, we are working to get rid of global variables and other nasty memory management issues...

aaust commented 5 years ago

Ok thanks, I don't see a crash with the file mentioned above. Just a few messages like:

WRN bad cluster log. coord, center id = 85 322.500000