JeffersonLab / hcana

Hall C++ Analyzer

hcana error #485

Closed Yero1990 closed 1 year ago

Yero1990 commented 1 year ago

When I switched to the hcana "develop" branch and replayed data, I got a segmentation fault (see attached). However, when I revert to the hcana "firmware_update" branch, the data replay works fine. I don't know whether this has to do with my own version. Has anyone else observed this issue? hcana_err

MarkKJones commented 1 year ago

Did you update PODD? The develop branch is using a different version of PODD.

Yero1990 commented 1 year ago

Yes. I did:

git submodule init
git submodule sync
git submodule update

and then:

scons -j10

I will try it again, just to make sure.


hansenjo commented 1 year ago

Hmmm. The best way to track this down would be to run hcana in the debugger. Do you have the replay setup (script, database, raw data file) somewhere?

Ole


Yero1990 commented 1 year ago

Yes, the replay is set up on the Hall C Counting Room cdaq machines.

For example:

Assuming you are on a desktop [hcdesk] machine in the Counting House, open a terminal and log in to cdaq:

$ ssh cdaql1

Source a working version of CERN ROOT (unfortunately, this has to be done each time a new session is started):

$ source /apps/root/6.22.08/setroot_CUE.csh

Change to the official cafe directory:

$ cd cafe-2022/cafe_online_replay

Then run:

$ ./hcana SCRIPTS/COIN/PRODUCTION/replay_cafe.C

I would have to switch back to the hcana "develop" branch before you try the debugger, so that you can reproduce the error. Let me know.


hansenjo commented 1 year ago

Sorry for the delay. I'll have time to look at this by the end of the week.

Yero1990 commented 1 year ago

OK, no problem, Ole. Thanks.

If you will be working on hcana on a cdaq machine, you can just copy my version of the repo to your workspace.

My repo is located at: /home/cdaq/cafe-2022/hcana

I'm currently using the "firmware_update" branch. Recall that the issue I had was with the "develop" branch.


MarkKJones commented 1 year ago

I looked into this a bit. The TIBlobModule Init function was not being called. I am not sure why this has changed, although the header has "using PipeliningModule::Init;".

I forced a call to Init in LoadSlot. This fixed the error, but now there is a warning in THcHitList.cxx when it compares the FADC modules' trigger times to the TI trigger time.

hansenjo commented 1 year ago

I suppose the TIBlobModule is default-constructed somewhere, and fNumChan is not being set in the default constructor. Let me verify.

MarkKJones commented 1 year ago

I am confused about how TIBlobModule is initialized. As far as I can see, Init is called in the constructor. We do not call it in our scripts. We are mainly using it to get the TI trigger time to compare to the individual FADC trigger times in the crate. Is there a better way?

sawjlab commented 1 year ago

TIBlobModule is a Decoder class. So it should get constructed just like all the other Decoder classes as long as each TI is listed in db_cratemap.dat.

hansenjo commented 1 year ago

Init() is called from the regular constructor, but not the default constructor, which may be the problem, but I'm not sure yet. (The statement using PipeliningModule::Init is not a problem.) I guess it's ok to have a dedicated module for extracting trigger times. At least I can't think of a superior solution off the top of my head.

The TIBlobModule is not very well written though; I'll send some fixes soon that will make it more resilient to crashes. I do want to verify Carlos's report first. I need to copy his replay over to my home machine, so give me a little while.
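On the remark above that the statement using PipeliningModule::Init is not a problem: in C++, a using-declaration of this form only keeps base-class overloads visible after a derived class overrides one of them; it does not change which function a call resolves to at run time. The following is a minimal sketch with hypothetical classes (SketchBase and SketchDerived are illustrations, not the Podd or hcana classes, and the overload set shown is an assumption):

```cpp
#include <cassert>
#include <string>

// Hypothetical base class with two Init overloads (assumed for illustration).
struct SketchBase {
  virtual ~SketchBase() = default;
  virtual std::string Init() { return "Base::Init()"; }
  virtual std::string Init(const char* cfg) {
    return std::string("Base::Init(") + cfg + ")";
  }
};

// Hypothetical derived class that overrides only the no-argument overload.
struct SketchDerived : SketchBase {
  // Without this line, declaring Init() below would hide Init(const char*)
  // for calls made through a SketchDerived object.
  using SketchBase::Init;
  std::string Init() override { return "Derived::Init()"; }
};

// Small self-check; call it from main() to exercise the sketch.
inline void CheckUsingDeclaration() {
  SketchDerived d;
  assert(d.Init() == "Derived::Init()");        // the override is still used
  assert(d.Init("cfg") == "Base::Init(cfg)");   // base overload stays reachable
}
```

In this sketch the using-declaration affects name lookup only, which is consistent with the statement above that it is not the cause of the missing Init call.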

hansenjo commented 1 year ago

@sawjlab Right. All the decoder modules are instantiated in THaSlotData::loadModule(), specifically line 144, which calls TClass::New(), which calls the default constructor. The module is then configured later in that routine through calls to SetSlot() and SetBank(). Apply this sequence to TIBlobModule, and one ends up with fNumChan = 0 (set in the Module::Module() default constructor). And TIBlobModule::LoadSlot makes the hardcoded assumption that fNumChan = 3. Crash.

I don't know for sure yet why this worked before, but I did streamline the decoder not too long ago, removing redundant calls to things like Init() and Clear(), exposing a logic flaw in TIBlobModule. Easy to fix.
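The sequence described above (default construction, then SetSlot(), then LoadSlot() with an assumed channel count) can be illustrated with a small self-contained sketch. All class and function names below (SketchModule, SketchTIModule, MakeLikeFactory) are hypothetical stand-ins, not the Podd or hcana classes, and the bounds check in LoadSlot is just one possible way to make such a module fail safely rather than crash:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a decoder module base class (not the Podd API).
class SketchModule {
public:
  SketchModule() = default;  // leaves fNumChan at 0
  virtual ~SketchModule() = default;
  virtual void Init() { fData.assign(fNumChan, 0); }
  void SetSlot(int slot) { fSlot = slot; }
  std::size_t NumChan() const { return fNumChan; }
protected:
  int fSlot = 0;
  std::size_t fNumChan = 0;
  std::vector<unsigned> fData;
};

// Hypothetical TI-like module that expects three data words per event.
class SketchTIModule : public SketchModule {
public:
  SketchTIModule() = default;  // note: Init() is NOT called on this path
  explicit SketchTIModule(int slot) { SetSlot(slot); Init(); }
  void Init() override { fNumChan = 3; SketchModule::Init(); }

  // Returns false instead of writing out of bounds if storage was never set up.
  bool LoadSlot(const std::vector<unsigned>& words) {
    if (fData.size() < 3 || words.size() < 3) return false;
    for (std::size_t i = 0; i < 3; ++i) fData[i] = words[i];
    return true;
  }
};

// Mimics a factory that default-constructs a module and configures it afterwards.
inline std::unique_ptr<SketchTIModule> MakeLikeFactory(int slot, bool callInit) {
  std::unique_ptr<SketchTIModule> m(new SketchTIModule());
  m->SetSlot(slot);
  if (callInit) m->Init();
  return m;
}

// Small self-check; call it from main() to exercise the sketch.
inline void CheckFactorySequence() {
  assert(MakeLikeFactory(5, false)->NumChan() == 0);        // never initialized
  assert(!MakeLikeFactory(5, false)->LoadSlot({1, 2, 3}));  // guarded, no crash
  assert(MakeLikeFactory(5, true)->LoadSlot({1, 2, 3}));    // works after Init()
}
```

Without the size check, the uninitialized path would index into an empty vector, which is the kind of failure a hardcoded channel count invites when the default constructor skips initialization.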

hansenjo commented 1 year ago

I got to the bottom of the reason for the crash. It came down to a typo in Podd's PipeliningModule::Init function. I pushed a fix up to Podd's Release-170_patches branch. And I have a few more touchups for hcana, which I'll turn into a PR soon.

Meanwhile, I am puzzled as to the now-copious warnings about trigger time differences ("Big ADC Trigger Time Shift"). Investigating ...

hansenjo commented 1 year ago

Good news: Problem found. There was another typo, this time in TIBlobModule (forgot to rename a variable in commit 5183dbd2). All seems fine now. Preparing PR.

hansenjo commented 1 year ago

Resolved with commit 8a11eb22