POETSII / Orchestrator

The Orchestrator is the configuration and run-time management system for POETS platforms.

Segfault when task /build without link /link #93

Closed: heliosfa closed this issue 5 years ago

heliosfa commented 5 years ago

In Development, the following command sequence reliably reproduces a segfault:

task /path = "/home/gmb/Orchestrator/application_staging/xml"
task /load = plate_100x100.xml
task /build = plate_100x100

(Note that neither topology /set1 nor link /link = plate_100x100 is run beforehand.)

gmb@heaney:~/Orchestrator$ ./orchestrate.sh
Attach debugger to Root process 0 (0).....

POETS>
POETS>task /path = "/home/gmb/Orchestrator/application_staging/xml"
task /load = plate_100x100.xmlPOETS> 13:41:45.17:  23(I) task /path = "/home/gmb/Orchestrator/application_staging/xml"
POETS> 13:41:45.17: 102(I) Task graph default file path is || ||
POETS> 13:41:45.17: 103(I) New path is ||/home/gmb/Orchestrator/application_staging/xml/||
POETS>
POETS> 13:41:52.54:  23(I) task /load = plate_100x100.xml
POETS>task /build = plate_100x100

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 27217 RUNNING AT heaney
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
gmb@heaney:~/Orchestrator$

Clearly this should NOT happen; we should get a graceful error instead.

AlexRast commented 5 years ago

Just started looking into this. As far as I can tell, the issue was introduced by some revision of the hardware model. The offending function is P_builder::Preplace, which falls over at the line if (!par->pPlace->Place(task)) task->LinkFlag(); after building an engine "VirtualSystem". I haven't had time to look further, but the engine appears not to be built correctly: gdb shows placement hitting a bad iterator while iterating through the threads, at the line pTh = iterator->next_thread(); in Placement::Place.

The Preplace function was originally designed so that, if you have not yet defined a topology or placed anything, the Orchestrator automatically creates a 'default' topology and maps the task to it, so that something can still be sensibly built. IIRC, Preplace was working around the time of the workshop (it was definitely working at some point near that time frame).
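As a rough illustration of that fallback, here is a minimal C++ sketch. Engine, Orchestrator and preplaceEngine are hypothetical stand-ins, not the Orchestrator's real classes; only the engine name "VirtualSystem" comes from this thread. It assumes the engine pointer stays null until a topology is defined.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; NOT the Orchestrator's real classes.
struct Engine {
    std::string name;
    explicit Engine(std::string n) : name(std::move(n)) {}
};

struct Orchestrator {
    std::unique_ptr<Engine> engine; // assumed null until a topology is defined
};

// Sketch of the Preplace idea: building needs an engine, so if the user
// never defined a topology, create a default one instead of failing.
// An engine that already exists is left untouched.
Engine& preplaceEngine(Orchestrator& orch) {
    if (!orch.engine) {
        orch.engine = std::make_unique<Engine>("VirtualSystem");
    }
    return *orch.engine;
}
```

Calling preplaceEngine on a fresh Orchestrator yields the default "VirtualSystem" engine; once a user-defined engine is present, the same call returns that engine unchanged.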

I am adding Mark as an assignee for obvious reasons, and will continue digging myself.

AlexRast commented 5 years ago

Fixed in commit 853ecc0 on the bugfix93 branch. The issue was simply that Placement now includes an Init() method that initialises an iterator into the engine associated with its parent. Meanwhile, when P_Builder found a task with no existing engine (i.e. no topology), it created a new engine and associated it with the parent Orchestrator, but did not initialise the parent's Placement object, so Placement::Place went on to use an iterator that had never been set up.
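The failure mode and the one-line fix can be sketched in C++ roughly as follows. This is a simplified, hypothetical model: Placement, Init(), Place() and next_thread() loosely echo names from this thread, but the types, signatures, the ThreadIterator class and buildDefaultEngine are assumptions, and the real code differs. The sketch also adds a guard in Place() so that an uninitialised Placement reports failure instead of dereferencing a null iterator, which is the kind of graceful error the original report asks for.

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical, simplified model; NOT the Orchestrator's real code.
struct Engine {
    std::vector<int> threads; // thread IDs available for placement
};

// Iterates over one specific engine's threads. It holds a reference to that
// engine, so creating or replacing an engine requires re-running Init().
class ThreadIterator {
public:
    explicit ThreadIterator(Engine& e) : engine_(e) {}
    // Returns the next thread ID, or std::nullopt when exhausted.
    std::optional<int> next_thread() {
        if (pos_ >= engine_.threads.size()) return std::nullopt;
        return engine_.threads[pos_++];
    }
private:
    Engine& engine_;
    std::size_t pos_ = 0;
};

struct Orchestrator; // forward declaration; Placement refers to its parent

class Placement {
public:
    explicit Placement(Orchestrator& parent) : parent_(parent) {}
    void Init(); // binds the iterator to the parent's current engine
    // Returns the chosen thread, or std::nullopt on failure. The null check
    // turns "Init() never called" into an error rather than a crash.
    std::optional<int> Place() {
        if (!iterator_) return std::nullopt;
        return iterator_->next_thread();
    }
private:
    Orchestrator& parent_;
    std::unique_ptr<ThreadIterator> iterator_;
};

struct Orchestrator {
    std::unique_ptr<Engine> engine; // declared first, so destroyed last
    Placement placement{*this};
};

void Placement::Init() {
    if (parent_.engine) {
        iterator_ = std::make_unique<ThreadIterator>(*parent_.engine);
    } else {
        iterator_.reset();
    }
}

// Hypothetical default-engine path; the Init() call is the fix.
void buildDefaultEngine(Orchestrator& orch) {
    auto engine = std::make_unique<Engine>();
    engine->threads = {0, 1, 2};
    orch.engine = std::move(engine);
    orch.placement.Init(); // without this, Place() has no valid iterator
}
```

With the Init() call removed from buildDefaultEngine, Place() would keep returning std::nullopt under the guard; without the guard as well, it would dereference a null iterator, loosely analogous to the crash seen in gdb.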

Review if you wish, but I suspect this fix is non-contentious: it is a one-line change in one place that isn't going to have further ramifications.