SoarGroup / Soar

Soar, a general cognitive architecture for systems that exhibit intelligent behavior.
http://soar.eecs.umich.edu
Other
321 stars 70 forks source link

SML: Segfault when accessing null pointer without error message #451

Open moschmdt opened 1 month ago

moschmdt commented 1 month ago

This is not a bug, rather more a quality of live improvement suggestion

When reading the output link via SML (FindAttribute) and one attribute does not exist results in a returned null pointer. Reading the null pointer obviously results in a segfault.

One could check for null pointers every time but code readability suffers in my opinion while prototyping and debugging with external code could be associated with high efforts.

I would suggest throwing an error naming the parent I’d that is still valid and the invalid child parameter instead returning a null pointer.

While the Adaption in the cpp code is fairly simple, I don’t know the consequences for the SWIG interfaces. Someone has an idea about it and can estimate if this is feasible?

garfieldnate commented 1 month ago

Sorry could you provide an example piece of code that triggers the segfault?

moschmdt commented 1 month ago

Example Soar rule to generate the output link:

sp {any*propose*output
   (state <s> ^io.output-link <ol>)
   (<ol> -^hello world)
-->
   (<s> ^operator <o> +)
   (<o> ^name output_link)
}

sp {any*apply*output
   (state <s> ^operator <o>
      ^io.output-link <ol>)
   (<o> ^name output_link)
-->
   (<ol> ^hello world)
   (write |Hello|)
}

CPP SML code to read the output link.

#include <iostream>
#include "sml_Client.h"

using namespace std;
using namespace sml;

void MyPrintEventHandler(smlPrintEventId id, void *pUserData, Agent *pAgent, char const *pMessage)
{
    // In this case the user data is a string we're building up
    std::string *pTrace = (std::string *)pUserData;

    (*pTrace) += pMessage;
}

int main(int argc, char const *argv[])
{
    Kernel *pKernel = Kernel::CreateKernelInNewThread();
    sml::Agent *pAgent = pKernel->CreateAgent("test");
    pAgent->LoadProductions("main.soar");

    pAgent->RunSelf(1);

    // Works as expected.
    pAgent->GetOutputLink()->GetParameterValue("hello"); 

    // This results in a segfault as expected, but no hint which read operation on the 
    // output link actually caused the segfault. 
    pAgent->GetOutputLink()->GetParameterValue("world"); 

    string dummy;
    cin >> dummy;
    return 0;
}

Potential update to improve the situation in sml_ClientIdentifier.h:

#include <stdexcept>

...

 char const* GetParameterValue(char const* pAttribute) const
{
    WMElement* pWME = FindByAttribute(pAttribute, 0) ;
    if (pWME == NULL){
          std::string err_msg = "[Error]: " + this->m_AttributeName + " has no child attribute: " + std::string(pAttribute);
          throw std::invalid_argument(err_msg);
    }
    return pWME->GetValueAsString();
}

A more subtle change would be to replace the throw with a print command, e.g. cout and do not throw an error to remain API compatible, checking for null pointer returns.

ShadowJonathan commented 1 month ago

Doing throws would be more semantic imo, and writing to cout could be dangerous/bad in some scenarios since the SML bindings can be embedded in other programs

moschmdt commented 1 month ago

If this is actually desired I would be willing to create a PR. One important remark: this could be considered a breaking API change since checking for returned null pointers no longer works and would require replacing with a try-catch structure.

Maybe someone with experience with SWIG could comment on the effects for the other language interfaces.

moschmdt commented 1 month ago

There are also null pointer returns in WMElement conversions to identifier, string, int and float elements without error messages, which could also be updated in case the conversion is not implemented in inherited classes.

https://github.com/SoarGroup/Soar/blob/0c644c3f630bdf2aa8a53d02cba84081e9d05447/Core/ClientSML/src/sml_ClientWMElement.h#L121-L124

garfieldnate commented 1 month ago

Okay, thanks for opening this! It does seem like a serious usability issue, so it would be nice to address it.

Returning null and printing to cerr would be very easy; we already print to cerr throughout Soar, so I don't think that would be a new consideration for clients. Clients could still get a null pointer exception, but perhaps the source of it would be clearer.

I agree with Jonathan, though: I think that throwing an exception would be the clearest way to handle this. We don't use throw anywhere in the entire codebase except for the unit tests, though, which means that we've never seen what SWIG turns C++ exceptions into in the languages we have bindings for: Java, Python, Tcl and C#. Therefore, I think it would be best to test this once to confirm that the actual effect is acceptable (I'm sure it'd be better than a segfault, but what would the message be, and what object would the exception be?).

We could add one try-catch test to each of the SML tests to demonstrate the behavior; we currently run Java, Python and Tcl tests in CI already, but haven't been able to add a C# one yet.

this could be considered a breaking API change since checking for returned null pointers no longer works and would require replacing with a try-catch structure.

I don't follow. Existing code that checks for null would look like this, right?

    WMElement* pWME = FindByAttribute(pAttribute, 0) ;
    if (pWME == NULL){
        // handle here
    }
    const char* val = pWME->GetValueAsString();
    ...

Wouldn't this continue to work fine? It doesn't call GetParameterValue, so it would never hit an exception. Or were you thinking of a different current use-case?

moschmdt commented 1 month ago

Yes, that could continue to work as long as the FindByAttribute is not changed as well in favour of a throw.

The follow up question would be if the FindByAttribute function should be changed instead of the GetParameterValue to catch all SML calls?

Error Messages

As far as error messages are concerned, I would expect to get something like this for the previous example:

ERROR: output-link has no child attribute world.

This includes information about the last valid WME as well as the child element that could not be found. What other information could be valuable?

SWIG

A quick search on SWIG and CPP exceptions:

garfieldnate commented 1 month ago

FindByAttribute has been documented since forever as returning NULL when the wme is not found, and I think that would be too big of a breaking change at this point, so I would rather address the issue in the functions that call it.

Regarding your note on ConvertToIdentifier: I can't actually find where that thing is implemented 💢 If you see what's happening there, please share a pointer with me.

I've fixed the main issue described in the thread and will open a PR shortly.

garfieldnate commented 1 month ago

@moschmdt Would you mind taking a look? https://github.com/SoarGroup/Soar/pull/465. Thanks!

moschmdt commented 1 month ago

It looks good to me. Thank you for implementing!