KWARC / rust-libxml

Rust wrapper for libxml2
https://crates.io/crates/libxml
MIT License
73 stars 38 forks source link

Crossing the boundary with context data for error handling callbacks in libxml2 #63

Closed lweberk closed 4 years ago

lweberk commented 4 years ago

Hi,

I'm currently wrapping XSD functionality and banging my head against the table while attempting to capture structured errors from the parser and later validation contexts.

Does any one of you have experience in passing a pointer to a Vec across the FFI together with the callback, so the callback can then on call by libxml2 stuff the error in there, allowing for the parser or validation context wrappers on the other side to return them in a structured fashion?

Callback context passed through the FFI;

impl SchemaParserContext
{
    fn from_raw(parser: *mut bindings::_xmlSchemaParserCtxt) -> Self
    {
        let errors = Rc::new(RefCell::new(Vec::new()));

        unsafe {
            bindings::xmlSchemaSetParserStructuredErrors(
                parser,
                Some(common::structured_error_handler),
                Box::into_raw(Box::new(errors.clone())) as *mut _
            );
        }

        Self {inner: parser, errlog: errors}
    }
}

Callback passed through the FFI;

/// Provides a callback to the C side of things to accumulate xmlErrors to be
/// handled back on the Rust side.
pub fn structured_error_handler(ctx: *mut c_void, error: bindings::xmlErrorPtr)
{
    let errlog = unsafe {
        Box::from_raw(ctx as *mut Rc<RefCell<Vec<XmlStructuredError>>>)
    };

    let error = XmlStructuredError::from_raw(error);

    errlog.borrow_mut()
        .push(error);
}

The issue with this is that the Rc increments only once while settings the callback with libxml2. But on each error, it decrements in the callback. So it essentially frees the Vec away. I'm at loss on what else to use.

Any ideas?

dginev commented 4 years ago

First, let me say I don't have that much experience, or know the best practice, for passing around callbacks with FFI.

But what I could advise is that you downgrade your Rc so that it does not deallocate too quickly on either side, which is a problem that has already popped up in deciding how to prep the Rust structs wrapping the libxml2 ones.

if you kept the primary Rc<RefCell<Vec<XmlStructuredError>>> value attached to some of your Rust data (e.g. as a field in SchemaParserContext, or a parent object that has a longer duration), then you can call Rc::downgrade() on it and pass the resulting Weak<RefCell<Vec<XmlStructuredError>>> to xmlSchemaSetParserStructuredErrors.

When the Weak later gets used and dropped by structured_error_handler, the underlying Rc remains allocated until its owner is itself dropped, etc. Here's a reference if Weak sounds like news:

https://doc.rust-lang.org/std/rc/struct.Weak.html

lweberk commented 4 years ago

Thank you a lot. I had not really understood the workings of Weak until you told me to look at it.

Turns out, that it fixed an issue but not the one that was breaking the code. While creating a heap pointer with Box and passing it into the boundary with Box::into_raw, while recreating the Box at the other end with Box::from_raw, it takes ownership over the passed opaque pointer. When libxml2 proceeds to call the error handling callback a second time, it turns out that while dropping out of the scope during the last call the Box already invalidated the pointer. Thus the need of having the Box leak its pointer with Box::leak.

Similar thing with the Rc within the Box, which is fixed by using Weak as you suggested.

pub fn structured_error_handler(ctx: *mut c_void, error: bindings::xmlErrorPtr)
{
    let errlog = unsafe {
        // passing a Weak counter on the Rc owned by context wrapper structure on the other side of the FFI
        Box::from_raw(ctx as *mut Weak<RefCell<Vec<XmlStructuredError>>>)  
    };

    let error = XmlStructuredError::from_raw(error);

    if let Some(errors) = errlog.upgrade()
    {
        errors.borrow_mut()
            .push(error);
    } else {
        panic!("Underlying error log should not have outlived callback registration");
    }

    // Leak the contents otherwise it invalidates the *mut c_void held by libxml2
    Box::leak(errlog);
}

Off to Valgrind the whole thing and PR.

Thank you very much!

dginev commented 4 years ago

Great, thanks for sharing the details around Box::leak, that is something I haven't read about myself, and will do so when reviewing the PR. Cool that it's looking promising!