KWARC / rust-libxml

Rust wrapper for libxml2
https://crates.io/crates/libxml
MIT License

Looking for wellformed check #119

Open MatthD opened 1 year ago

MatthD commented 1 year ago

Hello, I am the creator of node-libxml, and I would like to base my lib on yours instead of the C implementation. I am having difficulty performing a well-formedness check.

main.rs

use libxml::{tree::Document, parser::XmlParseError};

fn main() {
    let parser = libxml::parser::Parser::default(); 
    let xml_file = parser.parse_file("tests/data/test-not-wellformed.xml");
    // let root_name = xml_file.unwrap().get_root_element().unwrap().get_name();
    // Pass the parse result itself: is_wellformed expects a Result<Document, XmlParseError>.
    dbg!(is_wellformed(xml_file));
}

fn is_wellformed(doc: Result<Document, XmlParseError>) -> bool {
    match doc {
        Err(_error) => {
            false
        },
        Ok(_doc) => {
            true
        },
    }
}

tests/data/test-not-wellformed.xml


<!DOCTYPE article PUBLIC "my doctype of doom" "mydoctype.dtd">
<xpath>
    <to>
        <my>
            <infos>trezaq</infos>
    </to>
</xpath>

This returns true, but it should return false because the document is not well-formed.

Furthermore, I will need DTD and XSD validation and path parsing, but I suppose I will either need other libraries or have to contribute to yours ;)

dginev commented 1 year ago

Hi @MatthD ,

I see you wrote:

I would like to base my lib on yours instead of the C implementation.

Just a word of warning: the current rust-libxml crate depends on having the libxml2 C headers installed, and is a thin Rust wrapper over them.

It is not a full Rust reimplementation of libxml2. I am tracking the c2rust project's port of libxml2 to see if one day we could indeed be fully Rust-native, but that is still some way off.


As to checking XML well-formedness, it is possible that we have not yet fleshed that out in the wrapper layer. There is a dedicated method in Parser called is_well_formed_html which seems to be doing a decent job at this for HTML, and we may want to extend/generalize Parser to also support XML well-formedness checks.

Similarly, the Parser parse_file method currently always returns Ok as long as it managed to obtain a Document from the underlying libxml2 layer, even if that document was only recovered from malformed input. So the Err branch in your snippet will only catch cases where no document could be constructed at all.
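
To make concrete what a well-formedness check would have to reject in the sample file, here is a minimal, std-only sketch of a tag-balance check. It is not libxml2's parser and not part of the rust-libxml API. The function name `tags_balanced` is hypothetical, and the sketch assumes simple input: it skips `<!...>` and `<?...?>` markup wholesale and does not handle comments, CDATA sections, or attribute values that contain `>`.

```rust
/// Returns true if every start tag is closed by a matching end tag in the
/// right order. Hypothetical helper for illustration only; a real check
/// should come from the XML parser itself.
fn tags_balanced(xml: &str) -> bool {
    let mut stack: Vec<&str> = Vec::new();
    let mut rest = xml;
    while let Some(open) = rest.find('<') {
        let after = &rest[open + 1..];
        let close = match after.find('>') {
            Some(i) => i,
            None => return false, // a '<' that is never terminated
        };
        let tag = &after[..close];
        rest = &after[close + 1..];
        if tag.starts_with('!') || tag.starts_with('?') {
            continue; // doctype or processing instruction: ignored here
        }
        if let Some(name) = tag.strip_prefix('/') {
            // End tag: must match the most recently opened element.
            if stack.pop() != Some(name.trim()) {
                return false;
            }
        } else if !tag.ends_with('/') {
            // Start tag: remember the element name (text before attributes).
            match tag.split_whitespace().next() {
                Some(name) => stack.push(name),
                None => return false, // empty tag such as "<>"
            }
        }
        // Self-closing tags (ending in '/') need no bookkeeping.
    }
    stack.is_empty()
}

fn main() {
    // Same content as tests/data/test-not-wellformed.xml from the report.
    let broken = r#"<!DOCTYPE article PUBLIC "my doctype of doom" "mydoctype.dtd">
<xpath>
    <to>
        <my>
            <infos>trezaq</infos>
    </to>
</xpath>"#;
    println!("{}", tags_balanced(broken)); // prints "false": <my> is never closed
}
```

The sample fails at `</to>`, because the innermost open element at that point is still `<my>`. A wrapper-level check would more likely surface the errors libxml2 itself records during parsing than reimplement this logic.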