KWARC / rust-libxml

Rust wrapper for libxml2
https://crates.io/crates/libxml
MIT License
76 stars 38 forks source link

Implement Send for Document #123

Open zippy2 opened 1 year ago

zippy2 commented 1 year ago

Document is declared as follows:

pub(crate) struct _Document {
  /// pointer to a libxml document
  pub(crate) doc_ptr: xmlDocPtr,
  /// hashed pointer-to-Node bookkeeping table
  nodes: HashMap<xmlNodePtr, Node>,
}

Now, looking into libxml sources, neither struct _xmlDoc nor struct _xmlNode contain any thread local data. Therefore, it should be possible for the Document to have Send trait.

dginev commented 1 year ago

You're right for a single document object in isolation, but I worry.

Consider a program that builds a document, then extracts some nodes from it for manipulation, then hands off the document to another thread (via Send) while keeping the nodes in the parent thread. It is very easy to cause a race condition - mutating a Node in the parent thread and the same node in the child thread. For Document to be Send I would expect all of its pieces to be sent along with it (or ensure they are safe to live in all threads, via Sync).

I had tried my hand at a Sync interface back in #46 but didn't get too far. But I'm open to ideas.

dginev commented 1 year ago

If you don't need mutability, I could imagine extending the read-only treatment for nodes, which added a thread-safe RoNode struct mirroring Node also for document. So a new RoDocument struct, which is thread-safe, but disallows mutating anything inside the document (so not too useful beyond getting the root node as a RoNode and passing it on - which you can do already).

zippy2 commented 1 year ago

. It is very easy to cause a race condition - mutating a Node in the parent thread and the same node in the child thread.

Correct. And that's exactly the distinction between Sync and Send. My use case is a bit different: I want to have a "cache" of parsed (immutable) XML documents (basically a HashMap<Document>) and to allow multiple threads run XPATH queries I need Sync (solvable by adding a mutex), and Send. Right now, the cache stores str and each thread parses it again and again.

For Document to be Send I would expect all of its pieces to be sent along with it (or ensure they are safe to live in all threads, via Sync).

Agreed. But Sync can be supplied by lib users via Mutex.

I had tried my hand at a Sync interface back in #46 but didn't get too far. But I'm open to ideas.

Agreed. Native Sync is definitely out of question.

If you don't need mutability, I could imagine extending the read-only treatment for nodes, which added a thread-safe RoNode struct mirroring Node also for document. So a new RoDocument struct, which is thread-safe, but disallows mutating anything inside the document (so not too useful beyond getting the root node as a RoNode and passing it on - which you can do already).

My use case is basically evaluating XPATHs, so no mutation. And yeah, I think RoDocument would do the trick.

dginev commented 12 months ago

It is very easy to cause a race condition - mutating a Node in the parent thread and the same node in the child thread.

Correct. And that's exactly the distinction between Sync and Send

Actually no, I don't think that is the difference. A Send object must be safely transferable to a thread (where no mutability remains possible in the parent thread), a Sync object must have a safe mutability interface that can be used cross-thread. In our libxml2 wrapper's case, neither interface is possible to guarantee if any mutability is allowed at all.

For Document to be safe to Send, the Rust philosophy is to only be able to send it over if no mutability is even theoretically possible on the parent thread.

So yes, RoDocument is the only way this would be possible with the current wrapper design. Until a PR for that lands, you are welcome to grab the root node as an RoNode and then Send that around as needed.

  let root: RoNode = doc.get_root_readonly().unwrap();