bovee / entab

* -> TSV
MIT License
21 stars 5 forks source link

Better support for multi-threading #16

Closed bovee closed 2 years ago

bovee commented 2 years ago

I tried writing the following documentation on how to pass records off to e.g. a thread poll, but it doesn't work.

//! If you want to pass records off to a thread pool:
//! ```
//! # #[cfg(feature = "std")] {
//! use std::fs::File;
//! use rayon;
//! use entab::parsers::extract_opt;
//! use entab::readers::init_state;
//! use entab::readers::fastq::FastqRecord;
//!
//! let f = File::open("./tests/data/test.fastq")?;
//! let (mut rb, mut state) = init_state(f, None)?;
//! while let Some(slice) = rb.refill()? {
//!     let consumed = &mut 0;
//!     let eof = rb.eof;
//!     rayon::scope(|s| {
//!         while let Some(FastqRecord { id, ..}) = extract_opt(slice, eof, consumed, &mut state).unwrap() {
//!             s.spawn(move |_| {
//!                 println!("{}", id);
//!             });
//!         }
//!         Ok::<(), &str>(())
//!     });
//!     rb.consumed += *consumed;
//! }
//! # }
//! # use entab::EtError;
//! # Ok::<(), EtError>(())
//! ```

It threw the following errors:

error[E0499]: cannot borrow `state` as mutable more than once at a time
  --> src/lib.rs:51:83
   |
16 |       rayon::scope(|s| {
   |                     - has type `&Scope<'1>`
17 |           while let Some(FastqRecord { id, ..}) = extract_opt(slice, eof, consumed, &mut state).unwrap() {
   |                                                                                     ^^^^^^^^^^ `state` was mutably borrowed here in the previous iteration of the loop
18 | /             s.spawn(move |_| {
19 | |                 println!("{}", id);
20 | |             });
   | |______________- argument requires that `state` is borrowed for `'1`

error[E0503]: cannot use `rb.eof` because it was mutably borrowed
  --> src/lib.rs:49:15
   |
13 | while let Some(slice) = rb.refill()? {
   |                         ----------- borrow of `rb` occurs here
14 |     let consumed = &mut 0;
15 |     let eof = rb.eof;
   |               ^^^^^^ use of borrowed `rb`
16 |     rayon::scope(|s| {
17 |         while let Some(FastqRecord { id, ..}) = extract_opt(slice, eof, consumed, &mut state).unwrap() {
   |                                                             ----- borrow later captured here by closure

The ergonomics here are still pretty bad (e.g. it would be nice to write rb.next_no_refill?) but that might be because I haven't written any crossbeam/rayon/etc code lately.

bovee commented 2 years ago

Redid this quite a bit in 793c7266b9eae0505af7d5380967aed93a99dc12 to use a BufferChunk object that gets passed in and out and I think it's a bit better. You still need to wrap state in a Mutex and return the BufferChunk from the rayon scope, but it seems like it works (although I've only tested it on toy examples where the multithreading overhead totally kills any perf gains).

bovee commented 2 years ago

As a note, I'm considering the current API as a beta and I marked all the documentation for it as hidden until it gets trialed a little more (preferably by someone with an actual use case!)