alexomics opened 1 year ago
I suppose there is also a choice on whether this is an impl on `Aligner` or we do a `ThreadedAligner`.
Yes I like this
I can't decide what the right approach is re: the ThreadedAligner. If we do a ThreadedAligner, what reason is there to implement mappy compatibility, as we can just ignore it in favour of the ThreadedAligner?
Hypothetically:
```rust
use pyo3::{types::PyDict, Py};
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

pub struct ThreadedAligner {
    pub aligner: Aligner,
    // PyDict can't be stored by value; Py<PyDict> is the owned, thread-safe handle.
    work_queue: Arc<Mutex<VecDeque<Py<PyDict>>>>,
    result_queue: Arc<Mutex<VecDeque<Py<PyDict>>>>,
}
```
Where `Aligner` is the `mappy-rs::Aligner`.
Reasons to keep mappy compatibility are for drop-in single-threaded apps. Not sure if we want to add the queue baggage to the base case? Though it would make for a nicer one-stop-shop.
So the one-stop-shop thing is what I'm thinking of. If you are only doing single-threaded cases, you should just be using regular mappy, so we shouldn't worry about the queue baggage on Aligner?
All we should worry about is returning mappy-compatible mappings, so downstream processing isn't affected.
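For context, this is roughly the kind of mappy usage downstream code relies on today, so any mappy-compatible mappings we return should expose the same `Alignment` fields. A minimal sketch (the reference path, preset, and sequence below are just placeholders):

```python
import mappy

aligner = mappy.Aligner("ref.fa", preset="map-ont")  # placeholder index and preset
for hit in aligner.map("ACGTACGTACGT"):
    # downstream code typically reads these mappy.Alignment attributes
    print(hit.ctg, hit.r_st, hit.r_en, hit.strand, hit.mapq, hit.cigar_str)
```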
We could also try with rayon if we wanted to?
We can extend the base `mappy-rs.Aligner`, which is currently single-threaded, to use multiple threads on iterables of data.

Proposed minimal interface:

- `mappy-rs.Aligner.send(data: PyDict)`: send a Python dictionary containing at least one key/value pair, `seq` -> FASTA (string) (?). This function should return whether the data was queued successfully. Maybe take a second parameter that is the key to the FASTA data.
- `mappy-rs.Aligner.get_results`: retrieve all available aligned data from the output queue and return it.

Extended interface:

- `mappy-rs.Aligner.send_batch(batch: Iter[data, ...])`: place an iterable of `data` dictionaries into the work queue. These would be retrieved by `get_results`. This should be non-blocking.
- `mappy-rs.Aligner.map_batch(batch: Iter[data, ...])`: map a batch of data, yielding results.
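To make the proposal concrete, here is a hypothetical sketch of how a caller might drive this interface. None of these methods exist yet; the import path, constructor arguments, and return shapes (a boolean from `send`, per-item results with mappy-compatible hits from `get_results`/`map_batch`) are all assumptions:

```python
from mappy_rs import Aligner  # hypothetical import path

aligner = Aligner("ref.fa", preset="map-ont")  # placeholder arguments

# Minimal interface: queue one dict at a time, collect whatever is ready.
queued = aligner.send({"seq": "ACGTACGTACGT", "read_id": "read_0001"})
if queued:
    results = aligner.get_results()  # assumed to return finished items, possibly empty

# Extended interface: hand over a whole iterable without blocking...
aligner.send_batch({"seq": s, "read_id": i} for i, s in enumerate(["ACGT", "GGCC"]))

# ...or map a batch directly, yielding results as they become available.
for data, mappings in aligner.map_batch([{"seq": "ACGT"}, {"seq": "GGCC"}]):
    pass  # mappings would be mappy-compatible hits for data["seq"]
```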