alexomics opened 1 year ago
I suppose there is also a choice on whether this is an impl on `Aligner` or we do a `ThreadedAligner`.
Yes I like this
I can't decide what the right approach is re: the ThreadedAligner. If we do a ThreadedAligner, what reason is there to implement mappy compatibility, as we can just ignore it in favour of the ThreadedAligner?
Hypothetically:
```rust
use pyo3::{types::PyDict, Py};
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

pub struct ThreadedAligner {
    pub aligner: Aligner,
    // PyDict can't be stored by value; Py<PyDict> is the owned, thread-safe handle.
    work_queue: Arc<Mutex<VecDeque<Py<PyDict>>>>,
    result_queue: Arc<Mutex<VecDeque<Py<PyDict>>>>,
}
```
Where `Aligner` is the `mappy-rs::Aligner`.
Reasons to keep mappy compatibility are for drop-in single-threaded apps. Not sure if we want to add the queue baggage to the base case? Though it would make for a nicer one-stop-shop.
So the one-stop-shop thing is what I'm thinking of. If you are only doing single-threaded cases, you should just be using regular mappy, so we shouldn't worry about the queue baggage on Aligner?
All we should worry about is returning mappy-compatible mappings, so downstream processing isn't affected.
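For context, this is roughly the kind of mappy usage downstream code relies on today, so any mappy-compatible mappings we return should expose the same `Alignment` fields. A minimal sketch (the reference path, preset, and sequence below are just placeholders):

```python
import mappy

aligner = mappy.Aligner("ref.fa", preset="map-ont")  # placeholder index and preset
for hit in aligner.map("ACGTACGTACGT"):
    # downstream code typically reads these mappy.Alignment attributes
    print(hit.ctg, hit.r_st, hit.r_en, hit.strand, hit.mapq, hit.cigar_str)
```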
We could also try with rayon if we wanted to?
We can extend the base `mappy-rs.Aligner`, which is currently single-threaded, to use multiple threads on iterables of data.

Proposed minimal interface:

- `mappy-rs.Aligner.send(data: PyDict)`: send a Python dictionary containing at least one key/value pair, `seq` -> FASTA (string) (?). This function should return whether the data was queued successfully. Maybe take a second parameter that is the key to the FASTA data.
- `mappy-rs.Aligner.get_results`: retrieve all available aligned data from the output queue and return it.

Extended interface:

- `mappy-rs.Aligner.send_batch(batch: Iter[data, ...])`: place an iterable of `data` dictionaries into the work queue. These would be retrieved by `get_results`. This should be non-blocking.
- `mappy-rs.Aligner.map_batch(batch: Iter[data, ...])`: map a batch of data, yielding results.
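To make the proposal concrete, here is a hypothetical sketch of how a caller might drive this interface. None of these methods exist yet; the import path, constructor arguments, and return shapes (a boolean from `send`, per-item results with mappy-compatible hits from `get_results`/`map_batch`) are all assumptions:

```python
from mappy_rs import Aligner  # hypothetical import path

aligner = Aligner("ref.fa", preset="map-ont")  # placeholder arguments

# Minimal interface: queue one dict at a time, collect whatever is ready.
queued = aligner.send({"seq": "ACGTACGTACGT", "read_id": "read_0001"})
if queued:
    results = aligner.get_results()  # assumed to return finished items, possibly empty

# Extended interface: hand over a whole iterable without blocking...
aligner.send_batch({"seq": s, "read_id": i} for i, s in enumerate(["ACGT", "GGCC"]))

# ...or map a batch directly, yielding results as they become available.
for data, mappings in aligner.map_batch([{"seq": "ACGT"}, {"seq": "GGCC"}]):
    pass  # mappings would be mappy-compatible hits for data["seq"]
```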