Open krvoigt opened 2 years ago
Maybe we could describe more what problem we are trying to solve and what users can expect after the implementation.
"Setup functionality (like loading models or other data into memory) is intertwined with processing, making it difficult to separate the two." E.g., why is it useful to make this separation?
PS: I think the purpose behind this feature would normally serve as the epic description (like "reduce processing time by X to meet metric Y"), and one of the actual user stories from that epic would be "as a processor dev, I want to process pages in parallel".
"Setup functionality (like loading models or other data into memory) is intertwined with processing, making it difficult to separate the two. E.g. why is it useful to make this separation?"
It improves performance because setting up the processor can be done just once instead of with every call to process.
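To make the performance argument concrete, here is a minimal sketch that counts how often the expensive step runs in each design. All class and method bodies are hypothetical stand-ins (a dictionary plays the role of a loaded model); this is not the actual processor API, only an illustration of the difference.

```python
class IntertwinedProcessor:
    """Hypothetical processor whose process() also performs the setup."""

    def __init__(self):
        self.setup_calls = 0

    def _load_model(self):
        # Stand-in for an expensive step such as loading a model into memory.
        self.setup_calls += 1
        return {"name": "dummy-model"}

    def process(self, page_ids):
        model = self._load_model()  # repeated on every call to process()
        return [f"{model['name']}:{page_id}" for page_id in page_ids]


class SeparatedProcessor:
    """Hypothetical processor with setup split from per-page processing."""

    def __init__(self):
        self.setup_calls = 0
        self.model = None

    def setup(self):
        # Runs once, after initialization and before any page is processed.
        self.setup_calls += 1
        self.model = {"name": "dummy-model"}

    def process_page(self, page_id):
        return f"{self.model['name']}:{page_id}"


pages = ["P0001", "P0002", "P0003"]

# Page-wise calls with the intertwined design: setup runs once per call.
intertwined = IntertwinedProcessor()
for page_id in pages:
    intertwined.process([page_id])

# Page-wise calls with the separated design: setup runs exactly once.
separated = SeparatedProcessor()
separated.setup()
results = [separated.process_page(page_id) for page_id in pages]

print(intertwined.setup_calls, separated.setup_calls)
```

With three pages, the intertwined variant performs the setup three times and the separated variant once; the gap grows linearly with the number of page-wise calls.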
Current situation
Processors iterate over the files in a workspace on their own. While it is possible to restrict the processing to a single page or a list/range of pages, the API is targeted towards processors deriving the pages to process on their own. Setup functionality (like loading models or other data into memory) is intertwined with processing, making it difficult to separate the two (i.e. if doing pagewise processing with a pageID restriction, the setup in process still happens for every call).
How it should be
The process method should be deprecated and replaced with a process_page method.
Processors should have a setup method that encapsulates all the post-initialization but pre-processing steps necessary for processing.
Steps
process_page and setup
process
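The proposal under "How it should be" could look roughly like the following sketch. It is only one possible shape, under stated assumptions: the base class, the process_pages driver, the _is_set_up flag, and the UppercaseProcessor subclass are all hypothetical names invented for illustration, and the real signatures may well differ.

```python
import warnings


class Processor:
    """Hypothetical base class; not the actual library API."""

    def __init__(self):
        self._is_set_up = False

    def setup(self):
        """Post-initialization, pre-processing work (e.g. loading models)."""

    def process_page(self, page_id):
        """Process a single page; subclasses implement this."""
        raise NotImplementedError

    def process_pages(self, page_ids):
        # Hypothetical driver: runs setup() once, then iterates over pages.
        if not self._is_set_up:
            self.setup()
            self._is_set_up = True
        return [self.process_page(page_id) for page_id in page_ids]

    def process(self, page_ids):
        # Deprecated entry point, kept only for backwards compatibility.
        warnings.warn(
            "process() is deprecated; implement process_page() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.process_pages(page_ids)


class UppercaseProcessor(Processor):
    """Toy subclass standing in for a real processor."""

    def setup(self):
        self.suffix = "-done"  # stand-in for loaded models or other data

    def process_page(self, page_id):
        return page_id.upper() + self.suffix


processor = UppercaseProcessor()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_result = processor.process(["p0001", "p0002"])

print(legacy_result)
print([str(w.message) for w in caught])
```

Keeping the iteration in a driver outside process_page means the caller, rather than each processor, decides which pages to handle and in what order, which is also what would make the "process pages in parallel" user story feasible later.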