We can't do multithreading for databases that support first/last game lookup without major structural changes
Defer this to a separate database format type. To be designed (gonna be fun :D)
We actually can. But it requires a different approach
Divide the files among the threads.
If the order of games matters, this has to be done a priori; otherwise we couldn't keep the order of files the same as the order of games (or we would need one file per PGN, which is potentially too fragmented).
If the order of games DOESN'T matter (for example, formats without first/last game, or formats that can order games by something other than their location), then we can and should use a task queue. The files have to be ordered from largest to smallest, otherwise we risk a long single-threaded task at the end.
Each thread creates files numbered 1000000*thread_id + i
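A rough sketch of the "order doesn't matter" case: files sorted largest first, workers pulling the next one through a shared atomic index, plus the per-thread file numbering. `InputFile`, `outputFileId`, `sortLargestFirst` and `forEachFileParallel` are made-up names for illustration, not existing code, and the numbering assumes a thread never creates more than 1000000 files.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Hypothetical description of one input file; not a type from the codebase.
struct InputFile {
    std::string path;
    std::uint64_t sizeBytes;
};

// Output file number for the i-th file created by a given thread.
// Assumption: no thread creates 1000000 or more files, so ranges never overlap.
constexpr std::uint32_t outputFileId(std::uint32_t threadId, std::uint32_t i) {
    return 1000000u * threadId + i;
}

// Largest files first, so the longest jobs start early and the tail stays short.
void sortLargestFirst(std::vector<InputFile>& files) {
    std::sort(files.begin(), files.end(),
              [](const InputFile& a, const InputFile& b) {
                  return a.sizeBytes > b.sizeBytes;
              });
}

// Calls process(file, threadId) once per file, using numThreads workers.
// The "task queue" is just an atomic index into the sorted vector.
template <typename Func>
void forEachFileParallel(std::vector<InputFile> files, unsigned numThreads, Func process) {
    sortLargestFirst(files);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            for (;;) {
                const std::size_t i = next.fetch_add(1);
                if (i >= files.size()) break;  // nothing left to claim
                process(files[i], t);
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

Since the set of files is known up front, an atomic index is enough here; a real queue is only needed if files get discovered while the import is running.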
Store all entries from a single game temporarily
After the whole game is processed add it to the game headers
Set game index in all entries
Copy the entries to the output buffer
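The four steps above could look roughly like this. `Entry`, `GameHeader`, `HeaderStore` and `GameBuffer` are placeholder types invented for the sketch, and the header store here is single-threaded; with several importer threads `add` would presumably need a lock.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Placeholder types for illustration only; the real ones differ.
struct Entry {
    std::uint64_t positionKey;
    std::uint32_t gameIndex;
};

struct GameHeader {
    std::string white;
    std::string black;
};

struct HeaderStore {
    std::vector<GameHeader> headers;

    // Appends a header and returns the index assigned to that game.
    std::uint32_t add(GameHeader h) {
        headers.push_back(std::move(h));
        return static_cast<std::uint32_t>(headers.size() - 1);
    }
};

class GameBuffer {
public:
    // Step 1: keep entries of the current game aside; game index not known yet.
    void addPosition(std::uint64_t key) {
        m_pending.push_back(Entry{key, 0});
    }

    // Steps 2-4: register the header, stamp the index, flush to the output.
    void finishGame(GameHeader header, HeaderStore& store, std::vector<Entry>& out) {
        const std::uint32_t idx = store.add(std::move(header));
        for (Entry& e : m_pending) e.gameIndex = idx;
        out.insert(out.end(), m_pending.begin(), m_pending.end());
        m_pending.clear();
    }

private:
    std::vector<Entry> m_pending;
};
```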
Sorting has to use stable_sort if the order of games matters, since the only way to order the games by time is by entry location.
We also have to go back to combining entries so that the first/last games are taken from entry location rather than computed as min/max.
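To make the stable_sort point concrete, a small sketch: entries are sorted by key only, so equal keys keep their input order, and first/last game come from position within the run rather than from min/max of the game index. `Entry`, `Combined` and `sortAndCombine` are illustrative names, not the actual entry layout.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative layouts only.
struct Entry {
    std::uint64_t key;
    std::uint32_t gameIndex;
};

struct Combined {
    std::uint64_t key;
    std::uint32_t count;
    std::uint32_t firstGame;
    std::uint32_t lastGame;
};

// Input order is assumed to be chronological (entry location = time).
std::vector<Combined> sortAndCombine(std::vector<Entry> entries) {
    // Compare by key only; stable_sort keeps equal keys in input order.
    std::stable_sort(entries.begin(), entries.end(),
                     [](const Entry& a, const Entry& b) { return a.key < b.key; });

    std::vector<Combined> out;
    for (const Entry& e : entries) {
        if (!out.empty() && out.back().key == e.key) {
            out.back().count += 1;
            out.back().lastGame = e.gameIndex;  // latest by location, not max
        } else {
            out.push_back(Combined{e.key, 1, e.gameIndex, e.gameIndex});
        }
    }
    return out;
}
```

With input game indices 900, 2, 40 for the same key (in that order), this gives first = 900 and last = 40, where min/max would have given 2 and 900.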
Some old points that may apply:
We CAN easily do multithreading for databases that don't store header data
fit into current database format code
constexpr bool gameOrderMatters = hasAnyHeader
this also affects mergemode
constexpr bool parallelizable = !gameOrderMatters
if parallelizable && config.max_num_import_threads > normal_num_import_threads (we have a pipeline so there is more than 1 thread by default): parallelImport()
here we completely disregard game header code
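A possible shape for the compile-time switch, as a C++17 sketch. The format structs, `ImportConfig`, `ImportPath` and `chooseImportPath` are invented for illustration, and the value of `normal_num_import_threads` is a guess; only the flag names and the condition come from the notes above.

```cpp
#include <cstddef>

// Hypothetical format traits; real formats would expose something similar.
struct FormatWithHeaders {
    static constexpr bool hasAnyHeader = true;
};
struct FormatWithoutHeaders {
    static constexpr bool hasAnyHeader = false;
};

// Hypothetical config; only the field name is taken from the notes.
struct ImportConfig {
    std::size_t max_num_import_threads;
};

// The pipeline already uses more than one thread; 2 is an assumed value.
constexpr std::size_t normal_num_import_threads = 2;

enum class ImportPath { Sequential, Parallel };

template <typename Format>
constexpr ImportPath chooseImportPath(const ImportConfig& config) {
    constexpr bool gameOrderMatters = Format::hasAnyHeader;
    constexpr bool parallelizable = !gameOrderMatters;

    // For formats with headers this branch is discarded at compile time,
    // so the parallel path never has to touch game header code.
    if constexpr (parallelizable) {
        if (config.max_num_import_threads > normal_num_import_threads) {
            return ImportPath::Parallel;
        }
    }
    return ImportPath::Sequential;
}
```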
we have a single-producer multiple-consumer concurrent queue of input files
list of database files has to be atomic again (inherit from Lockable? if constexpr then lock())
the number of buffers in flight will grow by a factor of parallelization, that's fine
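One simple way to get the input file queue is a mutex plus condition variable; a lock-free queue would also work, this is just the smallest thing that is correct. `InputFileQueue` is a made-up name, and storing paths as `std::string` is an assumption.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <utility>

// Minimal blocking queue: one producer pushes paths, many consumers pop.
class InputFileQueue {
public:
    void push(std::string path) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(std::move(path));
        }
        m_cv.notify_one();
    }

    // Called by the producer once no more files will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_closed = true;
        }
        m_cv.notify_all();
    }

    // Blocks until a path is available.
    // Returns std::nullopt once the queue is closed and drained.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_closed || !m_queue.empty(); });
        if (m_queue.empty()) return std::nullopt;
        std::string path = std::move(m_queue.front());
        m_queue.pop();
        return path;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<std::string> m_queue;
    bool m_closed = false;
};
```

Each consumer thread would loop on `pop()` until it returns an empty optional. The list of database files could be guarded the same way, with a mutex around every append.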
The general direction should be re-adding parallelisation to the current database class; don't separate sequential from parallel.