Open gpelouze opened 9 months ago
We can use code from Berend: https://gist.github.com/gpelouze/7dcc95a4beb78276f324ff1ab3591a46
# New splitter with semaphore
def rewrite_list_nested(in_list, semaphore):
    """
    B. Wijers
    23-02-2024
    b.c.wijers (at) uva.nl
    for Lifewatch - NaaVRE
    Splitter
    input:
        list with elements
        TYPE: list
    output:
        list with semaphore number of nested elements
        TYPE: List[list]
    Notes:
        Will attempt to make an even split.
        In case no even splits can be made, the last
        set of elements which could not be split
        are evenly distributed across the nested lists.
        Even distribution means that each nested list
        receives up to a maximum of one extra element.
        In these cases the nested lists have varying
        amount of elements but will never exceed the
        difference by more than 1 element.
    """
    out_list = []
    # Get the length of the list
    in_list_len = len(in_list)
    # Expecting integers so we can use floor division
    # Determine the maximum even-splits we can do
    worker_chunk_size = in_list_len // semaphore
    # Determine how many elements could not be split
    leftovers = in_list_len % semaphore
    while in_list:
        # Set leftover to 0
        leftover = 0
        # If we could not make even-split
        if leftovers:
            # Can not exceed one element addition per split
            leftover = 1
            # Update the number of leftovers
            leftovers -= leftover
        # Add the nested list. If there was a leftover, add it
        out_list.append(in_list[:worker_chunk_size + leftover])
        # Update the input list and remove the elements we added to
        # output list
        in_list = in_list[worker_chunk_size + leftover:]
    return out_list
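To make the distribution rule from the docstring concrete, here is a minimal sketch that computes only the sizes of the nested lists. `chunk_sizes` is a hypothetical helper for illustration, not part of Berend's gist, and it assumes `n_splits` is a positive integer.

```python
def chunk_sizes(n_items, n_splits):
    """Return the sizes of the nested lists the splitter would produce."""
    # base: size of every chunk under an even split
    # extra: number of chunks that receive one additional element
    base, extra = divmod(n_items, n_splits)
    sizes = [base + 1 if i < extra else base for i in range(n_splits)]
    # With fewer items than splits, the splitter returns no empty lists
    return [size for size in sizes if size > 0]


print(chunk_sizes(10, 3))  # [4, 3, 3]
print(chunk_sizes(2, 5))   # [1, 1]
```

Note that when the input has fewer elements than `semaphore`, the result contains fewer than `semaphore` nested lists.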
We want to allow users to configure either the batch size or the total number of splits generated by the splitter. The code above takes only the number of splits, as `semaphore`.
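One possible shape for that configuration, as a sketch only: a single function that accepts exactly one of two keyword arguments. The name `split_list` and the parameters `n_splits` and `batch_size` are assumptions made for illustration; nothing in this issue fixes the final interface.

```python
def split_list(items, *, n_splits=None, batch_size=None):
    """Split items by total number of splits or by batch size (hypothetical API)."""
    if (n_splits is None) == (batch_size is None):
        raise ValueError("Set exactly one of n_splits or batch_size")
    if batch_size is not None:
        if batch_size < 1:
            raise ValueError("batch_size must be a positive integer")
        # Fixed-size batches; the last one may be shorter
        return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    if n_splits < 1:
        raise ValueError("n_splits must be a positive integer")
    # Same distribution as the splitter above: sizes differ by at most one
    base, extra = divmod(len(items), n_splits)
    out, start = [], 0
    for i in range(n_splits):
        size = base + 1 if i < extra else base
        if size == 0:
            break  # fewer items than splits: no empty lists
        out.append(items[start:start + size])
        start += size
    return out


print(split_list(list(range(10)), n_splits=3))
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(split_list(list(range(10)), batch_size=4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Making the two options mutually exclusive keeps the behaviour unambiguous: with `batch_size` the number of splits follows from the input length, and with `n_splits` the batch size does.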