Open votti opened 5 years ago
I just modified the code and tested against `example_csv` and `example_grouping`. It should be OK now for both of them. You can find the results under `/mnt/output/20190306_csv_par` and `/mnt/output/20190306_grouping_par`. I will commit and wait for @smaffiol to push.
Problem:
Pipelines can have grouping, which requires images to be processed together. This is reflected in the `group.json` file that can be printed from `cellprofiler` via the `--print-groups` command. There, each `ImageGroup` is an entry in the list, and each `ImageGroup` in turn has a list of the associated `ImageSets`. Example with 2 groups: group 1 contains image 1, group 2 contains images 2 and 3. (This example is equivalent to the `example_grouping` example from the test-data.)

Currently, the code only checks the number of entries in this JSON and assumes that the number of groups = number of images (https://github.com/BodenmillerGroup/gc3apps/blob/2c92a8d36cef49e389692d5182c8ca066f1cbaf4/gc3apps/gcp_pipeline.py#L117).
Thus, for this example, `gcp_pipeline` would assume that the pipeline contains only 2 images and would process images 1-2. When parallelizing, the job is split up into processing images 1-1 and 2-2.

The correct behaviour would be:
Without parallelization: process images 1-3.
With parallelization: make at most 2 batches, process images 1-1 and 2-3.
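
To make the expected behaviour concrete, here is a minimal Python sketch of how batch ranges could be derived from the group list. It is not the `gcp_pipeline` code. The function name `batch_ranges`, the `max_batches` parameter, and the JSON layout (each entry a `[group_keys, image_numbers]` pair) are assumptions for illustration, as is the requirement that each group's image numbers are contiguous.

```python
import json

# Hypothetical group list for the example above: 2 groups, 3 images.
# The [group_keys, image_numbers] layout is an assumption, not verified
# against the actual --print-groups output.
EXAMPLE_JSON = '[[{"Group": "a"}, [1]], [{"Group": "b"}, [2, 3]]]'


def batch_ranges(groups, max_batches=None):
    """Return (first, last) image ranges that never split a group.

    Assumes the image numbers inside each group are contiguous.
    """
    # One (first, last) span per group, ordered by first image number.
    spans = sorted((min(nums), max(nums)) for _keys, nums in groups)
    if not spans:
        return []
    # There can never be more batches than groups.
    if max_batches is None:
        n_batches = len(spans)
    else:
        n_batches = max(1, min(max_batches, len(spans)))
    # Spread the groups over the batches as evenly as possible.
    size, extra = divmod(len(spans), n_batches)
    ranges = []
    start = 0
    for i in range(n_batches):
        end = start + size + (1 if i < extra else 0)
        chunk = spans[start:end]
        ranges.append((chunk[0][0], chunk[-1][1]))
        start = end
    return ranges


groups = json.loads(EXAMPLE_JSON)
# Count images, not groups: 3 images in 2 groups.
n_images = sum(len(nums) for _keys, nums in groups)
print(n_images)
print(batch_ranges(groups, max_batches=1))  # without parallelization
print(batch_ranges(groups, max_batches=2))  # with parallelization
```

For the example, a single batch covers images 1-3, and two batches cover 1-1 and 2-3, which matches the correct behaviour described above. Asking for more batches than there are groups still yields at most 2 batches.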