VERITAS-Observatory / gammapy-tools

Repository with tools for gammapy analysis
GNU General Public License v3.0

Mimic data: reducing computational requirements #19

Closed steob92 closed 3 months ago

steob92 commented 4 months ago

As per @matthew-w-lundy suggestion, we can reduce the computational requirement by nesting the search criteria when finding mimicked datasets.

This will be implemented into the 1.0_rc branch.

General idea: We currently perform multiple masking operations on one large astropy Table to remove runs that do not match the required parameters. By nesting the searches, each successive operation runs on a progressively smaller table, reducing the total time required.

Implementation: We no longer need to keep an immutable Table; instead we can just grab a final list of observations. A "sub" table is created and overwritten at each step, and the final list is obtained from the last sub table.

steob92 commented 4 months ago

This was run on Romulus through an Apptainer image using a Jupyter notebook.

Configuration used from the benchmark test (note: skipping the KL selection for now):

# Other option
config['io']['from_run'] = False
config['background_selection']['smooth'] = True
config['background_selection']['smooth_sigma'] = 1.0
config['background_selection']['KL_DIV'] = False
config['config']['njobs'] = 10

Current method:

Prepare Dataset Took: 0.963 s
Run Make Background took: 70.200 s/run

Nested search:

Prepare Dataset Took: 0.902 s
Run Make Background took: 53.884 s/run

The "Prepare Dataset" is unchanged so let's assume a <1s noise on this quick test.

steob92 commented 4 months ago

Checking with

config['background_selection']['KL_DIV'] = True

Current method:

Run Make Background took: 372.110 s/run

Nested Method:

Run Make Background took: 370.366 s/run

No significant change... Todo: look at profiling to see where this can be improved.

steob92 commented 4 months ago

Found an `is None` vs `is not None` bug that meant the entire datastore was passed to the KL_DIV search. Also modified `BackgroundModelEstimator.run` to perform the summation in parallel.
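The bug class can be sketched as follows. The function name, arguments, and fallback are hypothetical, not the actual gammapy-tools code; the point is that an inverted `None` guard silently discards the selection.

```python
def select_runs(obs_ids, data_store):
    """Return the runs to search. Hypothetical sketch of the bug class:
    if the guard below is accidentally written as `is not None`, the
    fallback fires precisely when a valid selection exists, handing the
    *entire* datastore to the downstream KL_DIV search."""
    if obs_ids is None:  # correct guard: only fall back when nothing was selected
        obs_ids = list(data_store)  # fall back to every run in the datastore
    return obs_ids
```

With the guard inverted, a call like `select_runs([101, 102], data_store)` would ignore the two selected runs and return the full datastore, which explains the inflated KL_DIV runtimes above.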

Running 10 jobs in parallel results in a "cannot allocate memory" error on Romulus (128 GB of RAM...). Need to look into this a little more. But the runtime decreases by a factor of ~10 for the KL div method when running only 4 cores:

Run Make Background took: 36.283 s/run

Which gives a speed-up of 1.9 compared to the baseline.

Without using the KL_DIV method:

Run Make Background took: 16.271 s/run

Which gives a speed-up of 4.3 compared to the baseline.
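The memory/parallelism trade-off above can be sketched with a bounded worker count. This is not the actual `BackgroundModelEstimator.run` code; it is a minimal illustration (threads used here for simplicity, whereas separate processes would each duplicate their input, which is what can exhaust memory at high `njobs`).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def partial_sum(chunk):
    # Reduce one chunk to a partial result.
    return np.sum(chunk, axis=0)

def parallel_sum(arrays, max_workers=4):
    """Sum a list of arrays using at most `max_workers` workers.
    Capping max_workers (e.g. 4 rather than 10) bounds the peak
    memory held by in-flight chunks."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_sum, arrays))
    # Combine the partial results into the final summation.
    return np.sum(partials, axis=0)
```

The design choice is simply to expose the worker count as a knob so it can be tuned to the machine's memory rather than fixed at `njobs`.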

steob92 commented 3 months ago

Current changes appear to be sufficient. Saving the KL_DIV values when first calculated also reduces the computational overhead. #20 #23 #25 all help.
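The save-on-first-calculation idea can be sketched as a simple per-run cache. This is a hypothetical illustration, not the implementation landed in the linked PRs; the cache key and the discrete KL formula are assumptions.

```python
import numpy as np

# Hypothetical cache keyed by run ID; not the actual gammapy-tools storage.
_kl_cache = {}

def kl_divergence(p, q):
    """Discrete KL divergence D(p || q), skipping zero bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = (p > 0) & (q > 0)
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def kl_for_run(run_id, p, q):
    # Compute once per run, then reuse the stored value on later lookups,
    # avoiding repeated divergence calculations for the same run.
    if run_id not in _kl_cache:
        _kl_cache[run_id] = kl_divergence(p, q)
    return _kl_cache[run_id]
```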