Jammy2211 / PyAutoGalaxy

PyAutoGalaxy: Open-Source Multiwavelength Galaxy Structure & Morphology
https://pyautogalaxy.readthedocs.io/
MIT License

Error when running search.fit #130

Closed Conor-Larison closed 6 months ago

Conor-Larison commented 10 months ago

Hello, I am getting this error when running search.fit() in the introduction python notebook in the autogalaxy workspace. It looks like the error is in autofit. For context, this was run on a fresh conda environment using python 3.10.

2023-10-18 16:06:24,768 - autogalaxy.analysis.analysis - INFO - PRELOADS - Setting up preloads, may take a few minutes for fits using an inversion.
2023-10-18 16:06:25,454 - introduction - INFO - The output path of this fit is /Users/conor/autogalaxy_workspace/output/introduction/8b2bc1782c7e4c41686b415f10f06a12
2023-10-18 16:06:25,454 - introduction - INFO - Outputting pre-fit files (e.g. model.info, visualization).
2023-10-18 16:06:25,956 - introduction - INFO - Starting new Nautilus non-linear search (no previous samples found).
2023-10-18 16:06:25,957 - introduction - INFO - number of cores == 1
2023-10-18 16:06:25,957 - introduction - INFO - Creating multiprocessing Pool of size 1...
2023-10-18 16:06:25,958 - autofit.non_linear.parallel.sneaky - INFO - ... using multiprocessing

#########################

Exploration Phase

#########################

Adding Bound 1: done Ellipsoids: 0 Neural Networks: 0 Filling Bound 1: 0%| | 0/400 [00:00<?, ?it/s]

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/conor/opt/anaconda3/envs/autogalaxy/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/conor/opt/anaconda3/envs/autogalaxy/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/Users/conor/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/autofit/non_linear/parallel/sneaky.py", line 384, in fitness_cache
    return FunctionCache.fitness(x, *FunctionCache.fitness_args,
AttributeError: type object 'FunctionCache' has no attribute 'fitness'
"""

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
Cell In[10], line 1
----> 1 result = search.fit(model=model, analysis=analysis)

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/autofit/non_linear/search/abstract_search.py:527, in NonLinearSearch.fit(self, model, analysis, info, bypass_nuclear_if_on)
    520 self.pre_fit_output(
    521     analysis=analysis,
    522     model=model,
    523     info=info,
    524 )
    526 if not self.paths.is_complete:
--> 527     result = self.start_resume_fit(
    528         analysis=analysis,
    529         model=model,
    530     )
    531 else:
    532     result = self.result_via_completed_fit(
    533         analysis=analysis,
    534         model=model,
    535     )

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/autofit/non_linear/search/abstract_search.py:643, in NonLinearSearch.start_resume_fit(self, analysis, model)
    640 self.timer.start()
    642 model.freeze()
--> 643 self._fit(
    644     model=model,
    645     analysis=analysis,
    646 )
    647 samples = self.perform_update(
    648     model=model, analysis=analysis, during_analysis=False
    649 )
    651 result = analysis.make_result(
    652     samples=samples,
    653 )

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/autofit/non_linear/search/nest/nautilus/search.py:137, in Nautilus._fit(self, model, analysis)
    135 else:
    136     if not self.using_mpi:
--> 137         self.fit_multiprocessing(fitness=fitness, model=model, analysis=analysis)
    138     else:
    139         self.fit_mpi(fitness=fitness, model=model, analysis=analysis)

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/autofit/non_linear/search/nest/nautilus/search.py:244, in Nautilus.fit_multiprocessing(self, fitness, model, analysis)
    241 self.output_sampler_results(sampler=sampler)
    242 self.perform_update(model=model, analysis=analysis, during_analysis=True)
--> 244 sampler.run(
    245     **self.config_dict_run,
    246 )
    248 self.output_sampler_results(sampler=sampler)

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/nautilus/sampler.py:418, in Sampler.run(self, f_live, n_shell, n_eff, discard_exploration, verbose)
    415 while (self.live_evidence_fraction() > f_live or
    416        len(self.bounds) == 0):
    417     self.add_bound(verbose=verbose)
--> 418     self.fill_bound(verbose=verbose)
    419     if self.filepath is not None:
    420         self.write(self.filepath, overwrite=True)

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/nautilus/sampler.py:961, in Sampler.fill_bound(self, verbose)
    959 points, n_bound, idx_t = self.sample_shell(-1, shell_t)
    960 assert len(points) + len(idx_t) == n_bound
--> 961 log_l, blobs = self.evaluate_likelihood(points)
    962 self.points[-1].append(points)
    963 self.log_l[-1].append(log_l)

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/nautilus/sampler.py:760, in Sampler.evaluate_likelihood(self, points)
    758     result = list(zip(*result))
    759 elif self.pool_l is not None:
--> 760     result = list(self.pool_l.map(self.likelihood, args))
    761 else:
    762     result = list(map(self.likelihood, args))

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/autofit/non_linear/parallel/sneaky.py:499, in SneakierPool.map(self, function, iterable)
    483 def map(
    484     self, function: Callable,
    485     iterable: Iterable
    486 ):
    487     """
    488     Map a function over an iterable using the map method
    489     of the initialized pool.
   (...)
    497
    498     """
--> 499     return self.pool.map(function, iterable)

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/multiprocessing/pool.py:367, in Pool.map(self, func, iterable, chunksize)
    362 def map(self, func, iterable, chunksize=None):
    363     '''
    364     Apply func to each element in iterable, collecting the results
    365     in a list that is returned.
    366     '''
--> 367     return self._map_async(func, iterable, mapstar, chunksize).get()

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/multiprocessing/pool.py:774, in ApplyResult.get(self, timeout)
    772     return self._value
    773 else:
--> 774     raise self._value

AttributeError: type object 'FunctionCache' has no attribute 'fitness'

Jammy2211 commented 10 months ago

pip install threadpoolctl==3.1.0

This is a common issue; I'll do a release to fix the requirement properly! In the meantime, that pip command will sort it.
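When a pin like this is installed from inside a notebook, it can help to confirm that the running kernel actually sees the pinned version. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of autofit or autogalaxy.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# The interpreter path shows which environment the kernel is really using.
print("interpreter:", sys.executable)
print("threadpoolctl:", installed_version("threadpoolctl"))
```

If the printed interpreter is not the one inside the conda environment, the pip command probably installed the package somewhere the kernel does not look.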

Conor-Larison commented 10 months ago

Hm, I am still getting the same error after running this pip command (restarted kernel, used pip in the notebook itself, etc.).

Jammy2211 commented 10 months ago

OK yeah wrong error message lol.

The issue is that parallelization isn't supported in your Jupyter notebook. Are you on Windows?
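The platform question matters because Python's multiprocessing starts worker processes differently per operating system: Windows only supports "spawn", and macOS has defaulted to "spawn" since Python 3.8. A small diagnostic along these lines can collect the relevant details in one go. It is only a sketch; the function name `parallel_environment_report` is hypothetical, and the Jupyter check is a heuristic that just looks for the `ipykernel` module.

```python
import multiprocessing
import platform
import sys


def parallel_environment_report() -> dict:
    """Collect the details that usually matter for multiprocessing problems."""
    return {
        "os": platform.system(),
        "python": platform.python_version(),
        # The first entry of get_all_start_methods() is the platform default.
        "default_start_method": multiprocessing.get_all_start_methods()[0],
        # Heuristic: ipykernel is loaded when the code runs in a Jupyter kernel.
        "in_jupyter": "ipykernel" in sys.modules,
    }


for key, value in parallel_environment_report().items():
    print(f"{key}: {value}")
```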

Jammy2211 commented 10 months ago

For now, disabling parallel runs and using dynesty will fix it. So replace the Nautilus code with this:

search = af.DynestyStatic(
    path_prefix=path.join("searches"),
    name="DynestyStatic",
    nlive=50,
    sample="rwalk",
    force_x1_cpu=True,
    number_of_cores=1,
)

Obviously replace things like name and path_prefix with what you're using.

I'll send full instructions tomorrow (it's late here in the UK) on how to fix the parallelization so you can get the speed up.

Conor-Larison commented 10 months ago

On macOS. Perhaps Google Colab is the way to go for this.

I will try this fix and post an update tonight; really appreciate all the support. Cheers!

Jammy2211 commented 10 months ago

OK I think I know how to fix it but will post in the morning once I'm on my laptop!

Conor-Larison commented 10 months ago

Took about 23 minutes on my local machine but the above solution worked on the introduction material! Thank you so much again for the help, will check back in tomorrow morning.

Jammy2211 commented 10 months ago

Ok, it'll require a few back-and-forth experiments, as it's to do with understanding how newer macOS versions parallelize things, so let me know if you've got an hour free.

Basically, Jupyter notebook + parallelization often = crash.

First, can you run the notebook with number_of_cores=1, so I can see whether the error occurs even with 1 core (it still calls Python multiprocessing in that case):

search = af.Nautilus(
    path_prefix=path.join("imaging", "modeling"),
    name="start_here",
    unique_tag=dataset_name,
    n_live=150,
    number_of_cores=1,
    iterations_per_update=10000,
)

Next, can you run the Python script version to see if that fixes it:

https://github.com/Jammy2211/autogalaxy_workspace/blob/release/scripts/imaging/modeling/start_here.py

Run it from the command line as python start_here.py

Finally, this script will hopefully fix it if the others don't:

https://github.com/Jammy2211/autogalaxy_workspace/blob/main/scripts/imaging/modeling/customize/parallel_bug_fix.py

If it's still broken, let me know and we can try some other things.

I'm not sure whether you can get multiprocessing to run in Jupyter Notebook cells, I will ask around.
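As noted above, a single-core run still goes through Python multiprocessing (the log reports "Creating multiprocessing Pool of size 1"). One way to take multiprocessing out of the picture while debugging is a serial fallback that only creates a pool when more than one core is requested. The sketch below is an illustration of that idea, not how autofit or Nautilus implement it; `evaluate` and `toy_log_likelihood` are hypothetical names, and `number_of_cores` is borrowed from the search arguments in this thread.

```python
from multiprocessing import Pool
from typing import Callable, Iterable, List


def toy_log_likelihood(x: float) -> float:
    """Stand-in likelihood: an unnormalized Gaussian centred on zero."""
    return -0.5 * x * x


def evaluate(
    func: Callable[[float], float],
    points: Iterable[float],
    number_of_cores: int = 1,
) -> List[float]:
    """Evaluate func on every point, creating a pool only when it is needed."""
    points = list(points)
    if number_of_cores <= 1:
        # Serial path: no worker processes, so no pickling or start-method issues.
        return [func(p) for p in points]
    # Parallel path: func must be importable by the workers, which typically
    # means running from a script rather than from a notebook cell.
    with Pool(processes=number_of_cores) as pool:
        return pool.map(func, points)


print(evaluate(toy_log_likelihood, [0.0, 1.0, 2.0]))
```

If the serial path works and the parallel one does not, the problem is in how the workers are started or what they can see, not in the likelihood itself.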

Jammy2211 commented 10 months ago

"Took about 23 minutes on my local machine"

Happy to offer some support on run times. The tutorials are currently set up for things like nested sampling, which is slow but fits complex models very robustly (and provides things like the evidence). So the run times are gonna be a lot longer than something like GALFIT, but the fits are a lot more robust.

Nautilus should give you a ~3x speed-up over dynesty, and with parallelization you should get another 3x... so hopefully you can get under 3 minutes lol.
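As a rough check of that estimate, taking the 23-minute dynesty run and the two approximate 3x factors at face value (both are ballpark figures from this thread, not measurements):

```python
# Rough run-time estimate; both speed-up factors are approximate.
dynesty_minutes = 23.0   # reported single-core dynesty run time
nautilus_speedup = 3.0   # approximate gain from switching to Nautilus
parallel_speedup = 3.0   # approximate gain from parallelization

estimated_minutes = dynesty_minutes / (nautilus_speedup * parallel_speedup)
print(f"Estimated run time: {estimated_minutes:.1f} minutes")
```

That works out to 23 / 9, or roughly 2.6 minutes, which is just under the 3 minute mark.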

Conor-Larison commented 10 months ago

Hey James, unfortunately this is still not working. start_here.py failed for reasons I believe the bug-fix script was meant to solve. The parallel_bug_fix.py script did seem to fix that error, but now I am getting the same error that I was getting in the Jupyter notebook. Should I email you to get added to the Slack?

2023-10-19 08:48:48,239 - autogalaxy.analysis.analysis - INFO - PRELOADS - Setting up preloads, may take a few minutes for fits using an inversion.
2023-10-19 08:48:48,251 - light[bulge_disk] - INFO - The output path of this fit is /Users/conor/autogalaxy_workspace/scripts/imaging/modeling/customize/output/imaging/modeling/simple/light[bulge_disk]/f256ec0321c48a10972066d9e08975fd
2023-10-19 08:48:48,252 - light[bulge_disk] - INFO - Outputting pre-fit files (e.g. model.info, visualization).
2023-10-19 08:48:48,645 - light[bulge_disk] - INFO - Starting new Nautilus non-linear search (no previous samples found).
2023-10-19 08:48:48,645 - light[bulge_disk] - INFO - number of cores == 4
2023-10-19 08:48:48,645 - light[bulge_disk] - INFO - Creating SneakierPool...
2023-10-19 08:48:48,645 - autofit.non_linear.parallel.sneaky - INFO - ... using multiprocessing

#########################

Exploration Phase

#########################

. . .

AttributeError: type object 'FunctionCache' has no attribute 'fitness'

Jammy2211 commented 10 months ago

Yeah, let's go to Slack.

Jammy2211 commented 10 months ago

Ok, pretty sure it was a bug in my source code implementation of Nautilus in autofit, dohhhhh.

Will release a new autogalaxy with a fix soon.

Conor-Larison commented 10 months ago

Ok, I'll shoot you an email to join the Slack anyway.