Open AwePhD opened 2 months ago
I had misunderstood the multi-fidelity directories. When NePS uses multi-fidelity, it creates folders like `config_{number_config}_{number_fidelity}`, so `pipeline_directory` and `previous_pipeline_directory` are always different. Therefore, `previous_pipeline_directory` is intuitive.
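For illustration, the naming scheme above can be sketched with a small helper. Only the `config_{number_config}_{number_fidelity}` pattern comes from the thread; the `fidelity_dir` function itself is a hypothetical stand-in, not a NePS API:

```python
from pathlib import Path

def fidelity_dir(root: Path, config_id: int, fidelity: int) -> Path:
    # Hypothetical helper mirroring the observed scheme: each
    # (config, fidelity) pair gets its own folder.
    return root / f"config_{config_id}_{fidelity}"

root = Path("results")
current = fidelity_dir(root, config_id=3, fidelity=2)
previous = fidelity_dir(root, config_id=3, fidelity=1)
print(current.name)   # config_3_2
print(previous.name)  # config_3_1
# The two directories never coincide, even for the same config.
assert current != previous
```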
I renamed the issue to ask for better documentation of the directory structure when performing multi-fidelity optimization. I think it might be clearer that way: in my opinion, it is hard to predict this structure beforehand, so an illustration with a few paragraphs might guide new users well. The current documentation does make it straightforward to see that two `run_pipeline` calls use different directories; it's just a bit vague to me.
Sorry for the delay in response. Not sure why my notifications for this library are disabled -_- Honestly appreciate the feedback and we'll try to get back to you sooner!
Glad you understood it in the end and yes your interpretation is correct. The main reason to have it in different folders is lost to time but it does make logging of configurations and results much easier to post-process, which is how the library originally was benchmarked. It also helps a bit with paths for file locking (how the parallelism works with arbitrary number of workers), preventing some edge cases.
Thanks for the issue and we'll keep it on the todo list. Right now, a lot of the internals are being revamped to make the library more performant, usable and lean. One thing that will be revisited is how we handle multi-fidelity. I imagine we'll likely keep the same folder structure, and we can document it as such once it's done, including the specifics of the previous pipeline directory.
Some extras:
- We'd like to explore many-fidelity soon, i.e. not just scaling epochs but also something like depth/width.
- One benefit of the current pipeline-directory approach (as opposed to re-using the directory) is that in a many-fidelity setup we may ask the user to load a model from an arbitrary checkpoint, and the `{config_id}_{fidelity}` naming scheme no longer makes sense.
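One way to see why a single trailing fidelity number breaks down with several fidelity dimensions: purely as a hypothetical sketch (this is not a NePS proposal), each dimension would need to be encoded by name to stay unambiguous:

```python
def many_fidelity_dirname(config_id: int, **fidelities: int) -> str:
    # Hypothetical: with more than one fidelity (e.g. epochs AND depth),
    # a bare suffix like "config_3_10" is ambiguous, so each dimension
    # is spelled out by name. Names are sorted for a stable ordering.
    parts = [f"{name}-{value}" for name, value in sorted(fidelities.items())]
    return "_".join([f"config_{config_id}", *parts])

print(many_fidelity_dirname(3, epochs=10, depth=4))
# config_3_depth-4_epochs-10
```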
Many thanks for the feedback; I was not sure about the relevance of my issue. Keep up the good work!
Hi,
I have a question about the choice of the argument `previous_pipeline_directory` in `run_pipeline`.
I browsed the code, and it seems that the optimizer is responsible for retrieving the previous trial, since it is the `Optimizer`'s responsibility to sample trials. My question, though, is why the argument is not something like `has_previous_fidelity_trial` of type `bool`. I do not know how you manage the workers and the multiprocessing for distributed HPO, so maybe in some situations the directory of the previous (fidelity) trial of the same config is not the same as the current one's? Or maybe there is a more profound reason that I am not aware of.
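To illustrate what passing the directory itself (rather than a boolean) enables, here is a minimal, self-contained sketch of the resume pattern. The keyword argument names follow the thread, but the JSON checkpoint format, the toy return value, and everything else are invented for illustration; this is not NePS's actual implementation:

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

def run_pipeline(pipeline_directory: Path,
                 previous_pipeline_directory: Optional[Path],
                 epochs: int) -> int:
    """Toy training stub: resume an epoch counter from the previous
    fidelity's checkpoint (if any), then write our own checkpoint."""
    start_epoch = 0
    if previous_pipeline_directory is not None:
        ckpt = previous_pipeline_directory / "checkpoint.json"
        if ckpt.exists():
            start_epoch = json.loads(ckpt.read_text())["epoch"]
    # ... real training would run from start_epoch up to `epochs` here ...
    pipeline_directory.mkdir(parents=True, exist_ok=True)
    (pipeline_directory / "checkpoint.json").write_text(
        json.dumps({"epoch": epochs})
    )
    return epochs - start_epoch  # epochs actually trained in this rung

root = Path(tempfile.mkdtemp())
# Rung 1: no previous fidelity, trains epochs 0..3.
first = run_pipeline(root / "config_1_1", None, epochs=3)
# Rung 2: resumes from rung 1's checkpoint, trains epochs 3..9.
second = run_pipeline(root / "config_1_2", root / "config_1_1", epochs=9)
print(first, second)  # 3 6
```

With a plain boolean, the pipeline function would have to reconstruct the previous directory's path itself, which would couple user code to the internal naming scheme; passing the path keeps that scheme an implementation detail.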
Note that I am not an HPO practitioner, so my understanding of NePS and PriorBand is fairly limited. I just want to apply HPO to a deep learning model for my research.
The question is more of a sanity check that I have correctly understood the documentation about multi-fidelity. The most relevant pieces of documentation I found are this subsection and the multi-fidelity page. Maybe a dedicated didactic page on multi-fidelity would be good? The two examples are rich and simple, which is very good, but they might be a bit rough to grasp from a DL perspective, i.e. for someone not familiar with multi-fidelity HPO (SH, HB, PB, ...). Or maybe it's just my personal lack of understanding.
Best, Mathias.