RelationRx / pyrelational

pyrelational is a Python library for rapidly implementing active learning pipelines, from data management and model development (including Bayesian approximation) to creating novel active learning strategies.
https://pyrelational.readthedocs.io
Apache License 2.0
153 stars, 13 forks

Adding support for persistent workers and logger fix #198

Open jlotthammer opened 3 weeks ago

jlotthammer commented 3 weeks ago

What is the goal of this PR?

This PR improves the configurability and stability of data loading and logging in pyrelational's DataManager and LightningModelManager classes. It adds support for persistent workers in data loading, so that data loading can make better use of resources over prolonged tasks, and makes it possible to pass loggers to the PyTorch Lightning trainer.

What are the changes implemented in this PR?

Persistent Workers in DataManager:

The DataManager class now includes an option to keep data loader workers alive across data loading iterations, controlled by a new loader_persistent_workers parameter. The parameter defaults to False (loader_persistent_workers: bool = False), which maintains backwards compatibility. Enabling it reduces worker initialization overhead in multi-epoch training, which benefits users who handle larger datasets or need faster data loader setup times.
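A minimal sketch of how such a flag might be forwarded to the underlying data loader. The helper `build_loader_kwargs` and the `loader_batch_size` and `loader_num_workers` names are assumptions made for illustration, not code from this PR; only `loader_persistent_workers` comes from the description above. The sketch relies on one known behaviour of `torch.utils.data.DataLoader`: it accepts a `persistent_workers` argument and rejects `persistent_workers=True` when `num_workers` is 0.

```python
from typing import Any, Dict


def build_loader_kwargs(
    loader_batch_size: int = 1,
    loader_num_workers: int = 0,
    loader_persistent_workers: bool = False,
) -> Dict[str, Any]:
    """Translate DataManager-style options into DataLoader keyword arguments.

    Hypothetical helper: the real DataManager may wire these options differently.
    """
    # DataLoader raises a ValueError for persistent workers without worker
    # processes, so fail early with a message that names the offending options.
    if loader_persistent_workers and loader_num_workers == 0:
        raise ValueError(
            "loader_persistent_workers=True requires loader_num_workers > 0"
        )
    return {
        "batch_size": loader_batch_size,
        "num_workers": loader_num_workers,
        "persistent_workers": loader_persistent_workers,
    }


# Default behaviour is unchanged: workers are not kept alive.
default_kwargs = build_loader_kwargs()

# Opt in to persistent workers; this needs at least one worker process.
persistent_kwargs = build_loader_kwargs(
    loader_num_workers=4, loader_persistent_workers=True
)

# With PyTorch installed, the result could be passed on as:
#   torch.utils.data.DataLoader(dataset, **persistent_kwargs)
```

Persistent workers only pay off when the same loader is iterated more than once, since the saving comes from not restarting worker processes between epochs.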

Default Logger in LightningModelManager:

The LightningModelManager class now configures default logging with PyTorch Lightning's CSVLogger. If no logger is specified in trainer_config, a CSVLogger is set up automatically to log to config["checkpoints_dir"], consistent with PyTorch Lightning.
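The fallback described above could look roughly like the sketch below. `ensure_logger` and `make_logger` are hypothetical names introduced here, and the example uses a stand-in factory so that it runs without PyTorch Lightning installed. The actual change in LightningModelManager may be structured differently.

```python
from typing import Any, Callable, Dict


def ensure_logger(
    trainer_config: Dict[str, Any],
    checkpoints_dir: str,
    make_logger: Callable[[str], Any],
) -> Dict[str, Any]:
    """Return a copy of trainer_config with a default logger filled in.

    Hypothetical helper: a logger is only created when the caller did not
    supply a "logger" entry, so explicit choices (including False) are kept.
    """
    config = dict(trainer_config)  # avoid mutating the caller's dict
    if "logger" not in config:
        config["logger"] = make_logger(checkpoints_dir)
    return config


# Stand-in factory for illustration. With PyTorch Lightning installed it
# could instead be: lambda d: pytorch_lightning.loggers.CSVLogger(save_dir=d)
def fake_csv_logger(save_dir: str) -> Dict[str, str]:
    return {"kind": "csv", "save_dir": save_dir}


defaulted = ensure_logger({"max_epochs": 5}, "checkpoints", fake_csv_logger)
explicit = ensure_logger({"logger": False}, "checkpoints", fake_csv_logger)
```

Checking for a missing key rather than a falsy value means a user who passes `logger=False` to disable logging is not overridden by the default.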

Lastly, the abstract model manager now uses a context manager to load JSON data.
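For illustration, loading JSON through a context manager generally takes the following shape. `load_json` and the file name are hypothetical and not taken from the PR. The point is that the `with` block closes the file handle even if parsing raises.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def load_json(path: Path) -> Any:
    """Read a JSON file, closing the handle even if parsing fails."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Round-trip a small dictionary through a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "metadata.json"  # hypothetical file name
    json_path.write_text(json.dumps({"epochs": 3}), encoding="utf-8")
    loaded = load_json(json_path)
```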

thomasgaudelet commented 2 weeks ago

Thanks for this! Adding @paulmorio for review! We'll review asap 🙂