eth-easl / modyn

Modyn is a research-platform for training ML models on growing datasets.
MIT License
22 stars 3 forks source link

feat: Add eval handler abstraction #479

Closed robinholzi closed 1 month ago

robinholzi commented 1 month ago

Motivation

We want to have evaluation handlers and be able to specify multiple handlers per pipeline. Also, we refine the config models for more evaluation strategies and evaluation triggers that will be implemented in a followup PR.

github-actions[bot] commented 1 month ago

:white_check_mark: Result of Pytest Coverage

---------- coverage: platform linux, python 3.12.3-final-0 ----------- Name Stmts Miss Cover
modyn/common/benchmark/stopwatch.py 26 0 100%
modyn/common/example_extension/example_extension.py 28 2 93%
modyn/common/ftp/ftp_server.py 31 18 42%
modyn/common/ftp/ftp_utils.py 83 69 17%
modyn/common/grpc/grpc_helpers.py 177 54 69%
modyn/common/trigger_sample/trigger_sample_storage.py 158 9 94%
modyn/config/schema/base_model.py 5 0 100%
modyn/config/schema/pipeline/config.py 24 0 100%
modyn/config/schema/pipeline/data.py 27 7 74%
modyn/config/schema/pipeline/evaluation/config.py 28 0 100%
modyn/config/schema/pipeline/evaluation/handler.py 14 0 100%
modyn/config/schema/pipeline/evaluation/metric.py 33 8 76%
modyn/config/schema/pipeline/evaluation/strategy/offset.py 17 0 100%
modyn/config/schema/pipeline/evaluation/strategy/slicing.py 22 0 100%
modyn/config/schema/pipeline/model.py 5 0 100%
modyn/config/schema/pipeline/model_storage.py 16 0 100%
modyn/config/schema/pipeline/sampling/config.py 36 1 97%
modyn/config/schema/pipeline/sampling/downsampling_config.py 70 1 99%
modyn/config/schema/pipeline/training/config.py 76 8 89%
modyn/config/schema/pipeline/trigger.py 24 0 100%
modyn/config/schema/system/config.py 95 0 100%
modyn/database/abstract_database_connection.py 35 0 100%
modyn/database/partition_by_meta.py 33 12 64%
modyn/evaluator/evaluator.py 15 0 100%
modyn/evaluator/evaluator_entrypoint.py 32 3 91%
modyn/evaluator/internal/dataset/evaluation_dataset.py 75 3 96%
modyn/evaluator/internal/grpc/evaluator_grpc_server.py 22 0 100%
modyn/evaluator/internal/grpc/evaluator_grpc_servicer.py 184 14 92%
modyn/evaluator/internal/metric_factory.py 18 1 94%
modyn/evaluator/internal/metrics/abstract_decomposable_metric.py 10 1 90%
modyn/evaluator/internal/metrics/abstract_evaluation_metric.py 29 2 93%
modyn/evaluator/internal/metrics/abstract_holistic_metric.py 10 1 90%
modyn/evaluator/internal/metrics/accuracy.py 30 2 93%
modyn/evaluator/internal/metrics/f1_score.py 62 0 100%
modyn/evaluator/internal/metrics/roc_auc.py 35 1 97%
modyn/evaluator/internal/pytorch_evaluator.py 113 28 75%
modyn/evaluator/internal/utils/evaluation_info.py 8 0 100%
modyn/evaluator/internal/utils/evaluation_process_info.py 8 0 100%
modyn/evaluator/internal/utils/evaluator_messages.py 3 0 100%
modyn/metadata_database/metadata_base.py 3 0 100%
modyn/metadata_database/metadata_database_connection.py 55 3 95%
modyn/metadata_database/models/pipelines.py 24 1 96%
modyn/metadata_database/models/sample_training_metadata.py 15 0 100%
modyn/metadata_database/models/selector_state_metadata.py 47 10 79%
modyn/metadata_database/models/trained_models.py 18 0 100%
modyn/metadata_database/models/trigger_partitions.py 10 0 100%
modyn/metadata_database/models/trigger_training_metadata.py 14 0 100%
modyn/metadata_database/models/triggers.py 10 0 100%
modyn/metadata_database/utils/model_storage_strategy_config.py 21 2 90%
modyn/metadata_processor/internal/grpc/metadata_processor_grpc_servicer.py 18 0 100%
modyn/metadata_processor/internal/grpc/metadata_processor_server.py 24 0 100%
modyn/metadata_processor/internal/metadata_processor_manager.py 23 4 83%
modyn/metadata_processor/metadata_processor.py 11 0 100%
modyn/metadata_processor/metadata_processor_entrypoint.py 24 1 96%
modyn/metadata_processor/processor_strategies/abstract_processor_strategy.py 30 0 100%
modyn/metadata_processor/processor_strategies/basic_processor_strategy.py 17 2 88%
modyn/metadata_processor/processor_strategies/processor_strategy_type.py 6 1 83%
modyn/model_storage/internal/grpc/grpc_server.py 23 0 100%
modyn/model_storage/internal/grpc/model_storage_grpc_servicer.py 54 0 100%
modyn/model_storage/internal/model_storage_manager.py 118 5 96%
modyn/model_storage/internal/storage_strategies/abstract_difference_operator.py 11 2 82%
modyn/model_storage/internal/storage_strategies/abstract_model_storage_strategy.py 16 1 94%
modyn/model_storage/internal/storage_strategies/difference_operators/sub_difference_operator.py 12 0 100%
modyn/model_storage/internal/storage_strategies/difference_operators/xor_difference_operator.py 14 0 100%
modyn/model_storage/internal/storage_strategies/full_model_strategies/abstract_full_model_strategy.py 26 2 92%
modyn/model_storage/internal/storage_strategies/full_model_strategies/binary_full_model.py 16 0 100%
modyn/model_storage/internal/storage_strategies/full_model_strategies/pytorch_full_model.py 15 0 100%
modyn/model_storage/internal/storage_strategies/incremental_model_strategies/abstract_incremental_model_strategy.py 26 10 62%
modyn/model_storage/internal/storage_strategies/incremental_model_strategies/weights_difference.py 99 1 99%
modyn/model_storage/internal/utils/model_storage_policy.py 35 0 100%
modyn/model_storage/model_storage.py 27 3 89%
modyn/model_storage/model_storage_entrypoint.py 32 3 91%
modyn/models/articlenet/articlenet.py 30 16 47%
modyn/models/coreset_methods_support.py 29 1 97%
modyn/models/dlrm/cuda_ext/dot_based_interact.py 24 13 46%
modyn/models/dlrm/cuda_ext/fused_gather_embedding.py 16 16 0%
modyn/models/dlrm/cuda_ext/sparse_embedding.py 32 32 0%
modyn/models/dlrm/dlrm.py 67 9 87%
modyn/models/dlrm/nn/embeddings.py 123 64 48%
modyn/models/dlrm/nn/factories.py 24 9 62%
modyn/models/dlrm/nn/interactions.py 50 11 78%
modyn/models/dlrm/nn/mlps.py 77 23 70%
modyn/models/dlrm/nn/parts.py 60 4 93%
modyn/models/dlrm/setup.py 5 5 0%
modyn/models/dlrm/utils/install_lib.py 11 7 36%
modyn/models/dlrm/utils/utils.py 28 0 100%
modyn/models/dummy/dummy.py 12 0 100%
modyn/models/fmownet/fmownet.py 25 0 100%
modyn/models/resnet18/resnet18.py 39 7 82%
modyn/models/resnet50/resnet50.py 39 7 82%
modyn/models/resnet152/resnet152.py 39 7 82%
modyn/models/tokenizers/distill_bert_tokenizer.py 11 0 100%
modyn/models/yearbooknet/yearbooknet.py 23 0 100%
modyn/selector/internal/grpc/selector_grpc_servicer.py 78 22 72%
modyn/selector/internal/grpc/selector_server.py 33 12 64%
modyn/selector/internal/selector_manager.py 125 37 70%
modyn/selector/internal/selector_strategies/abstract_selection_strategy.py 125 8 94%
modyn/selector/internal/selector_strategies/coreset_strategy.py 66 6 91%
modyn/selector/internal/selector_strategies/downsampling_strategies/abstract_downsampling_strategy.py 29 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/craig_downsampling_strategy.py 20 12 40%
modyn/selector/internal/selector_strategies/downsampling_strategies/downsampling_scheduler.py 51 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/gradmatch_downsampling_strategy.py 16 8 50%
modyn/selector/internal/selector_strategies/downsampling_strategies/gradnorm_downsampling_strategy.py 6 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/kcentergreedy_downsampling_strategy.py 16 8 50%
modyn/selector/internal/selector_strategies/downsampling_strategies/loss_downsampling_strategy.py 6 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/no_downsampling_strategy.py 12 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/rho_loss_downsampling_strategy.py 70 1 99%
modyn/selector/internal/selector_strategies/downsampling_strategies/rs2_downsampling_strategy.py 12 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/submodular_downsampling_strategy.py 22 14 36%
modyn/selector/internal/selector_strategies/downsampling_strategies/uncertainty_downsampling_strategy.py 17 9 47%
modyn/selector/internal/selector_strategies/downsampling_strategies/utils.py 7 0 100%
modyn/selector/internal/selector_strategies/freshness_sampling_strategy.py 130 12 91%
modyn/selector/internal/selector_strategies/new_data_strategy.py 98 10 90%
modyn/selector/internal/selector_strategies/presampling_strategies/abstract_balanced_strategy.py 57 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/abstract_presampling_strategy.py 23 1 96%
modyn/selector/internal/selector_strategies/presampling_strategies/label_balanced_presampling_strategy.py 7 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/no_presampling_strategy.py 16 1 94%
modyn/selector/internal/selector_strategies/presampling_strategies/random_no_replacement_presampling_strategy.py 42 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/random_presampling_strategy.py 17 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/trigger_balanced_presampling_strategy.py 13 1 92%
modyn/selector/internal/selector_strategies/presampling_strategies/utils.py 9 0 100%
modyn/selector/internal/selector_strategies/utils.py 10 0 100%
modyn/selector/internal/storage_backend/abstract_storage_backend.py 34 7 79%
modyn/selector/internal/storage_backend/database/database_storage_backend.py 85 7 92%
modyn/selector/internal/storage_backend/local/local_storage_backend.py 136 5 96%
modyn/selector/selector.py 82 14 83%
modyn/selector/selector_entrypoint.py 31 3 90%
modyn/supervisor/entrypoint.py 31 3 90%
modyn/supervisor/internal/eval_strategies/abstract_eval_strategy.py 8 1 88%
modyn/supervisor/internal/eval_strategies/offset_eval_strategy.py 22 0 100%
modyn/supervisor/internal/eval_strategies/slicing_eval_strategy.py 17 0 100%
modyn/supervisor/internal/evaluation_result_writer/abstract_evaluation_result_writer.py 16 2 88%
modyn/supervisor/internal/evaluation_result_writer/json_result_writer.py 23 1 96%
modyn/supervisor/internal/evaluation_result_writer/tensorboard_result_writer.py 13 0 100%
modyn/supervisor/internal/grpc/enums.py 55 0 100%
modyn/supervisor/internal/grpc/supervisor_grpc_server.py 25 7 72%
modyn/supervisor/internal/grpc/supervisor_grpc_servicer.py 35 0 100%
modyn/supervisor/internal/grpc/template_msg.py 26 0 100%
modyn/supervisor/internal/grpc_handler.py 217 29 87%
modyn/supervisor/internal/pipeline_executor/models.py 256 34 87%
modyn/supervisor/internal/pipeline_executor/pipeline_executor.py 365 18 95%
modyn/supervisor/internal/supervisor.py 144 17 88%
modyn/supervisor/internal/triggers/amounttrigger.py 15 0 100%
modyn/supervisor/internal/triggers/datadrifttrigger.py 102 28 73%
modyn/supervisor/internal/triggers/embedding_encoder_utils/embedding_encoder.py 30 19 37%
modyn/supervisor/internal/triggers/embedding_encoder_utils/embedding_encoder_downloader.py 50 31 38%
modyn/supervisor/internal/triggers/timetrigger.py 26 3 88%
modyn/supervisor/internal/triggers/trigger.py 21 1 95%
modyn/supervisor/internal/triggers/trigger_datasets/dataloader_info.py 16 13 19%
modyn/supervisor/internal/triggers/trigger_datasets/fixed_keys_dataset.py 72 3 96%
modyn/supervisor/internal/triggers/trigger_datasets/online_trigger_dataset.py 17 1 94%
modyn/supervisor/internal/triggers/utils.py 50 37 26%
modyn/supervisor/internal/utils/evaluation_status_reporter.py 31 0 100%
modyn/supervisor/internal/utils/pipeline_info.py 30 9 70%
modyn/supervisor/internal/utils/training_status_reporter.py 24 3 88%
modyn/tests/common/example_extension/test_example_extension.py 13 0 100%
modyn/tests/common/grpc/test_grpc_helpers.py 48 0 100%
modyn/tests/common/trigger_sample/test_trigger_sample_storage.py 128 0 100%
modyn/tests/config/schema/test_pipeline.py 32 0 100%
modyn/tests/config/test_config_integrity.py 36 1 97%
modyn/tests/conftest.py 42 0 100%
modyn/tests/database/test_abstract_database_connection.py 19 0 100%
modyn/tests/evaluator/internal/dataset/test_evaluation_dataset.py 131 2 98%
modyn/tests/evaluator/internal/grpc/test_evaluator_grpc_server.py 20 0 100%
modyn/tests/evaluator/internal/grpc/test_evaluator_grpc_servicer.py 405 16 96%
modyn/tests/evaluator/internal/metrics/test_accuracy.py 62 0 100%
modyn/tests/evaluator/internal/metrics/test_f1_score.py 53 0 100%
modyn/tests/evaluator/internal/metrics/test_roc_auc.py 31 0 100%
modyn/tests/evaluator/internal/test_metric_factory.py 13 0 100%
modyn/tests/evaluator/internal/test_pytorch_evaluator.py 163 19 88%
modyn/tests/evaluator/test_evaluator.py 30 0 100%
modyn/tests/evaluator/test_evaluator_entrypoint.py 21 0 100%
modyn/tests/metadata_database/models/test_pipelines.py 50 0 100%
modyn/tests/metadata_database/models/test_sample_training_metadata.py 40 0 100%
modyn/tests/metadata_database/models/test_selector_state_metadata.py 46 0 100%
modyn/tests/metadata_database/models/test_trained_models.py 48 0 100%
modyn/tests/metadata_database/models/test_trigger_training_metadata.py 38 0 100%
modyn/tests/metadata_database/models/test_triggers.py 33 0 100%
modyn/tests/metadata_database/test_metadata_database_connection.py 47 0 100%
modyn/tests/metadata_processor/internal/grpc/test_metadata_processor_grpc_servicer.py 26 0 100%
modyn/tests/metadata_processor/internal/grpc/test_metadata_processor_server.py 27 0 100%
modyn/tests/metadata_processor/internal/test_metadata_processor_manager.py 42 3 93%
modyn/tests/metadata_processor/processor_strategies/test_abstract_processor_strategy.py 60 0 100%
modyn/tests/metadata_processor/processor_strategies/test_basic_processor_strategy.py 43 0 100%
modyn/tests/metadata_processor/test_metadata_processor.py 22 3 86%
modyn/tests/metadata_processor/test_metadata_processor_entrypoint.py 21 0 100%
modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_server.py 16 0 100%
modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_servicer.py 100 0 100%
modyn/tests/model_storage/internal/storage_strategies/difference_operators/test_sub_difference_operator.py 16 0 100%
modyn/tests/model_storage/internal/storage_strategies/difference_operators/test_xor_difference_operator.py 16 0 100%
modyn/tests/model_storage/internal/storage_strategies/full_model_strategies/test_binary_full_model.py 27 1 96%
modyn/tests/model_storage/internal/storage_strategies/full_model_strategies/test_pytorch_full_model.py 36 1 97%
modyn/tests/model_storage/internal/storage_strategies/incremental_model_strategies/test_weights_difference.py 88 2 98%
modyn/tests/model_storage/internal/test_model_storage_manager.py 217 1 99%
modyn/tests/model_storage/internal/utils/test_model_storage_policy.py 28 0 100%
modyn/tests/model_storage/test_model_storage.py 37 0 100%
modyn/tests/model_storage/test_model_storage_entrypoint.py 21 0 100%
modyn/tests/models/test_bert_tokenizer.py 24 0 100%
modyn/tests/models/test_dlrm.py 46 0 100%
modyn/tests/models/test_dummy.py 8 0 100%
modyn/tests/models/test_embedding_recorder.py 27 0 100%
modyn/tests/models/test_fmownet.py 25 0 100%
modyn/tests/models/test_resnet18.py 22 0 100%
modyn/tests/models/test_resnet50.py 22 0 100%
modyn/tests/models/test_resnet152.py 22 0 100%
modyn/tests/models/test_yearbook_net.py 47 0 100%
modyn/tests/selector/internal/grpc/test_selector_grpc_servicer.py 132 0 100%
modyn/tests/selector/internal/grpc/test_selector_server.py 16 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_abstract_downsampling_strategy.py 14 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_gradnorm_downsampling_strategy.py 14 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_loss_downsampling_strategy.py 18 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_no_downsampling_strategy.py 6 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_rho_loss_downsampling_strategy.py 160 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_rs2_downsampling_strategy.py 18 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_scheduler.py 131 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_abstract_balanced_strategy.py 14 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_empty_presampling_strategy.py 0 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_label_balanced_presampling_strategy.py 165 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_random_no_replacement_presampling_strategy.py 52 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_random_presampling_strategy.py 86 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_trigger_balanced_presampling.py 140 0 100%
modyn/tests/selector/internal/selector_strategies/test_abstract_selection_strategy.py 170 0 100%
modyn/tests/selector/internal/selector_strategies/test_coreset_strategy.py 246 0 100%
modyn/tests/selector/internal/selector_strategies/test_freshness_sampling_strategy.py 300 0 100%
modyn/tests/selector/internal/selector_strategies/test_new_data_strategy.py 500 0 100%
modyn/tests/selector/internal/storage_backend/database/test_database_storage_backend.py 123 0 100%
modyn/tests/selector/internal/storage_backend/local/test_local_storage_backend.py 84 0 100%
modyn/tests/selector/internal/storage_backend/utils.py 16 5 69%
modyn/tests/selector/internal/test_selector_manager.py 148 5 97%
modyn/tests/selector/test_selector.py 95 5 95%
modyn/tests/selector/test_selector_entrypoint.py 25 0 100%
modyn/tests/supervisor/internal/eval_strategies/test_matrix_eval_strategy.py 16 0 100%
modyn/tests/supervisor/internal/eval_strategies/test_offset_eval_strategy.py 8 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_abstract_evaluation_result_writer.py 7 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_json_result_writer.py 16 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_tensorboard_result_writer.py 21 0 100%
modyn/tests/supervisor/internal/grpc/test_supervisor_grpc_server.py 29 1 97%
modyn/tests/supervisor/internal/grpc/test_supervisor_grpc_servicer.py 54 0 100%
modyn/tests/supervisor/internal/pipeline_executor/test_pipeline_executor.py 353 6 98%
modyn/tests/supervisor/internal/test_grpc_handler.py 262 0 100%
modyn/tests/supervisor/internal/test_supervisor.py 179 5 97%
modyn/tests/supervisor/internal/triggers/test_amounttrigger.py 25 0 100%
modyn/tests/supervisor/internal/triggers/test_datadrifttrigger.py 94 1 99%
modyn/tests/supervisor/internal/triggers/test_timetrigger.py 21 0 100%
modyn/tests/supervisor/internal/triggers/test_trigger.py 5 0 100%
modyn/tests/supervisor/internal/triggers/trigger_datasets/test_fixed_keys_dataset.py 123 2 98%
modyn/tests/supervisor/internal/triggers/trigger_datasets/test_online_trigger_dataset.py 28 2 93%
modyn/tests/supervisor/test_entrypoint.py 25 0 100%
modyn/tests/trainer_server/internal/data/key_sources/test_local_key_source.py 89 0 100%
modyn/tests/trainer_server/internal/data/key_sources/test_selector_key_source.py 92 0 100%
modyn/tests/trainer_server/internal/data/test_data_utils.py 22 1 95%
modyn/tests/trainer_server/internal/data/test_local_dataset_writer.py 59 0 100%
modyn/tests/trainer_server/internal/data/test_online_dataset.py 367 5 99%
modyn/tests/trainer_server/internal/data/test_per_class_online_dataset.py 53 3 94%
modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_server.py 17 0 100%
modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_servicer.py 406 8 98%
modyn/tests/trainer_server/internal/metadata_collector/test_metadata_collector.py 41 0 100%
modyn/tests/trainer_server/internal/trainer/metadata_pytorch_callbacks/test_loss_callback.py 51 1 98%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/deepcore_comparison_tests_utils.py 21 1 95%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_abstract_matrix_downsampling_strategy.py 77 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_abstract_remote_downsampling_strategy.py 12 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_craig_remote_downsampling.py 260 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_get_tensor_subset.py 56 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_gradmatch_downsampling_strategy.py 120 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_gradnorm_downsample.py 96 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_kcenter_downsampling_strategy.py 108 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_loss_downsample.py 86 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_rs2_downsampling.py 123 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_submodular_downsampling_strategy.py 103 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_uncertainty_downsampling_strategy.py 51 0 100%
modyn/tests/trainer_server/internal/trainer/test_batch_accumulator.py 93 0 100%
modyn/tests/trainer_server/internal/trainer/test_pytorch_trainer.py 428 34 92%
modyn/tests/trainer_server/test_trainer_server.py 34 0 100%
modyn/tests/trainer_server/test_trainer_server_entrypoint.py 21 0 100%
modyn/tests/utils/test_timer.py 22 0 100%
modyn/tests/utils/test_utils.py 175 0 100%
modyn/trainer_server/custom_lr_schedulers/dlrm_lr_scheduler/dlrm_scheduler.py 33 33 0%
modyn/trainer_server/internal/dataset/data_utils.py 17 2 88%
modyn/trainer_server/internal/dataset/key_sources/abstract_key_source.py 21 5 76%
modyn/trainer_server/internal/dataset/key_sources/local_key_source.py 23 1 96%
modyn/trainer_server/internal/dataset/key_sources/selector_key_source.py 54 2 96%
modyn/trainer_server/internal/dataset/local_dataset_writer.py 55 3 95%
modyn/trainer_server/internal/dataset/online_dataset.py 308 29 91%
modyn/trainer_server/internal/dataset/per_class_online_dataset.py 14 0 100%
modyn/trainer_server/internal/grpc/trainer_server_grpc_server.py 22 0 100%
modyn/trainer_server/internal/grpc/trainer_server_grpc_servicer.py 244 38 84%
modyn/trainer_server/internal/metadata_collector/metadata_collector.py 33 0 100%
modyn/trainer_server/internal/mocks/mock_metadata_processor.py 22 2 91%
modyn/trainer_server/internal/trainer/batch_accumulator.py 30 0 100%
modyn/trainer_server/internal/trainer/metadata_pytorch_callbacks/base_callback.py 15 3 80%
modyn/trainer_server/internal/trainer/metadata_pytorch_callbacks/loss_callback.py 21 0 100%
modyn/trainer_server/internal/trainer/pytorch_trainer.py 566 164 71%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_matrix_downsampling_strategy.py 69 4 94%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_per_label_remote_downsample_strategy.py 9 1 89%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_remote_downsampling_strategy.py 38 4 89%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/cossim.py 28 17 39%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/euclidean.py 29 12 59%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/k_center_greedy.py 38 4 89%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/orthogonal_matching_pursuit.py 66 34 48%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/shuffling.py 9 0 100%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/submodular_function.py 103 15 85%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/submodular_optimizer.py 116 78 33%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_craig_downsampling.py 98 7 93%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_grad_match_downsampling_strategy.py 16 1 94%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_gradnorm_downsampling.py 46 5 89%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_kcenter_greedy_downsampling_strategy.py 14 0 100%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_loss_downsampling.py 37 5 86%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_rs2_downsampling.py 44 1 98%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_submodular_downsampling_strategy.py 29 3 90%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_uncertainty_downsampling_strategy.py 64 18 72%
modyn/trainer_server/internal/utils/metric_type.py 3 0 100%
modyn/trainer_server/internal/utils/trainer_messages.py 4 0 100%
modyn/trainer_server/internal/utils/training_info.py 46 0 100%
modyn/trainer_server/internal/utils/training_process_info.py 10 0 100%
modyn/trainer_server/trainer_server.py 19 0 100%
modyn/trainer_server/trainer_server_entrypoint.py 32 3 91%
modyn/utils/timer.py 8 0 100%
modyn/utils/utils.py 161 13 92%
TOTAL 18884 1626 91%
Coverage HTML written to
Required test coverage of
=============== 2442 passed, 8078
github-actions[bot] commented 1 month ago

Line Coverage: -% ( % to main) Branch Coverage: -% ( % to main)

MaxiBoether commented 1 month ago

I think what I initially thought this contains is the implementation of the current supported stuff in the main branch, but with the concept of evalhandlers and triggers and strategies. Anyways, we have to figure out the relationships between those concepts first, but it might make sense to have this PR just the existing eval strategies and triggers (if we in some form keep those concepts) and then the follow up PR implements the actual newe strategies and triggers that you need.

robinholzi commented 1 month ago

Thanks @MaxiBoether for raising your concerns about the level of abstraction, there is, indeed, some entanglement between the different dimensions of our evaluation setup.

As you correctly identified, I initially thought that the concepts of “when” to evaluate and “what data” to evaluate are orthogonal. E.g. when triggering at the time of training or at a fixed schedule you could decide to take data some time into future and past of the the trigger time. However, in some cases like the current MatrixEvaluation and the BetweenTwoTriggers evaluation they are dependent. Also, the “when” question is actually overloaded. There is the “when” in terms of “Modyn pipeline time” and “dataset time”.

I think the naming isn’t totally clear and those concepts should be divided up in our setup. However, it’s not straightforward as our experiment pipeline setup also mixes those concepts when accessing future data within one pipeline run for evaluating models on all data slices. During implementation I was probably focusing on the “production case” too much where the evaluation schedule should be more independent from triggering.

Regarding the current MatrixEvaluation there already is a naming ambiguity. Essentially, it should mean that every model is evaluated on every evaluation interval. This is not a strategy (that selects the point in time and data) itself, but rather an option of every strategy/handler. E.g. we could also define the evaluation intervals as the intervals between two triggers and train every model on every of those. The same applies to intervals generated by a fixed periodic schedule or static points in the datasets.

I gave it another thought and propose the following abstraction (I left it close to the config format)

handler: 
  # on modyn pipeline level
  # configures what datasets a certain evaluation series should run on, what intervals and models to combine, ...;
  # this basically reduces the choices in the space `Datasets x Models x Intervals x TimeOfExecutionInPipeline`

  execution_time: Literal[“after_training”, “after_pipeline”, “manual”, “async_during_pipeline”]  
  # when the evaluation handlers should be executed; 
  # "manual": will skip the evaluations in the pipeline run; useful if someone wants to start them with a different entry point later on in isolation 
    # (useful for debugging and when evaluation is not part of the performance critical part of the pipeline, can be executed while the system is busy with other things e.g.)
  # "async_during_pipeline": future feature, not needed atm., useful when the pipeline should perform evaluations asynchronously during the pipeline (e.g. for continously generating data for training trigger decisions)

  models (shared across all strategies): Literal[“matrix”, “most_recent”]
  # what models to evaluate the intervals on; 
  # matrix will evaluate every model on every interval given by the strategy

  strategy: Union[SlicingEvalStrategy, BetweenTwoTriggersEvalStrategy, StaticEvalStrategy, PeriodicStrategy]
  # on dataset level; something that defines start & end timestamp of evaluation requests; 
  # determines both the point in time (dataset) and range of data to start evaluating on;
  # SlicingEvalStrategy: effectively what the old MatrixEvalStrategy did (in terms of interval generation, not in terms of defining the cross product with all models)
  # PeriodicStrategy & StaticEvalStrategy: these strategies have orthogonal "when" (schedule) and "what data" concepts (offsets in potentially both directions of the scheduled times)

  dataset: list[str]
  # list of dataset references defined at top level of evaluation config, to enable running one handler on multiple datasets (e.g. train, test, combined)
  # this makes sense as e.g. the BetweenTwoTriggersEvalStrategy can actually run on the training data.
  # we could default this to use run the handler on every of the datasets in the root setting

If you agree with this abstraction I can adjust the implementation of the config models and business logic (just requires some rearranging of the existing code, no new implementation)