eth-easl / modyn

Modyn is a research platform for training ML models on growing datasets.
MIT License

feat: Shuffle between epochs #456

Closed MaxiBoether closed 3 months ago

MaxiBoether commented 3 months ago

This PR introduces a shuffle option for training: If True, then we shuffle the order of the partitions and the keys within the partitions between each epoch.

Note that, as described in #460, we might need to make this a bit more fine-grained for things like Criteo to optimize performance.
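To illustrate the two-level shuffle described above, here is a minimal sketch. It is not the code from this PR: `shuffled_epoch_order` and the toy `partitions` list are hypothetical names chosen for illustration, and the real implementation lives in the OnlineDataset and related components.

```python
import random


def shuffled_epoch_order(partitions, rng):
    """Yield all keys for one epoch.

    The partition order is shuffled first, then the keys inside each
    partition are shuffled. Keys never leave their partition.
    """
    order = list(range(len(partitions)))
    rng.shuffle(order)
    for partition_id in order:
        keys = list(partitions[partition_id])  # copy, keep the input intact
        rng.shuffle(keys)
        yield from keys


# Hypothetical toy data: three partitions of sample keys.
partitions = [[0, 1, 2], [3, 4, 5], [6, 7]]
rng = random.Random(0)

# Calling the generator once per epoch gives a fresh order each time.
epoch_1 = list(shuffled_epoch_order(partitions, rng))
epoch_2 = list(shuffled_epoch_order(partitions, rng))
print(epoch_1)
print(epoch_2)
```

Every epoch still contains each key exactly once, and keys of one partition stay adjacent, which is why very large partitions limit how much mixing this gives.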

github-actions[bot] commented 3 months ago

:white_check_mark: Result of Pytest Coverage

---------- coverage: platform linux, python 3.12.3-final-0 -----------
Name Stmts Miss Cover
modyn/common/benchmark/stopwatch.py 26 0 100%
modyn/common/example_extension/example_extension.py 28 2 93%
modyn/common/ftp/ftp_server.py 31 18 42%
modyn/common/ftp/ftp_utils.py 83 69 17%
modyn/common/grpc/grpc_helpers.py 67 36 46%
modyn/common/trigger_sample/trigger_sample_storage.py 158 9 94%
modyn/config/schema/config.py 93 0 100%
modyn/config/schema/modyn_base_model.py 5 0 100%
modyn/config/schema/pipeline.py 245 20 92%
modyn/config/schema/sampling/downsampling_config.py 50 1 98%
modyn/database/abstract_database_connection.py 35 0 100%
modyn/database/partition_by_meta.py 33 12 64%
modyn/evaluator/evaluator.py 15 0 100%
modyn/evaluator/evaluator_entrypoint.py 32 3 91%
modyn/evaluator/internal/dataset/evaluation_dataset.py 75 3 96%
modyn/evaluator/internal/grpc/evaluator_grpc_server.py 22 0 100%
modyn/evaluator/internal/grpc/evaluator_grpc_servicer.py 165 14 92%
modyn/evaluator/internal/metric_factory.py 18 1 94%
modyn/evaluator/internal/metrics/abstract_decomposable_metric.py 10 1 90%
modyn/evaluator/internal/metrics/abstract_evaluation_metric.py 29 2 93%
modyn/evaluator/internal/metrics/abstract_holistic_metric.py 10 1 90%
modyn/evaluator/internal/metrics/accuracy.py 20 2 90%
modyn/evaluator/internal/metrics/f1_score.py 63 0 100%
modyn/evaluator/internal/metrics/roc_auc.py 36 1 97%
modyn/evaluator/internal/pytorch_evaluator.py 113 28 75%
modyn/evaluator/internal/utils/evaluation_info.py 9 0 100%
modyn/evaluator/internal/utils/evaluation_process_info.py 8 0 100%
modyn/evaluator/internal/utils/evaluator_messages.py 3 0 100%
modyn/metadata_database/metadata_base.py 3 0 100%
modyn/metadata_database/metadata_database_connection.py 55 3 95%
modyn/metadata_database/models/pipelines.py 22 1 95%
modyn/metadata_database/models/sample_training_metadata.py 15 0 100%
modyn/metadata_database/models/selector_state_metadata.py 47 10 79%
modyn/metadata_database/models/trained_models.py 18 0 100%
modyn/metadata_database/models/trigger_partitions.py 10 0 100%
modyn/metadata_database/models/trigger_training_metadata.py 14 0 100%
modyn/metadata_database/models/triggers.py 10 0 100%
modyn/metadata_database/utils/model_storage_strategy_config.py 21 2 90%
modyn/metadata_processor/internal/grpc/metadata_processor_grpc_servicer.py 18 0 100%
modyn/metadata_processor/internal/grpc/metadata_processor_server.py 24 0 100%
modyn/metadata_processor/internal/metadata_processor_manager.py 23 4 83%
modyn/metadata_processor/metadata_processor.py 11 0 100%
modyn/metadata_processor/metadata_processor_entrypoint.py 24 1 96%
modyn/metadata_processor/processor_strategies/abstract_processor_strategy.py 30 0 100%
modyn/metadata_processor/processor_strategies/basic_processor_strategy.py 17 2 88%
modyn/metadata_processor/processor_strategies/processor_strategy_type.py 6 1 83%
modyn/model_storage/internal/grpc/grpc_server.py 23 0 100%
modyn/model_storage/internal/grpc/model_storage_grpc_servicer.py 54 0 100%
modyn/model_storage/internal/model_storage_manager.py 118 5 96%
modyn/model_storage/internal/storage_strategies/abstract_difference_operator.py 11 2 82%
modyn/model_storage/internal/storage_strategies/abstract_model_storage_strategy.py 16 1 94%
modyn/model_storage/internal/storage_strategies/difference_operators/sub_difference_operator.py 12 0 100%
modyn/model_storage/internal/storage_strategies/difference_operators/xor_difference_operator.py 14 0 100%
modyn/model_storage/internal/storage_strategies/full_model_strategies/abstract_full_model_strategy.py 26 2 92%
modyn/model_storage/internal/storage_strategies/full_model_strategies/binary_full_model.py 16 0 100%
modyn/model_storage/internal/storage_strategies/full_model_strategies/pytorch_full_model.py 15 0 100%
modyn/model_storage/internal/storage_strategies/incremental_model_strategies/abstract_incremental_model_strategy.py 26 10 62%
modyn/model_storage/internal/storage_strategies/incremental_model_strategies/weights_difference.py 99 1 99%
modyn/model_storage/internal/utils/model_storage_policy.py 35 0 100%
modyn/model_storage/model_storage.py 27 3 89%
modyn/model_storage/model_storage_entrypoint.py 32 3 91%
modyn/models/articlenet/articlenet.py 30 16 47%
modyn/models/coreset_methods_support.py 29 1 97%
modyn/models/dlrm/cuda_ext/dot_based_interact.py 24 13 46%
modyn/models/dlrm/cuda_ext/fused_gather_embedding.py 16 16 0%
modyn/models/dlrm/cuda_ext/sparse_embedding.py 32 32 0%
modyn/models/dlrm/dlrm.py 67 9 87%
modyn/models/dlrm/nn/embeddings.py 123 64 48%
modyn/models/dlrm/nn/factories.py 24 9 62%
modyn/models/dlrm/nn/interactions.py 50 11 78%
modyn/models/dlrm/nn/mlps.py 77 23 70%
modyn/models/dlrm/nn/parts.py 60 4 93%
modyn/models/dlrm/setup.py 5 5 0%
modyn/models/dlrm/utils/install_lib.py 11 7 36%
modyn/models/dlrm/utils/utils.py 28 0 100%
modyn/models/dummy/dummy.py 12 0 100%
modyn/models/fmownet/fmownet.py 25 0 100%
modyn/models/resnet18/resnet18.py 28 0 100%
modyn/models/resnet50/resnet50.py 28 0 100%
modyn/models/resnet152/resnet152.py 28 0 100%
modyn/models/tokenizers/distill_bert_tokenizer.py 11 0 100%
modyn/models/yearbooknet/yearbooknet.py 23 0 100%
modyn/selector/internal/grpc/selector_grpc_servicer.py 78 22 72%
modyn/selector/internal/grpc/selector_server.py 33 12 64%
modyn/selector/internal/selector_manager.py 125 37 70%
modyn/selector/internal/selector_strategies/abstract_selection_strategy.py 125 8 94%
modyn/selector/internal/selector_strategies/coreset_strategy.py 66 6 91%
modyn/selector/internal/selector_strategies/downsampling_strategies/abstract_downsampling_strategy.py 29 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/craig_downsampling_strategy.py 18 12 33%
modyn/selector/internal/selector_strategies/downsampling_strategies/downsampling_scheduler.py 51 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/gradmatch_downsampling_strategy.py 14 8 43%
modyn/selector/internal/selector_strategies/downsampling_strategies/gradnorm_downsampling_strategy.py 6 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/kcentergreedy_downsampling_strategy.py 14 8 43%
modyn/selector/internal/selector_strategies/downsampling_strategies/loss_downsampling_strategy.py 6 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/no_downsampling_strategy.py 10 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/rho_loss_downsampling_strategy.py 36 6 83%
modyn/selector/internal/selector_strategies/downsampling_strategies/submodular_downsampling_strategy.py 20 14 30%
modyn/selector/internal/selector_strategies/downsampling_strategies/uncertainty_downsampling_strategy.py 15 9 40%
modyn/selector/internal/selector_strategies/downsampling_strategies/utils.py 7 0 100%
modyn/selector/internal/selector_strategies/freshness_sampling_strategy.py 130 12 91%
modyn/selector/internal/selector_strategies/new_data_strategy.py 98 10 90%
modyn/selector/internal/selector_strategies/presampling_strategies/abstract_balanced_strategy.py 57 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/abstract_presampling_strategy.py 23 1 96%
modyn/selector/internal/selector_strategies/presampling_strategies/label_balanced_presampling_strategy.py 7 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/no_presampling_strategy.py 16 1 94%
modyn/selector/internal/selector_strategies/presampling_strategies/random_no_replacement_presampling_strategy.py 42 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/random_presampling_strategy.py 17 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/trigger_balanced_presampling_strategy.py 13 1 92%
modyn/selector/internal/selector_strategies/presampling_strategies/utils.py 9 0 100%
modyn/selector/internal/selector_strategies/utils.py 10 0 100%
modyn/selector/internal/storage_backend/abstract_storage_backend.py 34 7 79%
modyn/selector/internal/storage_backend/database/database_storage_backend.py 85 7 92%
modyn/selector/internal/storage_backend/local/local_storage_backend.py 136 5 96%
modyn/selector/selector.py 82 14 83%
modyn/selector/selector_entrypoint.py 31 3 90%
modyn/supervisor/entrypoint.py 31 3 90%
modyn/supervisor/internal/eval_strategies/abstract_eval_strategy.py 8 1 88%
modyn/supervisor/internal/eval_strategies/matrix_eval_strategy.py 17 0 100%
modyn/supervisor/internal/eval_strategies/offset_eval_strategy.py 22 0 100%
modyn/supervisor/internal/evaluation_result_writer/abstract_evaluation_result_writer.py 16 2 88%
modyn/supervisor/internal/evaluation_result_writer/json_result_writer.py 23 1 96%
modyn/supervisor/internal/evaluation_result_writer/tensorboard_result_writer.py 13 0 100%
modyn/supervisor/internal/grpc/enums.py 55 0 100%
modyn/supervisor/internal/grpc/supervisor_grpc_server.py 25 7 72%
modyn/supervisor/internal/grpc/supervisor_grpc_servicer.py 35 0 100%
modyn/supervisor/internal/grpc/template_msg.py 26 0 100%
modyn/supervisor/internal/grpc_handler.py 301 36 88%
modyn/supervisor/internal/pipeline_executor/models.py 256 34 87%
modyn/supervisor/internal/pipeline_executor/pipeline_executor.py 361 18 95%
modyn/supervisor/internal/supervisor.py 144 17 88%
modyn/supervisor/internal/triggers/amounttrigger.py 15 0 100%
modyn/supervisor/internal/triggers/datadrifttrigger.py 102 28 73%
modyn/supervisor/internal/triggers/embedding_encoder_utils/embedding_encoder.py 30 19 37%
modyn/supervisor/internal/triggers/embedding_encoder_utils/embedding_encoder_downloader.py 50 31 38%
modyn/supervisor/internal/triggers/timetrigger.py 26 3 88%
modyn/supervisor/internal/triggers/trigger.py 21 1 95%
modyn/supervisor/internal/triggers/trigger_datasets/dataloader_info.py 16 13 19%
modyn/supervisor/internal/triggers/trigger_datasets/fixed_keys_dataset.py 72 3 96%
modyn/supervisor/internal/triggers/trigger_datasets/online_trigger_dataset.py 17 1 94%
modyn/supervisor/internal/triggers/utils.py 50 37 26%
modyn/supervisor/internal/utils/evaluation_status_reporter.py 31 0 100%
modyn/supervisor/internal/utils/pipeline_info.py 30 9 70%
modyn/supervisor/internal/utils/training_status_reporter.py 24 3 88%
modyn/tests/common/example_extension/test_example_extension.py 13 0 100%
modyn/tests/common/grpc/test_grpc_helpers.py 3 0 100%
modyn/tests/common/trigger_sample/test_trigger_sample_storage.py 128 0 100%
modyn/tests/config/schema/test_pipeline.py 35 0 100%
modyn/tests/config/test_config_integrity.py 36 1 97%
modyn/tests/conftest.py 39 0 100%
modyn/tests/database/test_abstract_database_connection.py 19 0 100%
modyn/tests/evaluator/internal/dataset/test_evaluation_dataset.py 131 2 98%
modyn/tests/evaluator/internal/grpc/test_evaluator_grpc_server.py 20 0 100%
modyn/tests/evaluator/internal/grpc/test_evaluator_grpc_servicer.py 365 16 96%
modyn/tests/evaluator/internal/metrics/test_accuracy.py 45 0 100%
modyn/tests/evaluator/internal/metrics/test_f1_score.py 53 0 100%
modyn/tests/evaluator/internal/metrics/test_roc_auc.py 31 0 100%
modyn/tests/evaluator/internal/test_metric_factory.py 13 0 100%
modyn/tests/evaluator/internal/test_pytorch_evaluator.py 163 19 88%
modyn/tests/evaluator/test_evaluator.py 30 0 100%
modyn/tests/evaluator/test_evaluator_entrypoint.py 21 0 100%
modyn/tests/metadata_database/models/test_pipelines.py 48 0 100%
modyn/tests/metadata_database/models/test_sample_training_metadata.py 40 0 100%
modyn/tests/metadata_database/models/test_selector_state_metadata.py 46 0 100%
modyn/tests/metadata_database/models/test_trained_models.py 48 0 100%
modyn/tests/metadata_database/models/test_trigger_training_metadata.py 38 0 100%
modyn/tests/metadata_database/models/test_triggers.py 33 0 100%
modyn/tests/metadata_database/test_metadata_database_connection.py 47 0 100%
modyn/tests/metadata_processor/internal/grpc/test_metadata_processor_grpc_servicer.py 26 0 100%
modyn/tests/metadata_processor/internal/grpc/test_metadata_processor_server.py 27 0 100%
modyn/tests/metadata_processor/internal/test_metadata_processor_manager.py 42 3 93%
modyn/tests/metadata_processor/processor_strategies/test_abstract_processor_strategy.py 60 0 100%
modyn/tests/metadata_processor/processor_strategies/test_basic_processor_strategy.py 43 0 100%
modyn/tests/metadata_processor/test_metadata_processor.py 22 3 86%
modyn/tests/metadata_processor/test_metadata_processor_entrypoint.py 21 0 100%
modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_server.py 16 0 100%
modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_servicer.py 100 0 100%
modyn/tests/model_storage/internal/storage_strategies/difference_operators/test_sub_difference_operator.py 16 0 100%
modyn/tests/model_storage/internal/storage_strategies/difference_operators/test_xor_difference_operator.py 16 0 100%
modyn/tests/model_storage/internal/storage_strategies/full_model_strategies/test_binary_full_model.py 27 1 96%
modyn/tests/model_storage/internal/storage_strategies/full_model_strategies/test_pytorch_full_model.py 36 1 97%
modyn/tests/model_storage/internal/storage_strategies/incremental_model_strategies/test_weights_difference.py 88 2 98%
modyn/tests/model_storage/internal/test_model_storage_manager.py 217 1 99%
modyn/tests/model_storage/internal/utils/test_model_storage_policy.py 28 0 100%
modyn/tests/model_storage/test_model_storage.py 37 0 100%
modyn/tests/model_storage/test_model_storage_entrypoint.py 21 0 100%
modyn/tests/models/test_bert_tokenizer.py 24 0 100%
modyn/tests/models/test_dlrm.py 46 0 100%
modyn/tests/models/test_dummy.py 8 0 100%
modyn/tests/models/test_embedding_recorder.py 27 0 100%
modyn/tests/models/test_fmownet.py 25 0 100%
modyn/tests/models/test_resnet18.py 22 0 100%
modyn/tests/models/test_resnet50.py 22 0 100%
modyn/tests/models/test_resnet152.py 22 0 100%
modyn/tests/models/test_yearbook_net.py 47 0 100%
modyn/tests/selector/internal/grpc/test_selector_grpc_servicer.py 132 0 100%
modyn/tests/selector/internal/grpc/test_selector_server.py 16 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_abstract_downsampling_strategy.py 14 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_gradnorm_downsampling_strategy.py 14 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_loss_downsampling_strategy.py 18 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_no_downsampling_strategy.py 6 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_rho_loss_downsampling_strategy.py 68 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_scheduler.py 131 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_abstract_balanced_strategy.py 14 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_empty_presampling_strategy.py 0 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_label_balanced_presampling_strategy.py 165 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_random_no_replacement_presampling_strategy.py 52 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_random_presampling_strategy.py 86 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_trigger_balanced_presampling.py 140 0 100%
modyn/tests/selector/internal/selector_strategies/test_abstract_selection_strategy.py 170 0 100%
modyn/tests/selector/internal/selector_strategies/test_coreset_strategy.py 246 0 100%
modyn/tests/selector/internal/selector_strategies/test_freshness_sampling_strategy.py 300 0 100%
modyn/tests/selector/internal/selector_strategies/test_new_data_strategy.py 500 0 100%
modyn/tests/selector/internal/storage_backend/database/test_database_storage_backend.py 123 0 100%
modyn/tests/selector/internal/storage_backend/local/test_local_storage_backend.py 84 0 100%
modyn/tests/selector/internal/storage_backend/utils.py 16 5 69%
modyn/tests/selector/internal/test_selector_manager.py 148 5 97%
modyn/tests/selector/test_selector.py 95 5 95%
modyn/tests/selector/test_selector_entrypoint.py 25 0 100%
modyn/tests/supervisor/internal/eval_strategies/test_matrix_eval_strategy.py 16 0 100%
modyn/tests/supervisor/internal/eval_strategies/test_offset_eval_strategy.py 8 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_abstract_evaluation_result_writer.py 7 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_json_result_writer.py 16 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_tensorboard_result_writer.py 21 0 100%
modyn/tests/supervisor/internal/grpc/test_supervisor_grpc_server.py 29 1 97%
modyn/tests/supervisor/internal/grpc/test_supervisor_grpc_servicer.py 54 0 100%
modyn/tests/supervisor/internal/pipeline_executor/test_pipeline_executor.py 348 6 98%
modyn/tests/supervisor/internal/test_grpc_handler.py 287 0 100%
modyn/tests/supervisor/internal/test_supervisor.py 179 5 97%
modyn/tests/supervisor/internal/triggers/test_amounttrigger.py 25 0 100%
modyn/tests/supervisor/internal/triggers/test_datadrifttrigger.py 94 1 99%
modyn/tests/supervisor/internal/triggers/test_timetrigger.py 21 0 100%
modyn/tests/supervisor/internal/triggers/test_trigger.py 5 0 100%
modyn/tests/supervisor/internal/triggers/trigger_datasets/test_fixed_keys_dataset.py 123 2 98%
modyn/tests/supervisor/internal/triggers/trigger_datasets/test_online_trigger_dataset.py 28 2 93%
modyn/tests/supervisor/test_entrypoint.py 25 0 100%
modyn/tests/trainer_server/internal/data/key_sources/test_local_key_source.py 89 0 100%
modyn/tests/trainer_server/internal/data/key_sources/test_selector_key_source.py 92 0 100%
modyn/tests/trainer_server/internal/data/test_data_utils.py 22 1 95%
modyn/tests/trainer_server/internal/data/test_local_dataset_writer.py 59 0 100%
modyn/tests/trainer_server/internal/data/test_online_dataset.py 367 5 99%
modyn/tests/trainer_server/internal/data/test_per_class_online_dataset.py 53 3 94%
modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_server.py 17 0 100%
modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_servicer.py 406 8 98%
modyn/tests/trainer_server/internal/metadata_collector/test_metadata_collector.py 41 0 100%
modyn/tests/trainer_server/internal/trainer/metadata_pytorch_callbacks/test_loss_callback.py 51 1 98%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/deepcore_comparison_tests_utils.py 21 1 95%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_abstract_matrix_downsampling_strategy.py 75 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_abstract_remote_downsampling_strategy.py 12 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_craig_remote_downsampling.py 249 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_get_tensor_subset.py 56 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_gradmatch_downsampling_strategy.py 116 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_gradnorm_downsample.py 92 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_kcenter_downsampling_strategy.py 104 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_loss_downsample.py 82 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_submodular_downsampling_strategy.py 101 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_uncertainty_downsampling_strategy.py 49 0 100%
modyn/tests/trainer_server/internal/trainer/test_batch_accumulator.py 93 0 100%
modyn/tests/trainer_server/internal/trainer/test_pytorch_trainer.py 412 34 92%
modyn/tests/trainer_server/test_trainer_server.py 34 0 100%
modyn/tests/trainer_server/test_trainer_server_entrypoint.py 21 0 100%
modyn/tests/utils/test_timer.py 22 0 100%
modyn/tests/utils/test_utils.py 175 0 100%
modyn/trainer_server/custom_lr_schedulers/dlrm_lr_scheduler/dlrm_scheduler.py 33 33 0%
modyn/trainer_server/internal/dataset/data_utils.py 17 2 88%
modyn/trainer_server/internal/dataset/key_sources/abstract_key_source.py 21 5 76%
modyn/trainer_server/internal/dataset/key_sources/local_key_source.py 23 1 96%
modyn/trainer_server/internal/dataset/key_sources/selector_key_source.py 54 2 96%
modyn/trainer_server/internal/dataset/local_dataset_writer.py 55 3 95%
modyn/trainer_server/internal/dataset/online_dataset.py 308 29 91%
modyn/trainer_server/internal/dataset/per_class_online_dataset.py 14 0 100%
modyn/trainer_server/internal/grpc/trainer_server_grpc_server.py 22 0 100%
modyn/trainer_server/internal/grpc/trainer_server_grpc_servicer.py 244 38 84%
modyn/trainer_server/internal/metadata_collector/metadata_collector.py 33 0 100%
modyn/trainer_server/internal/mocks/mock_metadata_processor.py 22 2 91%
modyn/trainer_server/internal/trainer/batch_accumulator.py 30 0 100%
modyn/trainer_server/internal/trainer/metadata_pytorch_callbacks/base_callback.py 15 3 80%
modyn/trainer_server/internal/trainer/metadata_pytorch_callbacks/loss_callback.py 21 0 100%
modyn/trainer_server/internal/trainer/pytorch_trainer.py 513 150 71%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_matrix_downsampling_strategy.py 66 4 94%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_per_label_remote_downsample_strategy.py 9 1 89%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_remote_downsampling_strategy.py 32 3 91%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/cossim.py 28 17 39%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/euclidean.py 29 12 59%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/k_center_greedy.py 38 4 89%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/orthogonal_matching_pursuit.py 66 34 48%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/shuffling.py 9 0 100%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/submodular_function.py 103 15 85%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/submodular_optimizer.py 116 78 33%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_craig_downsampling.py 95 7 93%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_grad_match_downsampling_strategy.py 17 1 94%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_gradnorm_downsampling.py 42 5 88%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_kcenter_greedy_downsampling_strategy.py 15 0 100%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_loss_downsampling.py 34 5 85%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_submodular_downsampling_strategy.py 30 3 90%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_uncertainty_downsampling_strategy.py 61 18 70%
modyn/trainer_server/internal/utils/metric_type.py 3 0 100%
modyn/trainer_server/internal/utils/trainer_messages.py 4 0 100%
modyn/trainer_server/internal/utils/training_info.py 53 2 96%
modyn/trainer_server/internal/utils/training_process_info.py 10 0 100%
modyn/trainer_server/trainer_server.py 19 0 100%
modyn/trainer_server/trainer_server_entrypoint.py 32 3 91%
modyn/utils/timer.py 8 0 100%
modyn/utils/utils.py 161 13 92%
TOTAL 18161 1581 91%
Coverage HTML written to
Required test coverage of
=============== 2426 passed, 8078

MaxiBoether commented 3 months ago

This touches a few files, but it is mostly propagating the shuffle bool through Modyn. The logic changes are mostly in the OnlineDataset and the selector gRPC servicer. Thanks already for the review :) Locally, the integration tests worked. I hope they now pass in CI as well.

MaxiBoether commented 3 months ago

Re your questions in the main comment:

1) The problem is that the transfer is implemented in a streaming fashion. This means we cannot shuffle on the receiving side (without implementing buffering logic). Hence, we have to shuffle on the sending side before we start sending.

2) I don't get your comment yet. In line 345, shuffling is implemented for _fetch_partition_noprefetch, and in line 317, for _prefetch_partition. Did you maybe miss one of those lines?
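Point 1 can be sketched roughly as follows. This is only an illustration of "shuffle before sending, then stream": `stream_shuffled` and `chunk_size` are hypothetical names, not the selector's actual gRPC API.

```python
import random


def stream_shuffled(keys, chunk_size, rng):
    """Shuffle on the sending side, then stream fixed-size chunks.

    The receiver can consume chunks as they arrive and needs no
    buffering, because the order was already randomized by the sender.
    """
    shuffled = list(keys)
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), chunk_size):
        yield shuffled[start:start + chunk_size]


chunks = list(stream_shuffled(range(10), chunk_size=4, rng=random.Random(1)))
for chunk in chunks:
    print(chunk)
```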

MaxiBoether commented 3 months ago

@XianzheMa: I will have to remove shuffle from the selector again. The problem with shuffling the keys is that the storage implementation sorts the request by associated file IDs, which makes the order returned by the storage somewhat predictable. So basically, we could shuffle only at the partition level, but then, if the partition size is too big, shuffling has no effect. Right now we stream partition data as soon as we have it:

        while not self._is_partition_fetched(partition_id):
            max_idx = self._partition_max_index(partition_id)
            if max_idx <= last_idx:  # No new data
                self._wait_for_new_partition_data(partition_id)

            yield from self._get_partition_data(last_idx, max_idx, partition_id)
            last_idx = max_idx

I will probably have to change the logic such that, if shuffling is enabled, we wait for the entire partition to be transferred and then shuffle it. I hope this will not affect throughput too much. At some point I can measure this, but it will mostly affect things like Criteo. I will implement this tomorrow morning so that we can then merge this PR and you can use the shuffling functionality. Sorry, I did not realize that shuffling is not straightforward due to the storage implementation.
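A rough sketch of that planned change, under the assumption that a partition arrives as a sequence of chunks. `iterate_partition` and `chunks` are hypothetical stand-ins and not the actual OnlineDataset methods:

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iterate_partition(
    chunks: Iterable[List[T]], shuffle: bool, rng: random.Random
) -> Iterator[T]:
    """Yield the samples of one partition.

    Without shuffling, samples are streamed chunk by chunk as they arrive.
    With shuffling, the whole partition is buffered first and then shuffled,
    which delays the first sample until the transfer has finished.
    """
    if not shuffle:
        for chunk in chunks:
            yield from chunk
        return

    buffered = [sample for chunk in chunks for sample in chunk]
    rng.shuffle(buffered)
    yield from buffered


incoming = [[1, 2], [3], [4, 5, 6]]  # hypothetical chunks of one partition
streamed = list(iterate_partition(incoming, shuffle=False, rng=random.Random(0)))
shuffled = list(iterate_partition(incoming, shuffle=True, rng=random.Random(0)))
print(streamed)
print(shuffled)
```

The cost is the one mentioned above: with `shuffle=True`, nothing is yielded until the last chunk is in, so the throughput impact depends on partition size.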

XianzheMa commented 3 months ago

I see. So even if we shuffle the keys, the storage component still returns samples in timestamp order rather than in key order. Is this correct?

Do you remember why we wanted to sort samples in timestamp order in the first place (instead of naturally returning samples in key order)?

MaxiBoether commented 3 months ago

No, we don't sort by timestamp: https://github.com/eth-easl/modyn/blob/028a646364a3360bdfd8355b986d8d1c9ed76cd1/modyn/storage/include/internal/grpc/storage_service_impl.hpp#L525

That function implements the sending of results. For performance reasons, we iterate over files (not by timestamp!) in a request. This is because if one file contains many samples, we just want to open and load it into memory once and then read the data.
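To make the effect concrete, here is a small Python sketch of the idea. The real storage is C++ (see the link above); `group_keys_by_file`, `response_order`, and the `key_to_file` mapping are made up for illustration, and sorting by file ID follows the description earlier in this thread.

```python
from collections import defaultdict


def group_keys_by_file(requested_keys, key_to_file):
    """Group the requested keys by the file that stores them."""
    by_file = defaultdict(list)
    for key in requested_keys:
        by_file[key_to_file[key]].append(key)
    return by_file


def response_order(requested_keys, key_to_file):
    """Order in which samples come back if each file is opened only once."""
    by_file = group_keys_by_file(requested_keys, key_to_file)
    return [key for file_id in sorted(by_file) for key in by_file[file_id]]


# Hypothetical mapping: keys 11 and 13 live in file 1, keys 10 and 12 in file 2.
key_to_file = {10: 2, 11: 1, 12: 2, 13: 1}

in_order = response_order([10, 11, 12, 13], key_to_file)
after_shuffle = response_order([12, 13, 10, 11], key_to_file)
print(in_order)       # [11, 13, 10, 12]
print(after_shuffle)  # [13, 11, 12, 10]
```

Even with a shuffled request, all keys of file 1 come back before those of file 2, so shuffling the keys on the selector side only mixes samples within a file.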