payalchandak opened this pull request 2 weeks ago (status: Open)
The changes in this pull request modify several configuration files for the meds_torch project. The key update moves dataset-specific references under `data.dataset`. Task-name references change from `${data.task_name}` to `${data.dataset.task_name}` across multiple YAML files, and model configs now read `vocab_size`, `_resolved_max_seq_len`, and `task_name` from `data.dataset.*`. New dataset configuration files have been added under `configs/data/dataset/`, and the old top-level `configs/data/multiwindow_pytorch_dataset.yaml` and `configs/data/random_windows_pytorch_dataset.yaml` have been removed. The `PytorchDataset` class has been restructured, with enhanced error handling and a simplified initialization process. Overall, these changes aim to improve the organization and robustness of dataset management.
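Because Hydra configs refer to each other through dotted interpolation paths, every reference has to move together when a value is relocated. The sketch below illustrates that lookup. It is a minimal stdlib stand-in, not Hydra/OmegaConf's actual resolver, and the `lookup`/`resolve` helpers and the config values (`mortality`, `1024`) are hypothetical.

```python
import re
from typing import Any

# Toy stand-in for OmegaConf-style interpolation (NOT Hydra's resolver):
# it only resolves values that are a single whole-string "${a.b.c}" reference.
_REF = re.compile(r"^\$\{([A-Za-z0-9_.]+)\}$")


def lookup(cfg: dict, dotted: str) -> Any:
    """Walk a nested dict along a dotted path, failing loudly on a missing key."""
    node: Any = cfg
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"interpolation key '{dotted}' not found (missing '{key}')")
        node = node[key]
    return node


def resolve(value: Any, root: dict) -> Any:
    """Recursively replace ${...} references with the values they point to."""
    if isinstance(value, dict):
        return {k: resolve(v, root) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, root) for v in value]
    if isinstance(value, str):
        match = _REF.match(value)
        if match:
            return resolve(lookup(root, match.group(1)), root)
    return value


# Hypothetical config after the change: dataset values live under data.dataset.
cfg = {
    "data": {"dataset": {"task_name": "mortality", "vocab_size": 1024}},
    "model": {"backbone": {"num_tokens": "${data.dataset.vocab_size}"}},
    "tags": ["${data.dataset.task_name}"],
}

resolved = resolve(cfg, cfg)
print(resolved["model"]["backbone"]["num_tokens"])  # 1024
print(resolved["tags"])  # ['mortality']

# A leftover reference to the old path no longer resolves.
try:
    resolve("${data.task_name}", cfg)
except KeyError as err:
    print("stale reference:", err)
```

This is why a single missed `${data.task_name}` in an experiment file would surface as a resolution error rather than silently picking up a value.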
| File Path | Change Summary |
|---|---|
| `MIMICIV_TUTORIAL/configs/meds-torch-configs/experiment/eic_mtr.yaml` | Updated `tags` entry from `${data.task_name}` to `${data.dataset.task_name}`. |
| `MIMICIV_TUTORIAL/configs/meds-torch-configs/experiment/text_code_mtr.yaml` | Updated `tags` entry from `${data.task_name}` to `${data.dataset.task_name}`. |
| `MIMICIV_TUTORIAL/configs/meds-torch-configs/experiment/triplet_mtr.yaml` | Updated `tags` entry from `${data.task_name}` to `${data.dataset.task_name}`. |
| `src/meds_torch/configs/data/dataset/multiwindow_pytorch_dataset.yaml` | Added configuration for `MultiWindowPytorchDataset`, including `defaults`, `_target_`, `subject_level_sampling`, and `raw_windows_fp`. |
| `src/meds_torch/configs/data/dataset/pytorch_dataset.yaml` | Updated `_target_` path, added `split: train`, and changed several path parameters to use `${data.dataset.*}`. Removed the `dataloader` section. |
| `src/meds_torch/configs/data/dataset/random_windows_pytorch_dataset.yaml` | Added a new configuration for the random-window sampling dataset. |
| `src/meds_torch/configs/data/default.yaml` | Introduced a structured setup for dataset management, specifying defaults for each training phase. |
| `src/meds_torch/configs/data/multiwindow_pytorch_dataset.yaml` | Deleted the previous multi-window dataset configuration. |
| `src/meds_torch/configs/data/random_windows_pytorch_dataset.yaml` | Deleted the previous random-windows dataset configuration. |
| `src/meds_torch/configs/model/backbone/default.yaml` | Updated `vocab_size` reference to `${data.dataset.vocab_size}`. |
| `src/meds_torch/configs/model/backbone/eic_transformer_decoder.yaml` | Updated `num_tokens` reference to `${data.dataset.vocab_size}`. |
| `src/meds_torch/configs/model/backbone/eic_transformer_encoder.yaml` | Updated `num_tokens` reference to `${data.dataset.vocab_size}`. |
| `src/meds_torch/configs/model/backbone/eic_transformer_encoder_attn_avg.yaml` | Updated `num_tokens` reference to `${data.dataset.vocab_size}`. |
| `src/meds_torch/configs/model/backbone/triplet_transformer_decoder.yaml` | Updated `num_tokens` reference to `${data.dataset.vocab_size}`. |
| `src/meds_torch/configs/model/backbone/triplet_transformer_encoder.yaml` | Updated `num_tokens` reference to `${data.dataset.vocab_size}`. |
| `src/meds_torch/configs/model/backbone/triplet_transformer_encoder_attn_avg.yaml` | Updated `num_tokens` reference to `${data.dataset.vocab_size}`. |
| `src/meds_torch/configs/model/ebcl.yaml` | Updated `_resolved_max_seq_len`, `vocab_size`, and `task_name` references to `data.dataset.*`. |
| `src/meds_torch/configs/model/eic_forecasting.yaml` | Updated `_resolved_max_seq_len`, `vocab_size`, and `task_name` references to `data.dataset.*`. |
| `src/meds_torch/configs/model/input_encoder/eic_encoder.yaml` | Updated `vocab_size` reference to `${data.dataset.vocab_size}`. |
| `src/meds_torch/configs/model/input_encoder/text_code_encoder.yaml` | Updated several parameters to reference `data.dataset.*`. |
| `src/meds_torch/configs/model/input_encoder/triplet_encoder.yaml` | Updated `vocab_size` reference to `${data.dataset.vocab_size}`. |
| `src/meds_torch/configs/model/input_encoder/triplet_prompt_encoder.yaml` | Updated `vocab_size` reference to `${data.dataset.vocab_size}`. |
| `src/meds_torch/configs/model/ocp.yaml` | Updated `_resolved_max_seq_len`, `vocab_size`, and `task_name` references to `data.dataset.*`. |
| `src/meds_torch/configs/model/supervised.yaml` | Updated `_resolved_max_seq_len`, `vocab_size`, and `task_name` references to `data.dataset.*`. |
| `src/meds_torch/configs/model/triplet_forecasting.yaml` | Updated `_resolved_max_seq_len`, `vocab_size`, and `task_name` references to `data.dataset.*`. |
| `src/meds_torch/configs/model/value_forecasting.yaml` | Updated `_resolved_max_seq_len`, `vocab_size`, and `task_name` references to `data.dataset.*`. |
| `src/meds_torch/configs/train.yaml` | Changed the `data` default from `pytorch_dataset` to `default` and the `model` default from `supervised` to `triplet_forecasting`. |
| `src/meds_torch/data/components/pytorch_dataset.py` | Updated the class to inherit from `Module`, simplified the constructor, and enhanced error handling. |
| `src/meds_torch/data/datamodule.py` | Removed the `get_dataset` method and simplified dataset initialization in the constructor. |
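The datamodule and dataset rows above describe a pattern that can be sketched as follows: the dataset config carries a default `split: train`, the datamodule constructor builds one dataset per split by overriding that field (no separate `get_dataset` helper), and the dataset validates its inputs up front. This is a hypothetical illustration only. `DatasetConfig`, `ToyPytorchDataset`, `ToyDataModule`, `data_dir`, and the split names `train`/`tuning`/`held_out` (the usual MEDS convention) are assumptions, not meds_torch's API.

```python
from dataclasses import dataclass, replace
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class DatasetConfig:
    # Hypothetical subset of the fields a pytorch_dataset.yaml might carry.
    data_dir: str
    task_name: Optional[str] = None
    split: str = "train"  # mirrors the `split: train` default


class ToyPytorchDataset:
    """Illustrative stand-in for PytorchDataset that validates its config early."""

    VALID_SPLITS = ("train", "tuning", "held_out")  # assumed MEDS split names

    def __init__(self, cfg: DatasetConfig):
        if cfg.split not in self.VALID_SPLITS:
            raise ValueError(f"unknown split {cfg.split!r}; expected one of {self.VALID_SPLITS}")
        if not cfg.data_dir:
            raise ValueError("data_dir must be set")
        self.cfg = cfg
        self.split_dir = Path(cfg.data_dir) / cfg.split


class ToyDataModule:
    """Builds one dataset per split directly in the constructor."""

    def __init__(self, dataset_cfg: DatasetConfig):
        # dataclasses.replace copies the frozen config with only `split` changed.
        self.train_dataset = ToyPytorchDataset(replace(dataset_cfg, split="train"))
        self.val_dataset = ToyPytorchDataset(replace(dataset_cfg, split="tuning"))
        self.test_dataset = ToyPytorchDataset(replace(dataset_cfg, split="held_out"))


dm = ToyDataModule(DatasetConfig(data_dir="data/tokenized", task_name="mortality"))
print(dm.val_dataset.split_dir.as_posix())  # data/tokenized/tuning
```

Keeping the split as a single overridable field means the per-phase datasets share every other setting, which is the kind of duplication a structured `default.yaml` can avoid.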
🐇 In the meadow, changes bloom,
Tags now point where datasets loom.
Configs refined, paths align,
For every task, the data will shine.
With each update, the code grows bright,
Hop along, it feels just right! 🌼
Summary by CodeRabbit
Release Notes
- New Features
  - New dataset configuration files `multiwindow_pytorch_dataset.yaml` and `random_windows_pytorch_dataset.yaml`.
  - New `default.yaml` for dataset management, specifying datasets for various training phases.
- Improvements
  - Configuration references now use the `dataset` structure for parameters like `vocab_size`, `task_name`, and `_resolved_max_seq_len`, improving clarity and organization.
  - Updated the `PytorchDataset` class for better dataset processing.
- Bug Fixes
  - Simplified dataset initialization in the `MEDSDataModule` class.
- Configuration Changes