Oufattole / meds-torch

MIT License

Implementation of a Masking Stage with Random Masking Options #26

Open Oufattole opened 1 month ago

Oufattole commented 1 month ago

Problem

The absence of a dedicated masking stage in our pipeline limits our ability to handle incomplete or noisy data effectively during model training.

Proposed Solution

Introduce a masking stage designed to randomly mask a specified percentage of the data or subsequences within the data:

Position: Place the masking stage after the input encoder and before the sequence model.
Functionality:
- Support two random masking modes: mask a random percentage of the tokens, or mask a randomly sampled contiguous subsequence.
- Add a key to the batch holding the labels that the Model stage will use to compute the masked imputation loss.
Configurability: Allow users to set the percentage of data to mask.
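
To make the proposal concrete, here is a rough sketch of the two masking modes and the extra batch key. It is plain Python over lists so that it runs without dependencies; a real stage would operate on the encoder's output tensors and would need to skip padding positions. All names here (`MaskingConfig`, `make_mask`, `masking_stage`, and the batch keys `tokens`, `mask`, `mask_labels`) are hypothetical and not part of the existing meds-torch API.

```python
import random
from dataclasses import dataclass


@dataclass
class MaskingConfig:
    # Hypothetical config; field names are illustrative, not meds-torch's API.
    mask_ratio: float = 0.15  # fraction of positions to mask
    mode: str = "token"       # "token" (scattered) or "span" (contiguous)


def make_mask(seq_len: int, cfg: MaskingConfig, rng: random.Random) -> list[bool]:
    """Return a boolean mask of length seq_len; True marks a masked position."""
    if not 0.0 <= cfg.mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    n_masked = round(seq_len * cfg.mask_ratio)
    mask = [False] * seq_len
    if cfg.mode == "token":
        # Mask n_masked positions chosen uniformly without replacement.
        for i in rng.sample(range(seq_len), n_masked):
            mask[i] = True
    elif cfg.mode == "span":
        # Mask one contiguous run of n_masked positions at a random start.
        start = rng.randint(0, seq_len - n_masked)
        for i in range(start, start + n_masked):
            mask[i] = True
    else:
        raise ValueError(f"unknown mode: {cfg.mode}")
    return mask


def masking_stage(batch: dict, cfg: MaskingConfig, rng: random.Random,
                  mask_value=None) -> dict:
    """Mask batch["tokens"] and keep the originals as labels for the loss."""
    masked_tokens, masks = [], []
    for seq in batch["tokens"]:
        mask = make_mask(len(seq), cfg, rng)
        # Replace masked positions with a placeholder value.
        masked_tokens.append([mask_value if m else t for t, m in zip(seq, mask)])
        masks.append(mask)
    out = dict(batch)                      # shallow copy; input batch is not mutated
    out["tokens"] = masked_tokens
    out["mask"] = masks                    # which positions were masked
    out["mask_labels"] = batch["tokens"]   # targets for masked imputation loss
    return out
```

With this shape, the Model stage would compute its loss only where `mask` is true, comparing predictions against `mask_labels`. For example, `masking_stage({"tokens": [[1, 2, 3, 4]]}, MaskingConfig(mask_ratio=0.5, mode="span"), random.Random(0))` masks two adjacent positions in the sequence.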

Oufattole commented 1 month ago

The token loss will be the same as for forecasting: https://github.com/Oufattole/meds-torch/blob/52ed2fbac72fb72a6f9491339cded5ba10354a2f/src/meds_torch/models/token_forecasting.py#L138

Lemme know what you think @teyaberg