Alcoholrithm / TabularS3L

A PyTorch Lightning-based library for self- and semi-supervised learning on tabular data.
MIT License

Examples of Unlabeled data #16

Closed: mdinesh9 closed this issue 2 months ago

mdinesh9 commented 2 months ago

Hi @Alcoholrithm

The example shown in the README seems to use a labeled dataset. Are there any examples of using the library with an unlabeled dataset, for example VIME with only unlabeled data?

Thanks.

Alcoholrithm commented 2 months ago

Hi @mdinesh9

Thank you for your question!

While the example in the README uses a labeled dataset, the first-phase learning of VIME is specifically designed for unlabeled data. During this phase, both the "X" and "unlabeled_data" parameters of VIMEDataset are treated the same way: as unlabeled data.
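To see why this phase needs no labels, here is a minimal pure-Python sketch of the kind of pretext task VIME's first phase relies on: cells are masked at random, masked cells are replaced with values drawn from the same column, and the training targets (the mask and the original values) come from the data itself. This is an illustration only, not TabularS3L's implementation; `make_pretext_batch` and `p_mask` are hypothetical names.

```python
import random


def make_pretext_batch(rows, p_mask=0.3, seed=0):
    """Build (corrupted rows, masks) from unlabeled rows alone."""
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])

    # Draw a binary mask: 1 marks a cell that should be corrupted.
    drawn = [[1 if rng.random() < p_mask else 0 for _ in range(n_cols)]
             for _ in range(n_rows)]

    # Shuffle each column independently, so replacement values follow
    # that column's empirical distribution.
    shuffled_cols = []
    for j in range(n_cols):
        col = [row[j] for row in rows]
        rng.shuffle(col)
        shuffled_cols.append(col)

    corrupted = [
        [shuffled_cols[j][i] if drawn[i][j] else rows[i][j]
         for j in range(n_cols)]
        for i in range(n_rows)
    ]

    # A shuffled value can equal the original one, so the effective mask
    # marks only the cells that actually changed.
    masks = [
        [int(corrupted[i][j] != rows[i][j]) for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return corrupted, masks


# Toy unlabeled data: four rows, three features, no target column.
X = [[0.1, 5.0, 1], [0.4, 3.2, 0], [0.9, 4.1, 1], [0.3, 2.7, 0]]
X_corrupted, mask = make_pretext_batch(X)
# Pretext targets: recover `mask` and the original `X` from `X_corrupted`.
```

Because both targets are derived from `X`, the same procedure applies whether or not a label column exists.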

So, if you only have an unlabeled dataset, you can still perform the self-supervised learning step of VIME. Simply pass your unlabeled data to the "X" parameter and set "unlabeled_data" to None when initializing the VIMEDataset. The model will then learn from the unlabeled data without needing any labeled data.

The following code block provides an example for your use case.

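### First Phase Learning
# Assumes `config`, `pl_vime`, `batch_size`, `continuous_cols`, and `category_cols`
# are defined as in the README example. Import paths follow the README;
# verify them against your installed version.
from pytorch_lightning import Trainer
from ts3l.utils import TS3LDataModule
from ts3l.utils.vime_utils import VIMEDataset

# Pass the unlabeled data as X; no labels are needed in this phase.
train_ds = VIMEDataset(X=X_train, unlabeled_data=None, config=config, continuous_cols=continuous_cols, category_cols=category_cols)
valid_ds = VIMEDataset(X=X_valid, config=config, continuous_cols=continuous_cols, category_cols=category_cols)

datamodule = TS3LDataModule(train_ds, valid_ds, batch_size, train_sampler='random')

trainer = Trainer(
    accelerator='cpu',
    max_epochs=20,
    num_sanity_val_steps=2,
)

# Run the self-supervised (first-phase) training.
trainer.fit(pl_vime, datamodule)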
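
The snippet above assumes that X_train, X_valid, continuous_cols, and category_cols already exist. A minimal sketch of deriving them from an unlabeled pandas DataFrame might look like the following. The toy table and its column names are hypothetical, and integer-encoding the categorical column is an assumption about what the dataset class accepts, so check it against the README for your version.

```python
import pandas as pd

# Hypothetical unlabeled table; in practice this would be your own data.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 44, 60],
    "income": [30.5, 42.0, 58.3, 61.1, 49.9, 35.2, 55.0, 70.4],
    "city": ["A", "B", "A", "C", "B", "A", "C", "B"],
})

# Numeric columns are treated as continuous, everything else as categorical.
continuous_cols = df.select_dtypes(include="number").columns.tolist()
category_cols = [c for c in df.columns if c not in continuous_cols]

# Assumption: categorical values should be integer codes before training.
for col in category_cols:
    df[col] = df[col].astype("category").cat.codes

# Hold out 25% of the rows for validation; no labels are involved.
X_valid = df.sample(frac=0.25, random_state=0)
X_train = df.drop(X_valid.index)
```

The resulting X_train and X_valid can then be passed as the "X" argument in the first-phase example.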

I hope this clarifies your question.

mdinesh9 commented 2 months ago

Thank you @Alcoholrithm