KamWithK / PyParquetLoaders

Easy, efficient and Pythonic data loading of Parquet files for PyTorch-based libraries
MIT License
22 stars, 0 forks

Using as a StyleGAN2 dataloader #2

Open MationPlays opened 2 years ago

MationPlays commented 2 years ago

Hello, I want to use Parquet files with the stylegan2-ada-pytorch implementation. Do I have to reimplement StyleGAN2 as a Lightning module first so that I can use this dataloader? I have never used Lightning, and the repo structure of stylegan2-ada-pytorch is not easy to follow. I really just want StyleGAN2 to accept Parquet as a data source without using petastorm.

MationPlays commented 2 years ago

I found this repo, which implements StyleGAN2 in Lightning: https://github.com/awaelchli/stylegan2-pytorch-lightning/blob/master/train.py. Can I just swap this library's dataloader in where train.py builds its dataloader?

Code

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased", use_fast=True)

def process_rows(columns_dict):
    # Tokenize the "readme" text column and merge the token tensors into the batch
    tokens = tokenizer(columns_dict["readme"], padding=True, truncation=True, return_tensors="pt").data
    columns_dict.update(tokens)

    # Remove unwanted columns
    for column in list(columns_dict):
        if column not in ["token_type_ids", "attention_mask", "input_ids", "target"]:
            columns_dict.pop(column)
    return columns_dict

def train_dataloader(self):
    args = self.hparams
    transform = transforms.Compose(
        [
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5), inplace=True),
        ]
    )

    # Added Parquet dataloader code
    dataset = IterableParquetDataset("example.parquet", process_rows)
    dataloader = DataLoader(dataset, num_workers=4)

    #dataset = MultiResolutionDataset(args.path, transform, args.size)
    #dataloader = data.DataLoader(
    #    dataset,
    #    shuffle=True,
    #    batch_size=args.batch_size,
    #    drop_last=True,
    #    num_workers=args.num_workers,
    #)
    return dataloader
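
The process_rows above is the text-tokenizing example, so it would probably need an image counterpart before the swap makes sense. Below is a minimal sketch of what that might look like. It assumes that process_rows receives a dict mapping column names to a batch of values (as the tokenizer example suggests), and that the Parquet file has an "image" column holding raw uint8 RGB bytes of a fixed size. The column name, IMAGE_SIZE and process_image_rows are all hypothetical. It reproduces the ToTensor + Normalize scaling from the transform, but not the random horizontal flip.

```python
import numpy as np

IMAGE_SIZE = 64  # hypothetical: side length of the stored square RGB images


def process_image_rows(columns_dict):
    """Turn a batch of raw RGB byte strings into a float32 array in [-1, 1].

    Assumes a hypothetical "image" column with raw uint8 pixels in HWC order.
    """
    images = np.stack([
        np.frombuffer(raw, dtype=np.uint8).reshape(IMAGE_SIZE, IMAGE_SIZE, 3)
        for raw in columns_dict["image"]
    ])
    # NHWC uint8 -> NCHW float32; x / 127.5 - 1 equals
    # ToTensor() followed by Normalize(0.5, 0.5) per channel
    images = images.transpose(0, 3, 1, 2).astype(np.float32) / 127.5 - 1.0
    # Keep only the column the generator training needs
    return {"image": images}
```

If the images are stored as encoded PNG or JPEG bytes instead, the np.frombuffer step would have to be replaced by a real decoder such as PIL.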