Hangs with PyTorch data loaders when `num_workers > 0`

- OS: Ubuntu 22.04
- Python version: 3.11.8
- PyTorch version: 2.2.1
- Tokenmonster package version: 1.1.12
- Other libraries: lightning==2.2.1, datasets==2.18.0

As the title says: I load the tokenizer with `load_multiprocess_safe`, and the dataset is just a collection of plain text files that get loaded and tokenized. I have tested each stage of loading and there are no problems until I wrap the dataset in a `DataLoader` with `num_workers > 0`. At that point it hangs indefinitely.
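For reference, here is a minimal sketch of the kind of setup described above. It is not the actual code from this report. The names `TextFileDataset`, `whitespace_tokenize`, `write_sample_files` and `first_item_with_workers` are hypothetical, and a whitespace tokenizer stands in for the tokenmonster vocabulary so the dataset part runs without extra dependencies. Passing `timeout` to `DataLoader` should make a stuck worker raise an error instead of blocking forever, which may help narrow down where the hang occurs.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence


class TextFileDataset:
    """Map-style dataset: one plain text file per item, tokenized on access.

    DataLoader only needs __len__ and __getitem__ for map-style datasets, so
    this sketch does not subclass torch.utils.data.Dataset.
    """

    def __init__(self, paths: Sequence[Path], tokenize: Callable[[str], Sequence[int]]):
        self.paths = list(paths)
        self.tokenize = tokenize

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> List[int]:
        text = self.paths[index].read_text(encoding="utf-8")
        # Convert to plain ints so the result does not depend on the tokenizer's array type.
        return [int(token) for token in self.tokenize(text)]


def whitespace_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer (assumption, not tokenmonster): each word maps to its length."""
    return [len(word) for word in text.split()]


def write_sample_files(directory: Path, texts: Sequence[str]) -> List[Path]:
    """Write each text to its own file and return the file paths."""
    paths = []
    for i, text in enumerate(texts):
        path = directory / f"sample_{i}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


def first_item_with_workers(dataset, num_workers: int, timeout_s: float = 30.0):
    """Fetch one item through a DataLoader, with a timeout when workers are used."""
    from torch.utils.data import DataLoader  # imported lazily; requires PyTorch

    loader = DataLoader(
        dataset,
        batch_size=None,  # no automatic batching, since token lists vary in length
        num_workers=num_workers,
        # The single-process loader only accepts timeout=0.
        timeout=timeout_s if num_workers > 0 else 0,
    )
    return next(iter(loader))


# Sanity check of the dataset alone, without DataLoader or worker processes.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_sample_files(Path(tmp), ["hello world", "a bc def"])
    dataset = TextFileDataset(paths, whitespace_tokenize)
    sample = dataset[1]
```

To mirror the setup in this report, one would pass the `tokenize` method of the vocabulary returned by `load_multiprocess_safe` instead of `whitespace_tokenize`, then compare `first_item_with_workers(dataset, 0)` against `first_item_with_workers(dataset, 2)`.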

alasdairforsythe/tokenmonster, issue #34