MilaNLProc / contextualized-topic-models

A Python package for contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
MIT License

BrokenPipeError: [Errno 32] Broken pipe Multithread problem on Windows #107

Closed (TianlinZhang668 closed this issue 2 years ago)

TianlinZhang668 commented 2 years ago

Description

I am using Windows + PyTorch + PyCharm.

I get the following error: ForkingPickler(file, protocol).dump(obj) BrokenPipeError: [Errno 32] Broken pipe

It seems to be a multithreading problem (other sources suggest setting num_workers to 0), but I don't know how to apply that fix in CTM.

vinid commented 2 years ago

Hi!

Can you tell me which line generates the error?

You should be able to set

num_data_loader_workers=0

in the initialization of the CTM object (either ZeroShotTM or CombinedTM).
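As a minimal sketch of what that looks like in code: the only argument taken from this thread is num_data_loader_workers. The other constructor arguments (bow_size, contextual_size, n_components) and the import path are written from memory of the package README and are assumptions to check against your installed version. The helper names ctm_kwargs and build_zero_shot_model are hypothetical and exist only for this illustration.

```python
# Sketch only: build a ZeroShotTM that loads data in the main process,
# which avoids spawning DataLoader worker processes on Windows.


def ctm_kwargs(bow_size, contextual_size, n_components=50):
    """Collect constructor arguments (hypothetical helper).

    bow_size, contextual_size and n_components are assumed parameter
    names; num_data_loader_workers is the one named in this thread.
    """
    return {
        "bow_size": bow_size,
        "contextual_size": contextual_size,
        "n_components": n_components,
        # 0 means batches are loaded in the main process,
        # so no worker processes are created.
        "num_data_loader_workers": 0,
    }


def build_zero_shot_model(bow_size, contextual_size, n_components=50):
    """Create the model (hypothetical helper).

    The import is done lazily so this file can be loaded even when the
    package is not installed. The import path is an assumption.
    """
    from contextualized_topic_models.models.ctm import ZeroShotTM

    return ZeroShotTM(**ctm_kwargs(bow_size, contextual_size, n_components))
```

With the package installed, something like build_zero_shot_model(bow_size=2000, contextual_size=768) would return a model configured this way; both numbers are placeholders and should match your vocabulary size and embedding dimension. The same keyword argument should apply to CombinedTM.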

See it here

TianlinZhang668 commented 2 years ago

Thank you very much for your help! I have another question: can the model be applied to an unsupervised dataset (I have 500,000 sentences)? What other steps do I need to take when using CTM or Kitty?

vinid commented 2 years ago

You can follow one of the tutorials to get a better idea of how to run everything. You should be able to run the tutorials on your data by changing a few lines of code.

One important thing: since your dataset is very large, you will definitely need a GPU to run this.