MilaNLProc / contextualized-topic-models

A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
MIT License

GPU and CPU usage #132

Closed raymondsim closed 1 year ago

raymondsim commented 1 year ago

Description

I was trying to use a trained CombinedTM together with the topic model data preparation object saved with pickle, as suggested. I have two GPUs and one CPU.

I checked that the model is loaded onto the GPU (in ctm.py). I also checked that X_bow and X_contextual are loaded onto the GPU (in get_doc_topic_distribution() in ctm.py).

However, I am getting the following error, saying that at least two devices were found. Are there any pointers to resolve this? Is it because I have two GPUs?

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_addmm)

Thanks in advance!
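For reference, a minimal check of where the weights actually live (a sketch, assuming ctm is the loaded CombinedTM from the snippet below):

import torch

# the weight matrix in the failing addmm comes from the model, so this
# should report cuda:0 if the input tensors are on the GPU
print(next(ctm.model.parameters()).device)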

What I Did

import json
import os
import pickle

import numpy as np

from contextualized_topic_models.models.ctm import CombinedTM
from contextualized_topic_models.utils.preprocessing import WhiteSpacePreprocessingStopwords

def prepare_topic_model_data(documents, document_embedding):
    """Prepare a list of strings for CTM.

    Input: list of strings, plus pre-computed document embeddings
    Output: transformed dataset for CTM
    """
    # vocab_set is the pre-trained model's vocabulary, loaded globally below
    sp = WhiteSpacePreprocessingStopwords(documents, stopwords_language='english', vocabulary_size=len(vocab_set), vocab_set=vocab_set)

    preprocessed_documents, unpreprocessed_corpus, vocab, retained_indices = sp.preprocess()

    # load the TopicModelDataPreparation object saved at training time
    tp = pickle.load(open("/home/username/saved_topic_model/topic_data_preparation.p", "rb"))

    testing_dataset = tp.transform(text_for_contextual=unpreprocessed_corpus, text_for_bow=preprocessed_documents, custom_embeddings=document_embedding)

    return testing_dataset

# read in documents
with open("/home/username/dataset/multi_news_raw/train.tgt.txt", encoding="utf-8") as f:
    documents = [line.strip() for line in f]

# read in pre-trained model vocab
with open(os.path.join("/home/username/model_name", "vocab.json"), 'r') as f:
    vocab_dict = json.load(f)

vocab_set = set(vocab_dict.keys())

# load in pre-computed document embedding
document_embedding = np.load("/home/username/document_embedding/multi_news_train.npy", allow_pickle=True)

documents = prepare_topic_model_data(documents[:10], document_embedding[:10])

ctm = CombinedTM(bow_size=len(vocab_set), contextual_size=1024, n_components=20, num_epochs=50)

ctm.load("/home/username/saved_topic_model/", 49)

source_dist = ctm.get_thetas(documents, n_samples=10)  # error occurs here

vinid commented 1 year ago

Can you share something for me to reproduce this error, e.g., a sample of the npy file?

My current best guess would be that after

ctm.load("/home/username/saved_topic_model/", 49)

the weights are not on cuda, so you might need to manually do something like ctm.model.to("cuda")
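A minimal sketch of that manual fix, assuming ctm is the loaded CombinedTM and a CUDA device is available (USE_CUDA and device are the attribute names the follow-up below refers to):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
ctm.model.to(device)                      # move the network weights back to the GPU
ctm.device = device                       # keep the wrapper's device attribute in sync
ctm.USE_CUDA = torch.cuda.is_available()  # and its CUDA flag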

raymondsim commented 1 year ago

Hi,

  1. Yes, you're right: when I first initialize the CTM it's on the GPU, but after I load the trained model it moves to the CPU. Are there any pointers on how I can fix this?

Update: I trained the CTM on CPU and loaded it on GPU, which is what causes this problem. The way I solved it is changing "USE_CUDA" and "device" to True and "cuda" respectively in the for loop here (see the sketch below):

(screenshot: the attribute-restoring for loop in CTM.load() in ctm.py)
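A sketch of that workaround, assuming load() restores the saved attributes with a setattr loop over the checkpoint dict as in ctm.py (the 'dcue_dict' key is an assumption based on the library's save format):

import torch

# inside CTM.load(): override the device-related attributes so they match
# the current machine rather than the one the model was trained on
for (k, v) in checkpoint['dcue_dict'].items():
    if k == "USE_CUDA":
        v = torch.cuda.is_available()
    elif k == "device":
        v = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    setattr(self, k, v)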
  2. Another problem I am struggling with is the data loader workers. I keep getting this warning, and when I run the topic model along with BART in the same program it causes a CUDA issue. I tried to divide it by two (i.e., int(mp.cpu_count()/2)), but the warning still says I am using 80.

Update: This is caused by the same problem: the model was trained and tested on different machines. num_data_loader_workers was set to 80 because mp.cpu_count() was 80 on my previous device. I modified the code here so that if k == "num_data_loader_workers": v = mp.cpu_count() (see the sketch after the screenshot).

(screenshot: the modified attribute-restoring loop handling num_data_loader_workers)
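The same loop can clamp the worker count to the current machine, a sketch under the same assumptions as above:

import multiprocessing as mp

for (k, v) in checkpoint['dcue_dict'].items():
    if k == "num_data_loader_workers":
        v = mp.cpu_count()  # use this machine's CPU count, not the saved one
    setattr(self, k, v)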

Thanks a lot!