MilaNLProc / contextualized-topic-models

A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
MIT License

GPU and CPU usage #132

Closed raymondsim closed 1 year ago

raymondsim commented 1 year ago

Description

I was trying to use a trained CombinedTM together with the topic model data preparation object saved with pickle, as suggested. I have two GPUs and one CPU.

I checked that the model is loaded onto the GPU (in ctm.py). I also checked that X_bow and X_contextual are loaded onto the GPU (in get_doc_topic_distribution() in ctm.py).

However, I am getting the following error, saying that at least two devices were found. Are there any pointers to resolve this? Is it because I have two GPUs?

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_addmm)

Thanks in advance!
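For reference, a minimal check of where the weights actually live (a sketch, assuming ctm is the loaded CombinedTM from the snippet below):

import torch

# the weight matrix in the failing addmm comes from the model, so this
# should report cuda:0 if the input tensors are on the GPU
print(next(ctm.model.parameters()).device)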

What I Did

import json
import os
import pickle

import numpy as np

from contextualized_topic_models.models.ctm import CombinedTM
from contextualized_topic_models.utils.preprocessing import WhiteSpacePreprocessingStopwords

def prepare_topic_model_data(documents, document_embedding):
    """Prepare a list of strings for CTM.

    Input: list of strings, plus pre-computed document embeddings
    Output: transformed dataset for CTM
    """
    # vocab_set is the pre-trained model's vocabulary, loaded globally below
    sp = WhiteSpacePreprocessingStopwords(documents, stopwords_language='english', vocabulary_size=len(vocab_set), vocab_set=vocab_set)

    preprocessed_documents, unpreprocessed_corpus, vocab, retained_indices = sp.preprocess()

    # load the TopicModelDataPreparation object saved at training time
    tp = pickle.load(open("/home/username/saved_topic_model/topic_data_preparation.p", "rb"))

    testing_dataset = tp.transform(text_for_contextual=unpreprocessed_corpus, text_for_bow=preprocessed_documents, custom_embeddings=document_embedding)

    return testing_dataset

# read in documents
with open("/home/username/dataset/multi_news_raw/train.tgt.txt", encoding="utf-8") as f:
    documents = [line.strip() for line in f]

# read in pre-trained model vocab
with open(os.path.join("/home/username/model_name", "vocab.json"), 'r') as f:
    vocab_dict = json.load(f)

vocab_set = set(vocab_dict.keys())

# load in pre-computed document embedding
document_embedding = np.load("/home/username/document_embedding/multi_news_train.npy", allow_pickle=True)

documents = prepare_topic_model_data(documents[:10], document_embedding[:10])

ctm = CombinedTM(bow_size=len(vocab_set), contextual_size=1024, n_components=20, num_epochs=50)

ctm.load("/home/username/saved_topic_model/", 49)

source_dist = ctm.get_thetas(documents, n_samples=10)  # error occurs here

vinid commented 1 year ago

Can you share something for me to reproduce this error, e.g., a sample of the npy file?

My current best guess would be that after

ctm.load("/home/username/saved_topic_model/", 49)

the weights are not on cuda, so you might need to manually do something like ctm.model.to("cuda")
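A minimal sketch of that manual fix, assuming ctm is the loaded CombinedTM and a CUDA device is available (USE_CUDA and device are the attribute names the follow-up below refers to):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
ctm.model.to(device)                      # move the network weights back to the GPU
ctm.device = device                       # keep the wrapper's device attribute in sync
ctm.USE_CUDA = torch.cuda.is_available()  # and its CUDA flag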

raymondsim commented 1 year ago

Hi,

  1. Yes, you're right: when I first initialize the CTM it's on the GPU, but after I load the trained model it moves to the CPU. Are there any pointers on how I can fix this?

Update: I trained the CTM on CPU and loaded it on GPU, which is what causes this problem. The way I solved it is changing "USE_CUDA" and "device" to True and "cuda" respectively in the for loop here (see the sketch below):

(screenshot: the attribute-restoring for loop in CTM.load() in ctm.py)
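A sketch of that workaround, assuming load() restores the saved attributes with a setattr loop over the checkpoint dict as in ctm.py (the 'dcue_dict' key is an assumption based on the library's save format):

import torch

# inside CTM.load(): override the device-related attributes so they match
# the current machine rather than the one the model was trained on
for (k, v) in checkpoint['dcue_dict'].items():
    if k == "USE_CUDA":
        v = torch.cuda.is_available()
    elif k == "device":
        v = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    setattr(self, k, v)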
  2. Another problem I am struggling with is the data loader workers. I keep getting this warning, and when I run the topic model along with BART in the same program it causes a CUDA issue. I tried to divide it by two (i.e., int(mp.cpu_count()/2)), but the warning still says I am using 80.

Update: This is caused by the same problem: the model was trained and tested on different machines. num_data_loader_workers was set to 80 because mp.cpu_count() was 80 on my previous device. I modified the code here so that if k == "num_data_loader_workers": v = mp.cpu_count() (see the sketch after the screenshot).

(screenshot: the modified attribute-restoring loop handling num_data_loader_workers)
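The same loop can clamp the worker count to the current machine, a sketch under the same assumptions as above:

import multiprocessing as mp

for (k, v) in checkpoint['dcue_dict'].items():
    if k == "num_data_loader_workers":
        v = mp.cpu_count()  # use this machine's CPU count, not the saved one
    setattr(self, k, v)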

Thanks a lot!