CRIPAC-DIG / TextING

[ACL 2020] Tensorflow implementation for "Every Document Owns Its Structure: Inductive Text Classification via Graph Neural Networks"

MemoryError: Unable to allocate 168. GiB for an array with shape (76821, 542, 542) and data type float64 #17

Open · Al-Dailami opened this issue 3 years ago

Al-Dailami commented 3 years ago

```
loading training set
100%|██████████| 76821/76821 [02:21<00:00, 543.88it/s]
Traceback (most recent call last):
  File "train.py", line 39, in <module>
    train_adj, train_mask = preprocess_adj(train_adj)
  File "~/TextING/utils.py", line 153, in preprocess_adj
    return np.array(list(adj)), mask  # coo_to_tuple(sparse.COO(np.array(list(adj)))), mask
MemoryError: Unable to allocate 168. GiB for an array with shape (76821, 542, 542) and data type float64
```
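For reference, the reported size is exactly what a dense float64 tensor of that shape requires (one adjacency matrix per document, each padded to the largest graph):

```python
# Why the allocation is 168 GiB: 76,821 documents, each with a dense
# 542 x 542 float64 adjacency matrix, all materialized at once.
n_docs, max_nodes = 76821, 542
gib = n_docs * max_nodes * max_nodes * 8 / 2**30  # 8 bytes per float64
print(f"{gib:.1f} GiB")  # ~168.1 GiB
```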

Al-Dailami commented 3 years ago

Hello,

Can you help me fix this problem?

Magicat128 commented 3 years ago

Hi @Al-Dailami

Which dataset are you using? You may try processing the training samples in batches and concatenating the results with NumPy.
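A minimal sketch of the batching idea (the `batched` helper is hypothetical; `preprocess_adj` is the repo's own function, called here on a slice of the data rather than the whole list):

```python
import numpy as np

def batched(n_items, batch_size):
    """Yield index slices that cover all n_items in consecutive chunks."""
    for start in range(0, n_items, batch_size):
        yield slice(start, start + batch_size)

# Toy stand-in for the list of per-document adjacency matrices.
train_adj = [np.eye(n) for n in (3, 5, 2, 4)]

for idx in batched(len(train_adj), 2):
    batch = train_adj[idx]
    # Instead of preprocess_adj(train_adj) on everything at once:
    # b_adj, b_mask = preprocess_adj(batch)
    print(len(batch), [a.shape[0] for a in batch])
```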

Al-Dailami commented 3 years ago

Thanks a lot for your reply.

I'm working on a dataset that contains around 500,000 records of short texts. Could you please advise me on how to modify the code so it processes the data in batches?

Thanks a lot in advance for your valuable help.

Al-Dailami commented 3 years ago

Hello, I have modified the trainer to process the data batch by batch. Is this the right way? https://github.com/CRIPAC-DIG/TextING/blob/c2492c276a6b59ca88337e582dfd2f3616f3988d/train.py#L124

```python
b_train_adj, b_train_mask = preprocess_adj(train_adj[idx])
b_train_feature = preprocess_features(train_feature[idx])
feed_dict = construct_feed_dict(b_train_feature, b_train_adj, b_train_mask, train_y[idx], placeholders)
feed_dict.update({placeholders['dropout']: FLAGS.dropout})
```
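For illustration, here is roughly what the per-batch preprocessing buys you: each batch is padded only to its own largest graph, so peak memory is one batch of dense matrices instead of all 76,821 at once. `pad_batch` below is a simplified, hypothetical stand-in (it does only the padding and masking, not the adjacency normalization that the repo's `preprocess_adj` also applies):

```python
import numpy as np

def pad_batch(adjs):
    """Pad a batch of square adjacency matrices to the batch's max node
    count; return the dense batch tensor and a per-node mask."""
    max_n = max(a.shape[0] for a in adjs)
    batch = np.zeros((len(adjs), max_n, max_n), dtype=np.float32)
    mask = np.zeros((len(adjs), max_n, 1), dtype=np.float32)
    for i, a in enumerate(adjs):
        n = a.shape[0]
        batch[i, :n, :n] = a   # copy the real adjacency into the corner
        mask[i, :n, 0] = 1.0   # mark which rows are real nodes
    return batch, mask
```

Using float32 rather than float64 for these buffers would also roughly halve the footprint, if the rest of the pipeline tolerates it.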

Magicat128 commented 3 years ago

> Hello, I have modified the trainer to process the data batch by batch. Is this the right way? https://github.com/CRIPAC-DIG/TextING/blob/c2492c276a6b59ca88337e582dfd2f3616f3988d/train.py#L124
>
> ```python
> b_train_adj, b_train_mask = preprocess_adj(train_adj[idx])
> b_train_feature = preprocess_features(train_feature[idx])
> feed_dict = construct_feed_dict(b_train_feature, b_train_adj, train_mask, train_y[idx], placeholders)
> feed_dict.update({placeholders['dropout']: FLAGS.dropout})
> ```

@Al-Dailami Yes, that's the right way. But note it should be your `b_train_mask` in the feed_dict rather than `train_mask` :)

bp20200202 commented 3 years ago


Hello, I changed the code as you suggested but am still getting a MemoryError. Have you ever run into this situation?