TaoRuijie / Loss-Gated-Learning

ICASSP 2022: 'Self-supervised Speaker Recognition with Loss-gated Learning'
MIT License

CUDA out of memory #9

Closed: peggyxpxu closed this issue 8 months ago

peggyxpxu commented 1 year ago

When I train Stage2 on an 11 GB GPU with the batch size set to 1, it still keeps showing 'CUDA out of memory'.

TaoRuijie commented 1 year ago

Reduce the value set here instead of the batch size:

https://github.com/TaoRuijie/Loss-Gated-Learning/blob/main/Stage2/dataLoader.py#L119

peggyxpxu commented 1 year ago

Thanks, but I hit another error when training Stage2:

Traceback (most recent call last):
  File "main_train.py", line 59, in <module>
    dic_label, NMI = Trainer.cluster_network(loader = clusterLoader, n_cluster = args.n_cluster) # Do clustering
  File "/mnt/data3/jhuser2/code/Loss-Gated-Learning-main/Stage2/model.py", line 60, in cluster_network
    clus.train(out_all, index) # Clustering
  File "/mnt/data1/xxp/miniconda3/envs/Loss-Gated-Learning/lib/python3.8/site-packages/faiss/class_wrappers.py", line 85, in replacement_train
    self.train_c(n, swig_ptr(x), index)
  File "/mnt/data1/xxp/miniconda3/envs/Loss-Gated-Learning/lib/python3.8/site-packages/faiss/swigfaiss_avx2.py", line 2560, in train
    return _swigfaiss_avx2.Clustering_train(self, n, x, index, x_weights)
RuntimeError: Error in void faiss::Clustering::train_encoded(faiss::Clustering::idx_t, const uint8_t*, const faiss::Index*, faiss::Index&, const float*) at /root/miniconda3/conda-bld/faiss-pkg_1669821803039/work/faiss/Clustering.cpp:277: Error: 'nx >= k' failed: Number of training points (5130) should be at least as large as number of clusters (6000)

TaoRuijie commented 11 months ago

Sorry, I am not sure why you have the number 5130 here. Also, sorry for missing your message.
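
For readers who hit the same message: the traceback above says that faiss received 5130 training points while 6000 clusters were requested, and k-means needs at least as many points as clusters ('nx >= k'). The thread does not establish why only 5130 embeddings reached the clustering step. The snippet below is only a minimal sketch of checking that condition before clustering; `check_cluster_count` is a hypothetical helper, not part of this repository or of faiss.

```python
def check_cluster_count(n_points: int, n_cluster: int) -> None:
    """Raise a readable error if k-means cannot run with these sizes.

    Hypothetical helper (an assumption, not repository code). It mirrors the
    precondition that faiss reports as "'nx >= k' failed".
    """
    if n_points < n_cluster:
        raise ValueError(
            f"Only {n_points} embeddings were extracted, but {n_cluster} "
            "clusters were requested. Check how many utterances the training "
            "list contains, or lower the number of clusters."
        )


# Numbers taken from the traceback: 5130 points, 6000 requested clusters.
try:
    check_cluster_count(n_points=5130, n_cluster=6000)
except ValueError as err:
    print(err)
```

Calling a check like this just before `clus.train(out_all, index)`, with `n_points` taken from the number of rows in `out_all`, would surface the mismatch with a clearer message; whether to fix it by adding data or by passing a smaller `n_cluster` depends on the dataset in use.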