OOM - Githubissues

HarryVolek / PyTorch_Speaker_Verification

PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al.

BSD 3-Clause "New" or "Revised" License

575 stars, 166 forks

OOM #20

Closed LoveGalaxy closed 5 years ago

LoveGalaxy commented 5 years ago

Hi! I am trying to run this code on Linux on a CPU with 16 GB of memory. Memory usage grows quickly until the OS kills the process, so it only runs for about 30 iterations. Do you have any idea how to fix this problem? What should I do to debug it? Thank you!
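One way to start debugging this kind of growth is to log the process's memory at regular intervals and see whether it climbs steadily. The sketch below is not from this repository: `peak_rss_mib`, `log_memory`, and the `step` callback are hypothetical names, and it assumes Linux, where `ru_maxrss` is reported in kilobytes.

```python
import resource


def peak_rss_mib():
    """Peak resident set size of this process in MiB.

    Assumption: Linux, where ru_maxrss is in kilobytes (macOS reports bytes).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def log_memory(step, iterations, log_every=10):
    """Call step(i) for each iteration and record (i, peak RSS) every log_every steps.

    `step` is a hypothetical stand-in for one training iteration.
    """
    readings = []
    for i in range(iterations):
        step(i)
        if i % log_every == 0:
            readings.append((i, peak_rss_mib()))
    return readings
```

Here `step` would wrap a single training iteration. A peak that keeps rising across iterations suggests memory is being retained between steps, rather than one batch simply being too large.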

Aurora11111 commented 5 years ago

@guruL Hello, I am running the project using my own dataset! Have you encountered the issue below?

LoveGalaxy commented 5 years ago

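@guruL Hello, I am running the project using my own dataset! Have you encountered the issue below?

Since we are both Chinese, I'll reply in Chinese, which is easier for me to express myself in. I haven't run into this. The error says that an index into utterance is out of bounds. Check whether the dimensions of utterance are correct, and compare them with the dimensions of the utterance produced from the default TIMIT data the author uses.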

Aurora11111 commented 5 years ago

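@guruL I didn't download the TIMIT dataset because it is too large. What does the TIMIT data you use look like? Could you post a screenshot? I used npy files generated from aishell, like this: the test_tisv folder has a few dozen npy files, and the train_tisv folder has about 360 npy files.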

HarryVolek commented 5 years ago

The config defaults to using your GPU. If you want to use your CPU, change the following line in config.yaml:

device: "cuda"

If you still run out of memory, try reducing the batch size.

LoveGalaxy commented 5 years ago

@HarryVolek Thank you for your reply! I have changed the config to device: 'CPU', and I also tried using only one sample per batch, but I still get an OOM error. Could you tell me which operating system you use? A friend ran this code on Windows 7 and did not hit this problem. Could you also tell me which GPU you used for training?

LoveGalaxy commented 5 years ago

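@Aurora11111 That's not what I meant. I was referring to the utterance array: check whether its shape has three dimensions. Judging by the error, it is probably related to the dimensions of your utterance. The number of npz files has nothing to do with it; the problem is the dimensions of the data inside the npz. Also, a question: is the TIMIT corpus really that large? Why did I only download a bit over 400 MB? If you found a larger one, could you share the link? I need to find more data right now.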
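The shape check suggested in this comment could be sketched as follows. This is a minimal illustration, not code from the repository: `describe_utterances` is a hypothetical helper, the arrays are synthetic stand-ins rather than real features, and the expected three dimensions are taken from the comment above rather than verified against the data loader.

```python
import numpy as np


def describe_utterances(utterances, expected_ndim=3):
    """Return (ok, shape); ok is True when the array has expected_ndim dimensions."""
    arr = np.asarray(utterances)
    return arr.ndim == expected_ndim, arr.shape


# Synthetic stand-ins, not real data: one 3-D array and one 2-D array.
three_dim = np.zeros((6, 40, 180))
two_dim = np.zeros((40, 180))

print(describe_utterances(three_dim))
print(describe_utterances(two_dim))
```

With a real file, the array passed in would be whatever `np.load` returns for one of the saved speaker files. Running the same check on a file produced from TIMIT and on one produced from another dataset makes a dimension mismatch easy to spot.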

Aurora11111 commented 5 years ago

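@guruL What I downloaded is a torrent file (.torrent format), and it is indeed 400 MB. Is that the one you used? How did you open it, and is it still only 400 MB once opened? The data in my npz files does indeed have fewer dimensions. http://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3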

LoveGalaxy commented 5 years ago

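@Aurora11111 For TIMIT, place it in the project root following the directory layout below, and delete the top few levels of folders: './TIMIT////.wav'. This is described in the README. You can look at the author's audio-processing code to see how processing TIMIT differs from processing your dataset.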

Aurora11111 commented 5 years ago

@guruL OK, I opened it with Baidu Netdisk, but I can't download it on this system. My dataset is aishell. I'll try to figure out a way to use TIMIT after all.

HarryVolek commented 5 years ago

@guruL I run Ubuntu. The model shouldn't be difficult to train, people have trained comparable models with CPUs. Either way, I believe the issue is with your hardware and not the code itself, so I am closing.

HarryVolek commented 5 years ago

When I wrote the code I was using a GTX 970 to train.

Aurora11111 commented 5 years ago

Yeah! After I changed to the TIMIT dataset, it ran successfully. Thank you! @guruL

LoveGalaxy commented 5 years ago

When I wrote the code I was using a GTX 970 to train.

Thank you very much! And thank you for open-sourcing the code! It helps me a lot.

LoveGalaxy commented 5 years ago

Yeah! After I changed to the TIMIT dataset, it ran successfully. Thank you! @guruL

You're welcome.

trunglebka commented 5 years ago

I trained with the default config (except with 'device: "cpu"') and encountered an OOM error.

My dependencies:

PyTorch 1.0.1.post2
python 3.7 with anaconda
numpy 1.15.4
librosa 0.6.3

I don't know why this happens. Sorry for my bad English.

HarryVolek commented 5 years ago

07f996a

Try again with the latest commit, @trunglebka.

trunglebka commented 5 years ago

Sorry for the late reply. I have updated your code to the latest commit, but the OOM is still there :(