Santosh-Gupta / SpeedTorch

Library for faster pinned CPU <-> GPU transfer in Pytorch
MIT License

gadgetCPU.gadgetInit() report an error! #7

Open xscjun opened 5 years ago

xscjun commented 5 years ago

When I run this code:

gadgetCPU = SpeedTorch.DataGadget('data.npy', CPUPinn=True)
gadgetCPU.gadgetInit()

it reports an error like this:

Exception ignored in: <function PMemory.__del__ at 0x7fcef8ca86a8>
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.7/site-packages/SpeedTorch/CUPYLive.py", line 19, in __del__
AttributeError: 'NoneType' object has no attribute 'runtime'

But it works when I run:

gadgetGPU = SpeedTorch.DataGadget('data.npy')
gadgetGPU.gadgetInit()

I can't find the reason, and it confuses me.
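An `AttributeError` on `NoneType` raised from a `__del__` method is typical of a finalizer that looks up a module-level name after that name has already been cleared, for example during interpreter shutdown. Whether that is what happens inside SpeedTorch is not confirmed in this thread. The sketch below only illustrates the general pattern with a stand-in `backend` object (hypothetical, not cupy) and shows one defensive variant that captures the dependency when the object is created.

```python
class FakeRuntime:
    """Stand-in for a GPU runtime module; records which pointers were freed."""

    def __init__(self):
        self.freed = []

    def free_host(self, ptr):
        self.freed.append(ptr)


# Module-level reference, analogous to a library imported at the top of a file.
backend = FakeRuntime()


class FragileBuffer:
    """Looks up the global `backend` inside __del__; fails if it is None by then."""

    def __init__(self, ptr):
        self.ptr = ptr

    def __del__(self):
        backend.free_host(self.ptr)  # AttributeError if `backend` is None


class RobustBuffer:
    """Keeps its own reference to the runtime so __del__ does not rely on globals."""

    def __init__(self, ptr, runtime=None):
        self.ptr = ptr
        self._runtime = runtime if runtime is not None else backend

    def __del__(self):
        runtime = getattr(self, "_runtime", None)
        if runtime is not None and self.ptr:
            runtime.free_host(self.ptr)


def demo():
    """Simulate teardown by clearing the global before the object is finalized."""
    global backend
    runtime = backend
    buf = RobustBuffer(ptr=1234)
    backend = None      # what module teardown can effectively do
    del buf             # finalizer still works through the captured reference
    backend = runtime   # restore the global for later callers
    return runtime.freed
```

With `FragileBuffer` in place of `RobustBuffer`, the same sequence would print an "Exception ignored in" message like the one above instead of freeing the pointer.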

Santosh-Gupta commented 5 years ago

Hi xscjun,

That is very confusing. I am investigating by trying to recreate the issue. So far I am unable to re-create the issue in Colab.

Here's the code I used

!pip install SpeedTorch
#Always import cupy before SpeedTorch 
import cupy
import SpeedTorch
import torch
import numpy as np
import torch.nn as nn

sampl = np.random.uniform(low=-1.0, high=1.0, size=(10, 10, 10, 10))
np.save('data.npy', sampl)
del sampl

gadgetGPU = SpeedTorch.DataGadget( 'data.npy', CPUPinn=True )
gadgetGPU.gadgetInit()

For convenience, here's the colab notebook I used.

https://colab.research.google.com/drive/1TbqKwZ94p_B6q0t_orYObKsWwa7Fg0ld

Are you able to recreate the issue in Colab? If so, link a notebook for further investigation.

If not, it looks like it's an issue with your system. In that case, you could provide your system information and see if we can figure out what is causing the error from that info.
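A minimal sketch for gathering that system information in one go. The package list is an assumption about what is relevant here, and `collect_environment` is an illustrative helper, not part of SpeedTorch.

```python
import importlib
import platform
import sys


def collect_environment(packages=("numpy", "cupy", "torch", "SpeedTorch")):
    """Return a dict of interpreter, OS, and package versions for a bug report."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
            info[name] = getattr(module, "__version__", "unknown")
        except Exception as exc:  # not installed, or the import itself fails
            info[name] = f"unavailable ({type(exc).__name__})"
    return info
```

Printing each key and value of `collect_environment()` and pasting the result into the issue gives the Python version (3.6 vs 3.7 matters later in this thread) alongside the library versions.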

xscjun commented 5 years ago

Thanks for your reply. The error occurs under Python 3.7; after I switched to Python 2.7, it works.

Santosh-Gupta commented 5 years ago

Glad you were able to get it working. I am wondering what the cause is; I did all my testing in Python 3.

xscjun commented 5 years ago

I can't get it working in Python 3.7, which is confusing.

Santosh-Gupta commented 5 years ago

I noticed Colab has Python 3.6.8 by default, so perhaps there is something off about 3.7.

Are you able to recreate the issue in Colab? Is the 'data.npy' the same as in the notebook?

xscjun commented 5 years ago

The 'data.npy' is the same as in the notebook. I haven't recreated the issue in Colab.

Approximetal commented 4 years ago

I got an OOM error. How can I deal with it? BTW, how can I load multiple datasets into one container?

gadgetGPU = SpeedTorch.DataGadget(target_mel)
gadgetGPU.gadgetInit()

Traceback (most recent call last):
  File "/home/zzy/anaconda3/envs/StarGAN-VC/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-19-226edd99569f>", line 1, in <module>
    gadgetGPU.gadgetInit()
  File "/home/zzy/anaconda3/envs/StarGAN-VC/lib/python3.6/site-packages/SpeedTorch/CUPYLive.py", line 265, in gadgetInit
    self.CUPYcorpus = cupy.load( self.fileName)
  File "/home/zzy/anaconda3/envs/StarGAN-VC/lib/python3.6/site-packages/cupy/io/npz.py", line 71, in load
    return cupy.array(obj)
  File "/home/zzy/anaconda3/envs/StarGAN-VC/lib/python3.6/site-packages/cupy/creation/from_data.py", line 43, in array
    return core.array(obj, dtype, copy, order, subok, ndmin)
  File "cupy/core/core.pyx", line 1768, in cupy.core.core.array
  File "cupy/core/core.pyx", line 1845, in cupy.core.core.array
  File "cupy/core/core.pyx", line 1920, in cupy.core.core._send_object_to_gpu
  File "cupy/core/core.pyx", line 134, in cupy.core.core.ndarray.__init__
  File "cupy/cuda/memory.pyx", line 540, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 1234, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1255, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1033, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1053, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 775, in cupy.cuda.memory._try_malloc
cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 86,528 bytes (allocated so far: 0 bytes).

Santosh-Gupta commented 4 years ago

Can you make a colab that reproduces this error? That way I can interact with the bug.
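In the traceback above, the load gets as far as `cupy.array(obj)` and fails only when allocating 86,528 bytes with 0 bytes allocated so far. That may indicate the GPU had almost no free memory before SpeedTorch touched it, for example because another process or framework was already holding it; this is a guess, not something established in the thread. A minimal sketch for checking it, assuming `nvidia-smi` is on the PATH (the helper names are illustrative):

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse `used, free, total` CSV rows (values in MiB) into a list of dicts."""
    rows = []
    for index, line in enumerate(csv_text.strip().splitlines()):
        used, free, total = (int(field.strip()) for field in line.split(","))
        rows.append(
            {"gpu": index, "used_mib": used, "free_mib": free, "total_mib": total}
        )
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return per-GPU memory usage; needs the NVIDIA driver."""
    output = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.free,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_memory(output)
```

If `free_mib` is close to zero before `gadgetInit()` runs, the failure is probably unrelated to the file being loaded.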

As for loading multiple datasets into one container: I haven't added that feature, but I can. So you would want ModelFactoryObject.loadCupy(loadFileName) for the first dataset, and then something like ModelFactoryObject.appendCupy(loadFileName2) for new datasets?

Approximetal commented 4 years ago

import cupy
import SpeedTorch
gadgetGPU = SpeedTorch.DataGadget('mel-20170001P00084I0004.npy')
gadgetGPU.gadgetInit()

Attachment: mel-20170001P00084I0004.zip

It seems gadgetGPU.gadgetInit() results in this error no matter which data I load.

Santosh-Gupta commented 4 years ago

It looks like the data format is incorrect. There seems to be an issue with how you're saving the data and/or how you're zipping the file.

Check out this notebook, which saves and loads numpy data:

https://colab.research.google.com/drive/185Z5Gi62AZxh-EeMfrTtjqxEifHOBXxF

Try using numpy.save to directly save your data into a numpy format file.
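One quick way to check whether a file on disk really is in the format written by `numpy.save` is to look at its leading bytes: `.npy` files begin with the magic string `\x93NUMPY`, while zip archives (including `.npz`) typically begin with `PK\x03\x04`. The helper below is an illustrative sketch, not part of SpeedTorch or numpy.

```python
NPY_MAGIC = b"\x93NUMPY"   # first six bytes of a .npy file
ZIP_MAGIC = b"PK\x03\x04"  # usual first four bytes of a zip archive or .npz


def sniff_array_file(path):
    """Classify a file as 'npy', 'zip', or 'unknown' from its leading bytes."""
    with open(path, "rb") as handle:
        head = handle.read(6)
    if head.startswith(NPY_MAGIC):
        return "npy"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

A result of `'npy'` for the file passed to `DataGadget` would suggest the on-disk format is fine and the problem lies elsewhere.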

Approximetal commented 4 years ago

It seems hard to change the data format, as my model has already been training for a long time. Is there any way to transfer data from the CPU into your container? Or is there anything that can replace torch.utils.data.DataLoader in PyTorch? For example, I've already preprocessed my data and saved it in a list.
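For data that already lives in a Python list, one possible route is to pack it into a single array and write it with `numpy.save`, so that it ends up in the numpy format the data gadget reads. The sketch below assumes each item is a 2-D array of shape (n_mels, n_frames) with the same first dimension and varying length; those shapes and the helper names are assumptions for illustration, not taken from this thread.

```python
import numpy as np


def pad_and_stack(arrays, pad_value=0.0):
    """Pad 2-D arrays along the last axis to a common length, then stack them."""
    max_len = max(a.shape[-1] for a in arrays)
    padded = [
        np.pad(a, ((0, 0), (0, max_len - a.shape[-1])), constant_values=pad_value)
        for a in arrays
    ]
    return np.stack(padded)


def save_list_as_npy(arrays, path):
    """Write a list of 2-D arrays as one .npy file and return the stacked shape."""
    stacked = pad_and_stack(arrays)
    np.save(path, stacked)
    return stacked.shape
```

Padding wastes memory when lengths vary widely, so saving one file per item may be preferable in that case.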

Santosh-Gupta commented 4 years ago

Yup. I forget the exact commands, but you can access your embedding data and move it to the CPU in numpy form. It looks something like this: YourModel.YourEmbeddingVariable.weight.data.cpu().numpy()

Details:

https://discuss.pytorch.org/t/how-to-transform-variable-into-numpy/104/5

Approximetal commented 4 years ago

I can't open the link. I don't mean model data such as weights or parameters; I mean the training data, a set of data loaded on the CPU. Most of the time is usually spent loading batches from CPU to GPU, so I was wondering whether I could store the training data in SpeedTorch. It would be helpful to have documentation explaining the SpeedTorch functions.

Approximetal commented 4 years ago

Thank you for replying! I can load files using SpeedTorch now. But cupy doesn't support multi-threading, so I had to reduce the number of threads from 8 to 1, and after that it takes even longer.

Santosh-Gupta commented 4 years ago

Yes, as long as the data is saved in numpy format, the data gadget can open it, or you could transfer live data onto it. If you give me a Colab notebook which loads your data, I can tinker around with it. I think the easiest way to do this is to upload your data to Google Drive, then use !gdown --id followed by the Google Drive ID to download it directly into your notebook.

There's documentation at the bottom of the readme, and here's a colab notebook which shows how to use the data gadget:

https://colab.research.google.com/drive/1TbqKwZ94p_B6q0t_orYObKsWwa7Fg0ld

Santosh-Gupta commented 4 years ago

How many cores does your CPU have? The main SpeedTorch advantages are for a lower number of CPU cores, like 1-4. Beyond that, PyTorch's indexing kernels become more efficient.

I would love to see a Colab version of your code; maybe I can tinker a bit.

Approximetal commented 4 years ago

My CPU info: 8 Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz

The model I use is based on https://github.com/NVIDIA/tacotron2, and I replaced this line with:

melspec = SpeedTorch.DataGadget(full_path)
melspec.gadgetInit()
melspec = melspec.getData()

(BTW, I don't know how to get the full data, so I modified getData() and removed the index parameter.)

Santosh-Gupta commented 4 years ago

How many cores does that CPU have? I can't seem to look it up.

I'm not too familiar with that model. But with a colab notebook perhaps I can tinker around.
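The leading `8` in the CPU info above likely counts logical processors rather than physical cores; with hyper-threading the two differ. A minimal sketch for reporting both, assuming a Linux machine with `/proc/cpuinfo` (the helper names are illustrative):

```python
import os


def count_physical_cores(cpuinfo_text):
    """Count distinct (physical id, core id) pairs in Linux /proc/cpuinfo text."""
    cores = set()
    physical_id = core_id = None
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one logical-processor stanza.
            if core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
    return len(cores)


def describe_cpu():
    """Return (logical_cpus, physical_cores); physical is None when unknown."""
    logical = os.cpu_count()
    try:
        with open("/proc/cpuinfo") as handle:
            physical = count_physical_cores(handle.read()) or None
    except OSError:
        physical = None
    return logical, physical
```

Calling `describe_cpu()` on the training machine gives both numbers, which is what matters for the 1-4 core range mentioned earlier.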

Approximetal commented 4 years ago

4 cores. I'm afraid I can't upload the model to Colab; every time I open that link, my computer nearly freezes.