cmap / cmapPy

Assorted tools for interacting with .gct, .gctx files and other Connectivity Map (Broad Institute) data/tools
https://clue.io/cmapPy/index.html
BSD 3-Clause "New" or "Revised" License

ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size. #52

Closed FarshidShekari closed 5 years ago

FarshidShekari commented 5 years ago

I get the error below when I try to read a .gctx file. Is it fixable?

 File "C:/Users/Farshid/PycharmProjects/DGEP/gctx2npy.py", line 13, in main
    gctobj = parse.parse(GTEx_GCTX)
  File "C:\Users\Farshid\AppData\Local\Programs\Python\Python35-32\lib\site-packages\cmapPy\pandasGEXpress\parse.py", line 68, in parse
    make_multiindex=make_multiindex)
  File "C:\Users\Farshid\AppData\Local\Programs\Python\Python35-32\lib\site-packages\cmapPy\pandasGEXpress\parse_gctx.py", line 110, in parse
    data_df = parse_data_df(data_dset, sorted_ridx, sorted_cidx, row_meta, col_meta)
  File "C:\Users\Farshid\AppData\Local\Programs\Python\Python35-32\lib\site-packages\cmapPy\pandasGEXpress\parse_gctx.py", line 332, in parse_data_df
    data_array = np.empty(data_dset.shape, dtype=np.float32)
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.3.2\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 382, in _on_run
    r = self.sock.recv(1024)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

oena commented 5 years ago

Hi @FarshidShekari, given your error message, this looks like a connectivity or firewall issue rather than a problem with cmapPy. If you search for the error message, you will find a number of suggestions on how to approach it.

FarshidShekari commented 5 years ago

I granted PyCharm permission, but it still raises the error: ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.

File "C:/Users/Farshid/PycharmProjects/DGEP/gctx2npy.py", line 13, in main
   gctobj = parse.parse(GTEx_GCTX)
 File "C:\Users\Farshid\AppData\Local\Programs\Python\Python35-32\lib\site-packages\cmapPy\pandasGEXpress\parse.py", line 68, in parse
   make_multiindex=make_multiindex)
 File "C:\Users\Farshid\AppData\Local\Programs\Python\Python35-32\lib\site-packages\cmapPy\pandasGEXpress\parse_gctx.py", line 110, in parse
   data_df = parse_data_df(data_dset, sorted_ridx, sorted_cidx, row_meta, col_meta)
 File "C:\Users\Farshid\AppData\Local\Programs\Python\Python35-32\lib\site-packages\cmapPy\pandasGEXpress\parse_gctx.py", line 332, in parse_data_df
   data_array = np.empty(data_dset.shape, dtype=np.float32)
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
Traceback (most recent call last):
 File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.3.2\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 382, in _on_run
   r = self.sock.recv(1024)

oena commented 5 years ago

Hi Farshid, it looks like your file is too large to read into memory on whatever setup you're using. A better approach is probably to use hyperslab selection to read in only portions of the file; the cmapPy tutorial has an example of how to do this.
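
As a rough illustration of that suggestion, the sketch below reads a .gctx file a block of columns at a time instead of allocating the full matrix. It assumes cmapPy is installed and that `parse` accepts the `cidx` and `col_meta_only` arguments as described in the cmapPy tutorial. The helper names (`interpreter_bits`, `float32_bytes`, `column_chunks`, `iter_gctx_chunks`) and the default chunk size are hypothetical, not part of cmapPy. The `Python35-32` directory in the traceback suggests a 32-bit interpreter, so the first two helpers offer a quick way to check whether a dense float32 array of a given shape would exceed `sys.maxsize` bytes, which appears to be the condition behind this ValueError.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running Python (32 or 64)."""
    return struct.calcsize("P") * 8


def float32_bytes(n_rows, n_cols):
    """Bytes needed for a dense float32 matrix (4 bytes per element)."""
    return n_rows * n_cols * 4


def column_chunks(n_cols, chunk_size):
    """Yield lists of consecutive column indexes that cover range(n_cols)."""
    for start in range(0, n_cols, chunk_size):
        yield list(range(start, min(start + chunk_size, n_cols)))


def iter_gctx_chunks(path, chunk_size=1000):
    """Yield one parsed object per block of columns.

    Assumption: parse() supports col_meta_only and cidx, as in the
    cmapPy tutorial. The import is done here so that the helpers above
    can be used even where cmapPy is not installed.
    """
    from cmapPy.pandasGEXpress.parse import parse

    # Read only the column metadata to learn how many columns exist,
    # without allocating the data matrix.
    col_meta = parse(path, col_meta_only=True)
    for cidx in column_chunks(len(col_meta.index), chunk_size):
        # Each call loads only the selected columns.
        yield parse(path, cidx=cidx)


if __name__ == "__main__":
    print("Python is", interpreter_bits(), "bit")
    # Hypothetical shape, for illustration only.
    needed = float32_bytes(50000, 20000)
    print("fits in one array:", needed <= sys.maxsize)
```

Each object yielded by `iter_gctx_chunks` should expose its block of values as `data_df`, so a conversion script can process or save one block at a time. If the interpreter does turn out to be 32-bit, switching to a 64-bit Python build may also resolve the error on its own, provided the machine has enough RAM for the full matrix.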