s-celles closed this issue 3 years ago
Looks like an issue in Blaze.
By the way @scls19fr, I've also run into this: https://github.com/ContinuumIO/blaze/issues/1191 but didn't have time to debug it yet.
Do you have anything similar happening?
It generates a 573 MB file for me, but I could open it using Python 2.7 and blaze 0.8.2 (from PyPI).
It initially opens quicker than gtabview big_random.csv does, but then it's painfully slow. Unusable, actually.
It's odd that you don't get the same file size.
pc:~ scls$ cat big.py
import pandas as pd
import numpy as np
(rows, cols) = (4000000, 10)
a = np.random.random((rows, cols))
df = pd.DataFrame(a)
filename = "big_random.csv"
df.to_csv(filename, index=False)
pc:~ scls$ python big.py
pc:~ scls$ ls -lh big_random.csv
-rw-r--r-- 1 scls staff 735M 6 Aug 13:37 big_random.csv
pc:~ scls$ head big_random.csv
0,1,2,3,4,5,6,7,8,9
0.21274487359371408,0.9837956452361427,0.15916720143813157,0.2681755865886024,0.7204710887278248,0.3086394030805869,0.30062424542067534,0.6960646137570186,0.34166666946589364,0.27586304644665727
0.8777946375156523,0.6033697123338008,0.1327706266615769,0.19529643231130522,0.27477054259777434,0.4468524316143998,0.940254670593807,0.18968403819945623,0.2738538547944517,0.12564400838744338
0.2018934919749089,0.07524548034574063,0.6473819252708584,0.6071002176130551,0.40511265167956656,0.2791859033387186,0.7154128345443975,0.4866797736287697,0.4584847407677841,0.3798229634416679
0.9011497780314796,0.5777840362131448,0.3499451294403626,0.4070743759854154,0.7087747090990143,0.34894823904330574,0.33488167867742125,0.39637388267588536,0.40657046018943,0.1805436010295245
0.19026708133181092,0.5247328762094844,0.021502947916826387,0.7580506570759334,0.5779723788378057,0.6127493575936307,0.8011351193298644,0.6636015321535718,0.4607859110565661,0.08490276375289674
0.7143217456084715,0.011198040471145032,0.8892333967777504,0.6768191157336442,0.42295595169840083,0.8769479341732865,0.9891525199717826,0.9647959264864102,0.3240608535624976,0.210874737113377
0.21672596123550258,0.3482696140148287,0.7101869395685214,0.6474932686786607,0.16354057335375938,0.3052394529802829,0.7360537292259517,0.3575203114582275,0.9447179623804465,0.03532260562656109
0.6407757887342225,0.06897946464244908,0.4520628499915391,0.22465134543324095,0.7808744507260172,0.005931638090803992,0.8193511179065976,0.5469973751275239,0.4012570157732708,0.9510566112687189
0.43224384198381016,0.681428966272423,0.10416321326939937,0.2879100716695391,0.8998485262708976,0.4314634776128088,0.0885892489077732,0.11030100124975784,0.6841513022708292,0.6409559413160515
Looks like your decimals are 4 digits longer than mine ;)
I noticed a 6-digit difference between Python 3 and Python 2: https://github.com/pydata/pandas/issues/10777
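If the varying float width is a nuisance, one way to get a deterministic file size across Python versions is the float_format argument of pandas' to_csv, which pins every value to a fixed number of decimal digits (the 6-digit choice and the file name below are just illustrations):

```python
import numpy as np
import pandas as pd

# By default, to_csv writes each float with its shortest exact repr,
# whose length varies (and differed between Python 2 and Python 3).
# A fixed printf-style format makes every value the same width.
df = pd.DataFrame(np.random.random((3, 2)))
df.to_csv("small_random.csv", index=False, float_format="%.6f")

with open("small_random.csv") as f:
    lines = f.read().splitlines()
print(lines[1])  # e.g. "0.123456,0.654321" -- every value has 6 decimals
```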
To be honest, I can live with this issue (and with the long time it takes to open a big CSV file) for now.
I'd rather have a tabview version that can also handle Blaze (and share most of the gtabview code) before mid-September.
Can you put together a minimal working example (MWE) for the unbound locals error and open an issue on Blaze? I'll try to look into it.
I thought that

$ gtabview file://big_random.csv

was a minimal enough (not-)working example ;-)
I did:

import blaze

# Try to reproduce the error with the chunked access pattern
# gtabview uses when loading a file
dat = blaze.Data("big_random.csv")
chunk_size = 16384
cols = dat.columns
list(dat[cols][0:chunk_size])  # materialize the first chunk
but I wasn't able to reproduce it.
Closing this, since there's not much we can do about this here.
Hi Yuri,
I created a big CSV file to try to reproduce what @firecat53 noticed about the very long time needed to open a file and scroll through data (see https://github.com/firecat53/tabview/issues/127).
It should create a ~770 MB file!
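As a rough sanity check on that figure (the ~19 bytes per value is my assumption: "0." plus up to 17 digits, plus a separator):

```python
# Back-of-the-envelope estimate of the CSV size: each random float in
# [0, 1) serializes to roughly 18 characters, plus a comma or newline.
rows, cols = 4_000_000, 10
bytes_per_value = 19  # assumed average, value plus separator
estimate = rows * cols * bytes_per_value
print(f"~{estimate / 1e6:.0f} MB")  # → ~760 MB, in line with the 735M (MiB) from ls -lh
```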
Then I tried to open this file using gtabview.
Not sure if that's a gtabview issue or a Blaze issue.
Pinging also @cpcloud and @llllllllll
Kind regards