s-celles closed this issue 3 years ago
Looks like an issue in Blaze.
By the way @scls19fr, I've also run into this: https://github.com/ContinuumIO/blaze/issues/1191 but didn't have time to debug it yet.
Do you have anything similar happening?
It generates a 573 MB file for me, but I could open it using Python 2.7 and blaze 0.8.2 (from PyPI).
It initially opens quicker than gtabview big_random.csv does, but then it's painfully slow. Unusable, actually.
It's odd that you don't get the same file size.
pc:~ scls$ cat big.py
import pandas as pd
import numpy as np
(rows, cols) = (4000000, 10)
a = np.random.random((rows, cols))
df = pd.DataFrame(a)
filename = "big_random.csv"
df.to_csv(filename, index=False)
pc:~ scls$ python big.py
pc:~ scls$ ls -lh big_random.csv
-rw-r--r-- 1 scls staff 735M 6 Aug 13:37 big_random.csv
pc:~ scls$ head big_random.csv
0,1,2,3,4,5,6,7,8,9
0.21274487359371408,0.9837956452361427,0.15916720143813157,0.2681755865886024,0.7204710887278248,0.3086394030805869,0.30062424542067534,0.6960646137570186,0.34166666946589364,0.27586304644665727
0.8777946375156523,0.6033697123338008,0.1327706266615769,0.19529643231130522,0.27477054259777434,0.4468524316143998,0.940254670593807,0.18968403819945623,0.2738538547944517,0.12564400838744338
0.2018934919749089,0.07524548034574063,0.6473819252708584,0.6071002176130551,0.40511265167956656,0.2791859033387186,0.7154128345443975,0.4866797736287697,0.4584847407677841,0.3798229634416679
0.9011497780314796,0.5777840362131448,0.3499451294403626,0.4070743759854154,0.7087747090990143,0.34894823904330574,0.33488167867742125,0.39637388267588536,0.40657046018943,0.1805436010295245
0.19026708133181092,0.5247328762094844,0.021502947916826387,0.7580506570759334,0.5779723788378057,0.6127493575936307,0.8011351193298644,0.6636015321535718,0.4607859110565661,0.08490276375289674
0.7143217456084715,0.011198040471145032,0.8892333967777504,0.6768191157336442,0.42295595169840083,0.8769479341732865,0.9891525199717826,0.9647959264864102,0.3240608535624976,0.210874737113377
0.21672596123550258,0.3482696140148287,0.7101869395685214,0.6474932686786607,0.16354057335375938,0.3052394529802829,0.7360537292259517,0.3575203114582275,0.9447179623804465,0.03532260562656109
0.6407757887342225,0.06897946464244908,0.4520628499915391,0.22465134543324095,0.7808744507260172,0.005931638090803992,0.8193511179065976,0.5469973751275239,0.4012570157732708,0.9510566112687189
0.43224384198381016,0.681428966272423,0.10416321326939937,0.2879100716695391,0.8998485262708976,0.4314634776128088,0.0885892489077732,0.11030100124975784,0.6841513022708292,0.6409559413160515
Looks like your decimals are 4 digits longer than mine ;)
I noticed a 6-digit difference between Python 3 and Python 2: https://github.com/pydata/pandas/issues/10777
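If the varying float width is a nuisance, one way to get a deterministic file size across Python versions is the float_format argument of pandas' to_csv, which pins every value to a fixed number of decimal digits (the 6-digit choice and the file name below are just illustrations):

```python
import numpy as np
import pandas as pd

# By default, to_csv writes each float with its shortest exact repr,
# whose length varies (and differed between Python 2 and Python 3).
# A fixed printf-style format makes every value the same width.
df = pd.DataFrame(np.random.random((3, 2)))
df.to_csv("small_random.csv", index=False, float_format="%.6f")

with open("small_random.csv") as f:
    lines = f.read().splitlines()
print(lines[1])  # e.g. "0.123456,0.654321" -- every value has 6 decimals
```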
To be honest, I can live with this issue (and with the long time it takes to open a big CSV file) for now.
I'd rather have a tabview version that can also handle Blaze (and share most of the gtabview code) before mid-September.
Can you put together a minimal working example (MWE) for the unbound locals error and open an issue on Blaze? I'll try to look into it.
I thought that

$ gtabview file://big_random.csv

was a minimal enough (not-)working example ;-)
I did:

import blaze

# Try to reproduce the error with the chunked access pattern
# gtabview uses when loading a file
dat = blaze.Data("big_random.csv")
chunk_size = 16384
cols = dat.columns
list(dat[cols][0:chunk_size])  # materialize the first chunk
but I wasn't able to reproduce it.
Closing this, since there's not much we can do about this here.
Hi Yuri,
I created a big CSV file to try to reproduce what @firecat53 noticed about the very long time needed to open a file and scroll through data (see https://github.com/firecat53/tabview/issues/127).
It should create a ~770 MB file!
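As a rough sanity check on that figure (the ~19 bytes per value is my assumption: "0." plus up to 17 digits, plus a separator):

```python
# Back-of-the-envelope estimate of the CSV size: each random float in
# [0, 1) serializes to roughly 18 characters, plus a comma or newline.
rows, cols = 4_000_000, 10
bytes_per_value = 19  # assumed average, value plus separator
estimate = rows * cols * bytes_per_value
print(f"~{estimate / 1e6:.0f} MB")  # → ~760 MB, in line with the 735M (MiB) from ls -lh
```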
Then I tried to open this file using gtabview.
Not sure if that's a gtabview issue or a Blaze issue.
Pinging also @cpcloud and @llllllllll
Kind regards