Open s-celles opened 9 years ago
It's a nice idea, but I'd like to see more options. Are you aware of other spreadsheet-like things done in Python? (As long as it's not Tk: I'm fine with Qt, wx or GTK, in order of preference.)
The thing is, the table controller in Qt is extremely slow. As you will see, it can barely handle 1M cells even on a decent i5. There are ways to implement the data/view controller itself, but then there's a lot of work to do. You also cannot reasonably have more than a single header row/column (which is a pity, since a DataFrame would look so nice) without also re-implementing a lot of stuff.
In that sense, I'd like to reuse as much as possible of what's already available.
Sorry, I don't know of any other spreadsheet-like Python project.
I tried this in Spyder (IPython):
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((5000000,3)))
and went to the Spyder variable editor.
Scroll down... you will notice that the table is only filled with (around) 500 rows.
Getting to the bottom of the DataFrame is really difficult.
Meanwhile I discovered pandas had something like this already:
It has been split into a separate module:
https://github.com/datalyze-solutions/pandas-qt
Ironically, it handles multi-level indexes worse than what gtabview is already doing, and seems to be equally slow :(
Or maybe not. I have now implemented some simple models, and I can display millions of rows without problems.
That's very efficient! It can display not only millions of rows but also millions of columns! You did great work. Thanks
On 21/07/15 15:47, scls19fr wrote:
That's very efficient! It can display millions of rows but also millions of columns! You did a great work
Now it just needs a few keybindings to make navigation similar to tabview.
Yes, with keybindings it will be more convenient.
Keeping the window size would probably be another interesting improvement. Several size policies could exist:
- calling view(...) always displays a window with a defined default size
- calling view(...) with the same object (might use id(obj)) displays a window with its last size
On 21/07/15 15:58, scls19fr wrote:
Yes with keybindings it will be more convenient.
Keeping window size will be probably an other interesting improvement. Several size policies could exists:
- calling |view(...)| always displays a window which have a defined default size
- calling |view(...)| with same object (might use |id(obj)|) displays a window which have last size
This is already done. You have two more parameters in gtabview.view: modal and recycle.
By default, modal=True and it will block the Python process as tabview.view does. However, this being a GUI, with modal=False the window will be asynchronous (you can keep working). Calling view(modal=False) again will reuse the same window (recycle=True by default).
Thanks for these tips.
I wasn't aware of this non-modal mode.
But you should notice that:
- after an in-place change (df.loc[0, 0] = 1.23), the window shows the change (and that's a good thing)
- after adding a column (df["new_col"] = 1), the window won't update (unfortunately); you need to call view(df, modal=False) again
Calling view(df, modal=False) again should also put the window on top.
But I'm sorry, I think I wasn't clear enough: if you close the window (modal or not), the window size is not kept.
Try now. I changed the defaults/names to wait=False, recycle=True.
I think you changed something else.
With Python 3
In [18]: from gtabview import view
File "//anaconda/lib/python3.4/site-packages/gtabview/__init__.py", line 132
app.postEvent(self._view, QtCore.QEvent(QtCore.QEvent.None))
^
SyntaxError: invalid syntax
With Python 2
pc:~ scls$ source activate py2
discarding //anaconda/bin from PATH
prepending //anaconda/envs/py2/bin to PATH
(py2)pc:~ scls$ pip uninstall gtabview
Cannot uninstall requirement gtabview, not installed
(py2)pc:~ scls$ pip install git+git://github.com/wavexx/gtabview
Collecting git+git://github.com/wavexx/gtabview
Cloning git://github.com/wavexx/gtabview to /var/folders/j_/v8b1bst93_94t724ptsswfsr0000gn/T/pip-J7fZM7-build
Requirement already satisfied (use --upgrade to upgrade): setuptools in /anaconda/envs/py2/lib/python2.7/site-packages/setuptools-18.0.1-py2.7.egg (from gtabview==0.1)
Installing collected packages: gtabview
Running setup.py install for gtabview
Successfully installed gtabview-0.1
(py2)pc:~ scls$ ipython
Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, May 28 2015, 17:04:42)
Type "copyright", "credits" or "license" for more information.
IPython 3.2.1 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: from gtabview import view
In [2]:
In [2]: import pandas as pd
In [3]: df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
...: columns=['a', 'b', 'c'], index=['x', 'y'])
In [4]: view(df)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
TypeError: 'DataFrame' object is not callable
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
TypeError: 'DataFrame' object is not callable
---------------------------------------------------------------------------
(...)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
TypeError: 'DataFrame' object is not callable
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
TypeError: 'DataFrame' object is not callable
Out[4]: <gtabview.ViewController at 0x102264790>
In [5]:
Yes, I changed a lot of stuff in order to keep the GUI off the main thread, depending on whether you use matplotlib or not (which also uses Qt). In essence, if you use matplotlib, you should import it first so that gtabview initializes correctly. If matplotlib is not present, gtabview assumes control of Qt in order to have a completely detached GUI.
I'm not sure it's the best approach. Maybe detach=False should always be the default. I'll write some notes in the README about this.
I fixed the issue with Python 3 (I didn't know that 'None' is no longer usable in contexts like this).
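To illustrate the Python 3 failure reported above: `None` is a reserved keyword there, so it cannot be written as an attribute name in source code, although an attribute with that name can still be looked up at runtime. The following is a minimal sketch; `FakeEvent` is a hypothetical stand-in, not the real `QtCore.QEvent`, and `getattr` is only one possible workaround, not necessarily the fix that was applied in gtabview.

```python
# Hypothetical stand-in for an enum-like class exposing a member named "None"
# (as the traceback above suggests QtCore.QEvent did); not the real Qt class.
FakeEvent = type("FakeEvent", (), {"None": 0})


def is_valid_expression(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# Writing the attribute literally is rejected by the Python 3 parser...
literal_ok = is_valid_expression("FakeEvent.None")

# ...but looking the attribute up by name at runtime still works.
event_type = getattr(FakeEvent, "None")
```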
I pushed some support for MultiIndex for both rows and columns. gtabview -H[n] now works correctly. But so does this:
import numpy as np
import pandas as pd
import gtabview
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
arrays2 = [['bar', 'bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
['foo', 'foo', 'qux', 'qux', 'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples2 = list(zip(*arrays2))
index2 = pd.MultiIndex.from_tuples(tuples2, names=['first', 'second', 'third'])
df = pd.DataFrame(np.random.randn(8, 8), index=index2, columns=index)
gtabview.view(df)
There is some drudgery to make it work, so I'd like a few comments on how it works/looks on mac.
I now also added the ability to set the index size when loading an external file.
gtabview -I4 -H3 tabview/samples/multiindex.csv seems to be mostly OK, except for that extra empty line. Since you supplied that sample file from tabview, how did you write it?
In the end, I now skip the empty line if it's just after the header.
multiindex.csv looks correct now.
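The "skip the empty line just after the header" rule can be sketched with the standard `csv` module. This is an illustration only, not gtabview's loader: `read_table` and its `header_rows` parameter are hypothetical names, and treating a row as empty when every cell is blank is an assumption of this sketch.

```python
import csv
import io


def read_table(text, header_rows):
    """Split CSV text into (header, body), dropping one blank row after the header.

    A row counts as blank when it has no cells or every cell is empty; that
    definition is an assumption of this sketch.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[:header_rows], rows[header_rows:]
    if body and all(cell.strip() == "" for cell in body[0]):
        body = body[1:]  # skip the blank row only when it directly follows the header
    return header, body


# Two header rows, a blank row right after them, and a later blank row that is kept.
sample = "a,b\nx,y\n,\n1,2\n,\n3,4\n"
header, body = read_table(sample, header_rows=2)
```

Blank rows further down are left alone, since they may be genuine data.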
That's fine. It works both with Python 3 and Python 2 on my Anaconda install (Mac OS X 10)
Code to generate multiindex.csv is available here https://github.com/firecat53/tabview/issues/66#issuecomment-69307207
On 24/07/15 08:54, scls19fr wrote:
That's fine.
Code to generate multiindex.csv is available here firecat53/tabview#66 (comment) https://github.com/firecat53/tabview/issues/66#issuecomment-69307207
I'm only wondering where I could draw the level names.
The upper-left corner seems logical, but there's almost never enough space unless I enlarge the index width, which I would like to avoid.
Do you think there is anything still worth sharing with Spyder now?
They may be interested in your code (both are MIT licensed)? ;-)
On 24/07/15 14:28, scls19fr wrote:
They may be interested by your code (both are MIT licensed) ? ;-)
Hopefully they can recycle the class as a module. There's no writing yet though.
Hi @wavexx, Spyder maintainer here. It seems you've done a lot of good work with gtabview! :-)
We'd love to share code and use gtabview as a library for our Variable Explorer widget. We already solved the issue with Qt being really slow with millions of rows or hundreds of columns, by fetching data on demand. And you're correctly handling multi-index DataFrames and have support for Blaze (which is very cool). So I see a lot of promise here ;-)
Pinging also @quiqua and @kaotika (from pandas-qt) to see what they think about this.
@scls19fr, to see the tail of a big DataFrame in Spyder, you only need to click on its Index column to sort it in reverse :-)
Thanks for the tip
Hi @wavexx, Spyder maintainer here. We have done quite a bit of integration between the gtabview project and Spyder in this PR, so first of all, thanks a lot for working on gtabview and sharing it :). Also, if you think something could be helpful for the gtabview project, or you have any questions about it, we will be glad to help :)
On Fri, Oct 06 2017, Daniel Althviz Moré wrote:
Hi @wavexx, Spyder manteiner here. We have made quite an integration with the gtabview project and Spyder in this PR so first of all thanks a lot for working in gtabview and sharing it :). Also, if you think something could be helpful for the gtabview project and you have any question about it, we will be glad to help :)
Hi Daniel, I'm glad this was helpful. I was busy on other projects, so I didn't work on gtabview in the last months.
I saw some nice features in the PR, and without looking at the code, I have some questions.
When you sort, I guess you actually make an implicit copy of the dataframe by calling the underlying sort method?
Hiding the index row/column is nice, but how does it work when a subset of the DF is being used? I actually did this intentionally, as having both logical and index values visible becomes very useful as soon as you subset anything. Hiding the index when it's identical is nicer visually though.
I agree with one of the comments about the color-banding.
I initially wanted to copy the same method notebooks use (by simply hiding repeated values), but you subsequently need to ensure at least one label is fully visible in the current view. When a cell is partially visible, you need to repeat at least once.
I admit the current code was just easier to do as it requires no look-back.
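The notebook-style approach described above (hide repeated labels, but keep at least one label visible in the current view) could look roughly like this. It is a simplified sketch, not gtabview code: `visible_labels` is a hypothetical helper that works on whole rows and ignores partially visible cells.

```python
def visible_labels(labels, first_visible, last_visible):
    """Return the text to draw for rows first_visible..last_visible (inclusive).

    A label is blanked when it repeats the one on the row above, except on the
    first visible row, which always shows its label so that every run of
    repeated values has at least one label in view.
    """
    shown = []
    for row in range(first_visible, last_visible + 1):
        repeats_previous = row > first_visible and labels[row] == labels[row - 1]
        shown.append("" if repeats_previous else labels[row])
    return shown


labels = ["bar", "bar", "bar", "baz", "baz", "foo"]
top = visible_labels(labels, 0, 5)       # view starts at the first row
scrolled = visible_labels(labels, 1, 4)  # view starts in the middle of the "bar" run
```

The look-back is the comparison with `labels[row - 1]`; the result depends on the scroll position, which is what makes this harder than drawing every label.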
Hi again @wavexx. For the sort, we keep a reference to the dataframe and, as you say, we use the sort method of the dataframe; all of that logic is in the DataFrameModel class. The DataFrameModel class was based on the ArrayModel class from Spyder's arrayEditor, on the DataFrameModel class from the pandas project (present in pandas.sandbox.qtpandas in v0.13.1), and currently also on the ExtDataModel and ExtFrameModel classes of gtabview.
About the hiding, what do you mean by a subset? Maybe it is related to fetching an initial portion of the dataframe in order to show it? If it is about the fetch, we use the column count and row count together with variables that limit the initial number of rows/columns, and a fetch method that uses those variables to insert more columns/rows with beginInsertRows or beginInsertColumns. A similar logic is used for the DataFrameHeaderModel and DataFrameLevelModel.
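The row-limiting fetch described above can be sketched without Qt. The class below is a hypothetical illustration, not Spyder's or gtabview's implementation: the method names loosely mirror Qt's canFetchMore/fetchMore model hooks, and the chunk size of 500 is an assumed value.

```python
class LazyRowCounter:
    """Qt-free sketch of on-demand row loading (hypothetical illustration)."""

    def __init__(self, total_rows, chunk=500):
        self.total_rows = total_rows  # rows in the underlying data
        self.chunk = chunk            # rows revealed per fetch (assumed value)
        self.rows_loaded = min(chunk, total_rows)

    def row_count(self):
        # The view is only told about the rows loaded so far.
        return self.rows_loaded

    def can_fetch_more(self):
        return self.rows_loaded < self.total_rows

    def fetch_more(self):
        # A real Qt model would wrap this update in
        # beginInsertRows(...) / endInsertRows().
        remaining = self.total_rows - self.rows_loaded
        self.rows_loaded += min(self.chunk, remaining)


counter = LazyRowCounter(total_rows=1200)
counts = [counter.row_count()]
while counter.can_fetch_more():
    counter.fetch_more()
    counts.append(counter.row_count())
```

The same counter could be applied to columns; the view asks for more only when the user scrolls to the edge of what is loaded.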
If you have more questions let us know, we will be glad to answer you 👍
Hello,
Spyder IDE https://github.com/spyder-ide/spyder provides a convenient user interface named "variable editor" (MIT license).
It would be great to use part of their code to have a standalone version of this variable editor.
Kind regards
See https://github.com/spyder-ide/spyder/issues/2553