Share code with Spyder variable editor

s-celles commented 9 years ago

Hello,

Spyder IDE https://github.com/spyder-ide/spyder provides a convenient user interface named "variable editor" (MIT license).

It would be great to reuse part of their code to build a standalone version of this variable editor.

Kind regards

See https://github.com/spyder-ide/spyder/issues/2553

wavexx commented 9 years ago

It's a nice idea, but I'd like to see more options. Are you aware of other spreadsheet-like things done in Python? (As long as it's not Tk: I'm fine with Qt, wx or GTK, in order of preference.)

The thing is, the table controller in Qt is extremely slow. As you will see, it can barely handle 1 million cells even on a decent i5. It is possible to implement the data/view controller yourself, but that takes a lot of work. You also cannot reasonably have more than a single header row/column (which is a pity, since a DataFrame would look so nice) without also re-implementing a lot of stuff.

In that sense, I'd like to reuse as much as possible of what's already available.

s-celles commented 9 years ago

Sorry, I don't know of any other spreadsheet-like Python project.

I tried this in Spyder (IPython):

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((5000000,3)))

and then opened the Spyder variable editor.

Scroll down... you will notice that the table is only fed with (around) 500 rows at a time.

Getting to the bottom of the DataFrame is really difficult.

wavexx commented 9 years ago

Meanwhile I discovered pandas had something like this already:

https://github.com/pydata/pandas/blob/1d8717446d8666207b63ae324e56be60a0b01b07/pandas/sandbox/qtpandas.py

It has been split into a separate module:

https://github.com/datalyze-solutions/pandas-qt

Ironically, it handles multi-level indexes worse than what gtabview is already doing, and seems to be equally slow :(

wavexx commented 9 years ago

Or maybe not. I've now implemented some simple models, and I can display millions of rows without problems.

s-celles commented 9 years ago

That's very efficient! It can display not only millions of rows but also millions of columns! You did great work. Thanks!

wavexx commented 9 years ago

On 21/07/15 15:47, scls19fr wrote:

That's very efficient! It can display not only millions of rows but also millions of columns! You did great work.

Now it just needs a few keybindings to make navigation similar to tabview.

s-celles commented 9 years ago

Yes, with keybindings it will be more convenient.

Keeping the window size would probably be another interesting improvement. Several size policies could exist:

calling view(...) always displays a window with a defined default size
calling view(...) with the same object (might use id(obj)) displays a window with the last size used
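
The second policy could be sketched roughly as below. This is only an illustration in plain Python: `SizeRegistry`, `size_for`, `remember` and the default size are hypothetical names and values, not part of gtabview.

```python
# Hypothetical sketch: none of these names exist in gtabview.
DEFAULT_SIZE = (640, 480)  # assumed default width/height in pixels


class SizeRegistry:
    """Remember the last window size used for each viewed object."""

    def __init__(self, default=DEFAULT_SIZE):
        self.default = default
        self._sizes = {}  # id(obj) -> (width, height)

    def size_for(self, obj):
        # Reuse the last size recorded for this object, else the default.
        return self._sizes.get(id(obj), self.default)

    def remember(self, obj, size):
        # Would be called when the window is resized or closed.
        self._sizes[id(obj)] = size


registry = SizeRegistry()
data = [[1, 2, 3], [4, 5, 6]]
first = registry.size_for(data)       # nothing recorded yet: the default
registry.remember(data, (1024, 300))  # the user resized the window
second = registry.size_for(data)      # same object: last size is restored
other = registry.size_for([[7, 8]])   # different object: default again
```

One caveat with keying on id(obj): Python may reuse an id once the original object has been garbage collected, so a stale size could be applied to an unrelated object.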

wavexx commented 9 years ago

On 21/07/15 15:58, scls19fr wrote:

Yes, with keybindings it will be more convenient.

Keeping the window size would probably be another interesting improvement. Several size policies could exist:

calling view(...) always displays a window with a defined default size

calling view(...) with the same object (might use id(obj)) displays a window with the last size used

This is already done. You have two more parameters in gtabview.view: modal and recycle.

By default, modal=True and it will block the Python process as tabview.view does. However, this being a GUI, with modal=False the window will be asynchronous (you can keep working). Calling view(modal=False) again will reuse the same window (recycle=True by default).

s-celles commented 9 years ago

Thanks for these tips.

I wasn't aware of this non-modal mode.

But you should note that:

if you change data in the dataframe from IPython (df.loc[0, 0] = 1.23), the window shows these changes (and that's a good thing)
if you add a new column to the dataframe from IPython (df["new_col"] = 1), the window unfortunately won't update; you need to call view(df, modal=False) again

Calling view(df, modal=False) again should also bring the window to the top.

But I'm sorry, I think I wasn't clear enough: if you close the window (modal or not), its size is not kept.

wavexx commented 9 years ago

Try now. I changed the defaults/names to wait=False, recycle=True.

s-celles commented 9 years ago

I think you changed something else.

With Python 3

In [18]: from gtabview import view
  File "//anaconda/lib/python3.4/site-packages/gtabview/__init__.py", line 132
    app.postEvent(self._view, QtCore.QEvent(QtCore.QEvent.None))
                                                             ^
SyntaxError: invalid syntax

s-celles commented 9 years ago

With Python 2

pc:~ scls$ source activate py2
discarding //anaconda/bin from PATH
prepending //anaconda/envs/py2/bin to PATH
(py2)pc:~ scls$ pip uninstall gtabview
Cannot uninstall requirement gtabview, not installed
(py2)pc:~ scls$ pip install git+git://github.com/wavexx/gtabview
Collecting git+git://github.com/wavexx/gtabview
  Cloning git://github.com/wavexx/gtabview to /var/folders/j_/v8b1bst93_94t724ptsswfsr0000gn/T/pip-J7fZM7-build
Requirement already satisfied (use --upgrade to upgrade): setuptools in /anaconda/envs/py2/lib/python2.7/site-packages/setuptools-18.0.1-py2.7.egg (from gtabview==0.1)
Installing collected packages: gtabview
  Running setup.py install for gtabview
Successfully installed gtabview-0.1
(py2)pc:~ scls$ ipython
Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, May 28 2015, 17:04:42)
Type "copyright", "credits" or "license" for more information.

IPython 3.2.1 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from gtabview import view

In [2]:

In [2]: import pandas as pd

In [3]: df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
   ...:                   columns=['a', 'b', 'c'], index=['x', 'y'])

In [4]: view(df)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: 'DataFrame' object is not callable
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: 'DataFrame' object is not callable
---------------------------------------------------------------------------

(...)

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)
TypeError: 'DataFrame' object is not callable
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: 'DataFrame' object is not callable
Out[4]: <gtabview.ViewController at 0x102264790>

In [5]:

wavexx commented 9 years ago

Yes, I changed a lot of stuff in order to keep the GUI off the main thread, depending on whether you use matplotlib (which also uses Qt) or not. In essence, if you use matplotlib, you should import it first so that gtabview initializes correctly. If matplotlib is not present, gtabview assumes control of Qt in order to have a completely detached GUI.

I'm not sure it's the best approach. Maybe detach=False should always be the default. I'll write some notes in the README about this.

I fixed the issue with Python 3 (I didn't know that 'None' is no longer usable in contexts like this).
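
For readers hitting the same SyntaxError: in Python 3, None is a reserved keyword, so it cannot appear after a dot as an attribute name. The sketch below reproduces this without Qt, using a hypothetical stand-in class (`FakeQEvent`), and shows `getattr` as one way around it. The thread does not say whether this is the fix that went into gtabview.

```python
# Stand-in for an object exposing an attribute literally named "None",
# like QtCore.QEvent in the traceback above. FakeQEvent is hypothetical.
class FakeQEvent:
    pass


setattr(FakeQEvent, "None", 0)  # the name has to be passed as a string

# Writing FakeQEvent.None directly is rejected by the Python 3 parser.
try:
    compile("FakeQEvent.None", "<example>", "eval")
    parses = True
except SyntaxError:
    parses = False

# getattr() takes the name as a string, so the keyword restriction
# on attribute syntax does not apply.
event_type = getattr(FakeQEvent, "None")
```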

wavexx commented 9 years ago

I pushed some support for MultiIndex for both rows and columns. gtabview -H[n] now works correctly. But so does this:

import numpy as np
import pandas as pd
import gtabview

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

arrays2 = [['bar', 'bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
           ['foo', 'foo', 'qux', 'qux', 'foo', 'foo', 'qux', 'qux'], 
           ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples2 = list(zip(*arrays2))
index2 = pd.MultiIndex.from_tuples(tuples2, names=['first', 'second', 'third'])

df = pd.DataFrame(np.random.randn(8, 8), index=index2, columns=index)
gtabview.view(df)

There is some drudgery involved in making it work, so I'd like a few comments on how it works/looks on Mac.

wavexx commented 9 years ago

I now also added the ability to set the index size when loading an external file.

gtabview -I4 -H3 tabview/samples/multiindex.csv

seems to be mostly OK, except for that extra empty line. Since you supplied that sample file from tabview, how did you write it?

wavexx commented 9 years ago

In the end, I now skip the empty line if it comes just after the header. multiindex.csv looks correct now.
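
A minimal sketch of that rule, assuming the file is read with the standard csv module; `read_rows` and its `header_rows` parameter are hypothetical and are not gtabview's actual loader:

```python
import csv
import io


def read_rows(text, header_rows):
    """Parse CSV text, dropping one blank row directly after the header."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) > header_rows and not any(rows[header_rows]):
        # The row right after the header block is empty: skip it.
        del rows[header_rows]
    return rows


# Two header rows, then a blank line, then the data.
sample = "a,b\nx,y\n\n1,2\n3,4\n"
rows = read_rows(sample, header_rows=2)
```

Only the row immediately following the header block is checked, so blank rows further down in the data are left untouched.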

s-celles commented 9 years ago

That's fine. It works both with Python 3 and Python 2 on my Anaconda install (Mac OS X 10)

Code to generate multiindex.csv is available here https://github.com/firecat53/tabview/issues/66#issuecomment-69307207

wavexx commented 9 years ago

On 24/07/15 08:54, scls19fr wrote:

That's fine.

Code to generate multiindex.csv is available here firecat53/tabview#66 (comment) https://github.com/firecat53/tabview/issues/66#issuecomment-69307207

I'm only wondering where I could draw the level names.

The upper-left corner seems logical, but there's almost never enough space unless I enlarge the index width, which I would like to avoid.

wavexx commented 9 years ago

Do you think there is anything still worth sharing with Spyder now?

s-celles commented 9 years ago

They may be interested in your code (both are MIT licensed)? ;-)

wavexx commented 9 years ago

On 24/07/15 14:28, scls19fr wrote:

They may be interested in your code (both are MIT licensed)? ;-)

Hopefully they can recycle the class as a module. There's no write support yet, though.

ccordoba12 commented 9 years ago

Hi @wavexx, Spyder maintainer here. It seems you've done a lot of good work with gtabview! :-)

We'd love to share code and use gtabview as a library for our Variable Explorer widget. We already solved the issue of Qt being really slow with millions of rows or hundreds of columns by fetching data on demand. And you're correctly handling multi-index DataFrames and have support for Blaze (which is very cool). So I see a lot of promise here ;-)

Pinging also @quiqua and @kaotika (from pandas-qt) to see what they think about this.

ccordoba12 commented 9 years ago

@scls19fr, to see the tail of a big DataFrame in Spyder, you only need to click on its Index column to sort it in reverse :-)

s-celles commented 9 years ago

Thanks for the tip

dalthviz commented 7 years ago

Hi @wavexx, Spyder maintainer here. We have integrated quite a lot of the gtabview project into Spyder in this PR, so first of all, thanks a lot for working on gtabview and sharing it :). Also, if you think something there could be helpful for the gtabview project and you have any questions about it, we will be glad to help :)

wavexx commented 7 years ago

On Fri, Oct 06 2017, Daniel Althviz Moré wrote:

Hi @wavexx, Spyder maintainer here. We have integrated quite a lot of the gtabview project into Spyder in this PR, so first of all, thanks a lot for working on gtabview and sharing it :). Also, if you think something there could be helpful for the gtabview project and you have any questions about it, we will be glad to help :)

Hi Daniel, I'm glad this was helpful. I was busy with other projects, so I haven't worked on gtabview in recent months.

I saw some nice features in the PR, and without looking at the code, I have some questions.

When you sort, I guess you actually make an implicit copy of the dataframe by calling the underlying sort method?

Hiding the index row/column is nice, but how does it work when a subset of the DF is being used? I actually kept it visible intentionally, as having both logical and index values visible becomes very useful as soon as you subset anything. Hiding the index when it's identical is nicer visually, though.

I agree with one of the comments about the color-banding.

I initially wanted to copy the same method notebooks use (by simply hiding repeated values), but you subsequently need to ensure at least one label is fully visible in the current view. When a cell is partially visible, you need to repeat at least once.

I admit the current code was just easier to do as it requires no look-back.
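
The notebook-style approach described above (hide repeated labels, but always draw one at the top of the visible area) might look something like the sketch below. `visible_labels` is a hypothetical helper, not code from gtabview or Spyder, and it works on whole rows only, ignoring partially visible cells.

```python
def visible_labels(labels, first, last):
    """Return the text to draw for rows first..last (inclusive).

    A label is hidden (empty string) when it repeats the row above,
    except on the first visible row, which is always drawn.
    """
    shown = []
    for row in range(first, last + 1):
        repeats = row > first and labels[row] == labels[row - 1]
        shown.append("" if repeats else labels[row])
    return shown


labels = ["bar", "bar", "bar", "baz", "baz", "foo"]
full = visible_labels(labels, 0, 5)      # whole table in view
scrolled = visible_labels(labels, 1, 4)  # scrolled down by one row
```

Anchoring on the first visible row means the result has to be recomputed on every scroll, which is the extra bookkeeping the current per-cell drawing avoids.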

dalthviz commented 7 years ago

Hi again @wavexx. For the sort, we keep a reference to the dataframe and, as you say, we use the dataframe's sort method; all of that logic is in the DataFrameModel class. The DataFrameModel class was based on the ArrayModel class from Spyder's arrayEditor and on the DataFrameModel class from the pandas project (present in pandas.sandbox.qtpandas in v0.13.1), and currently also on the ExtDataModel and ExtFrameModel classes of gtabview.

About the hiding, what do you mean by a subset? Maybe it is related to fetching an initial portion of the dataframe in order to show it? If it is about the fetch, we back the column count and row count with variables that limit the initial number of rows/columns, and a fetch method uses those variables to insert more columns/rows with beginInsertRows or beginInsertColumns. A similar logic is used for the DataFrameHeaderModel and DataFrameLevelModel.

If you have more questions, let us know; we will be glad to answer 👍
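
The fetch logic described above can be illustrated without Qt. The sketch below is not Spyder's DataFrameModel: `LazyRowModel` and the batch size of 500 are assumptions, and the method names only mirror Qt's rowCount/canFetchMore/fetchMore. It handles rows only; columns would follow the same pattern.

```python
class LazyRowModel:
    """Qt-free sketch of on-demand row loading.

    In a real Qt model, fetch_more would wrap the growth of rows_loaded
    in beginInsertRows()/endInsertRows() so the view is notified.
    """

    def __init__(self, total_rows, chunk=500):
        self.total_rows = total_rows  # rows in the underlying data
        self.chunk = chunk            # assumed batch size
        self.rows_loaded = min(chunk, total_rows)

    def row_count(self):
        # The view only ever sees the rows loaded so far.
        return self.rows_loaded

    def can_fetch_more(self):
        return self.rows_loaded < self.total_rows

    def fetch_more(self):
        # Expose the next batch, never exceeding the real row count.
        remaining = self.total_rows - self.rows_loaded
        self.rows_loaded += min(self.chunk, remaining)


model = LazyRowModel(total_rows=1200)
counts = [model.row_count()]
while model.can_fetch_more():
    model.fetch_more()
    counts.append(model.row_count())
```

Because the view is only told about the rows loaded so far, scrolling to the end of a very large table requires many fetches, which matches the difficulty of reaching the bottom reported earlier in the thread.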

TabViewer / gtabview

Share code with Spyder variable editor #7