Interesting conversation here on the py mailing list:

"I'm developing an application using PyGObject and Python 3 for the Music
Player Daemon (https://github.com/multani/sonata/), and several users are
complaining about the application's poor performance when loading lots of
songs into our "Current" playlist (which is a Gtk.TreeView), especially
compared to other MPD clients, which are nearly instantaneous.

I'm going to describe how the application currently works, the problem
I'm facing, and the possible solutions I'm exploring. It's a bit long, so...

tl;dr: how do I efficiently display (lots of) information in a TreeView?

Current implementation

- I get a list of Song objects with a number of attributes like .album,
  .artist, etc.
- A user can configure which columns he wants to see in the player with
  a formatting mini-language. For example, he can set the format to
  "%N|%T|%A|%Y - %B - %N", which creates a tree view with 4 columns:
  - the 1st one has the track number (%N)
  - the 2nd one has the track name (%T)
  - the 3rd one has the artist name (%A)
  - the last one is a string of the form "Year - Album - TrackNumber"
- I have a Gtk.ListStore which is built from the definition of the
  columns above and which stores the following:
  - a Song object
  - for each column defined above, the song formatted according to
    the user configuration
  - an additional column to set a row to bold or not (to show the
    current song)
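
A minimal sketch of how the column format string could be compiled once into one formatter per column. It assumes Song objects expose `track`, `title`, `artist`, `year` and `album` attributes; those attribute names, the `FIELDS` table and `compile_columns` are hypothetical (the email only defines the codes `%N`, `%T`, `%A`, `%Y` and `%B`).

```python
import re
from dataclasses import dataclass
from functools import partial
from typing import Callable, List, Optional


@dataclass
class Song:
    # Hypothetical stand-in for the application's Song object.
    track: Optional[int] = None
    title: str = ""
    artist: str = ""
    year: Optional[int] = None
    album: str = ""


# Format code -> Song attribute (assumed mapping, based on the example above).
FIELDS = {"N": "track", "T": "title", "A": "artist", "Y": "year", "B": "album"}
_CODE = re.compile(r"%([A-Za-z])")


def _expand(template: str, song: Song) -> str:
    """Replace every %X code in one column template with the song's value."""
    def repl(match):
        attr = FIELDS.get(match.group(1))
        if attr is None:
            return match.group(0)  # unknown code: keep it verbatim
        value = getattr(song, attr, None)
        return "" if value is None else str(value)
    return _CODE.sub(repl, template)


def compile_columns(fmt: str) -> List[Callable[[Song], str]]:
    """Split the user format on '|' and return one formatter per column."""
    return [partial(_expand, template) for template in fmt.split("|")]


columns = compile_columns("%N|%T|%A|%Y - %B - %N")
song = Song(track=3, title="Song", artist="Band", year=1999, album="LP")
print([fmt(song) for fmt in columns])  # ['3', 'Song', 'Band', '1999 - LP - 3']
```

Parsing the format once, rather than per song, leaves only the substitutions as per-row work.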

Once we get the songs to be displayed, the application loops over all
the songs and:
- applies a formatting function to the current song, which returns a list
  of formatted values
- stores the song + the formatted values (+ the "bold" attribute) in the
  ListStore
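
The population loop described above can be sketched as follows. This is a minimal sketch: a plain Python list stands in for the `Gtk.ListStore(object, str, ..., bool)` model so it runs without GTK, and `build_rows` and its parameters are hypothetical names. In the real application each row would be passed to `store.append(row)`, where PyGObject converts every value of the row (the `TreeModel._convert_value` cost in the profile reported in this email).

```python
from typing import Any, Callable, List, Optional, Sequence


def build_rows(
    songs: Sequence[Any],
    formatters: Sequence[Callable[[Any], str]],
    current: Optional[Any] = None,
) -> List[list]:
    """Return one row per song: [song, col_1, ..., col_N, is_bold]."""
    rows = []
    for song in songs:
        # One formatted string per user-defined column.
        values = [fmt(song) for fmt in formatters]
        # The trailing bool marks the currently playing song (rendered in bold).
        rows.append([song, *values, song is current])
    return rows
```

With N songs and C user columns this produces N rows of C + 2 values, and each of those values goes through the conversion step on `append`.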

Current performances

I made a small test case to reproduce the problem (attached to this
email: https://drive.google.com/file/d/0B5yvyAZqOxQrQVVrWm42WUtzZVVaeWRzMjVLZXZKZU4xWkpZ/edit?usp=sharing),
and here are the results I got:
- it takes about 1.5 seconds to feed the ListStore with Python 2 and pygtk,
  with about half of the time spent in ListStore.append
- it takes about 15 seconds to run the same with Python 2/3 and PyGObject,
  with more than 90% of the time spent in ListStore.append
  (78% in TreeModel._convert_value)

[I got those values using:
python3 -m cProfile -o profile.profile33 testpy
and using pyprof2calltree and KCacheGrind to get the percentages.]

In the real application, it's actually much longer due to the more
complex formatting functions and inefficient access to the Song object
attributes, but PyGObject always comes first, with more than 60% of the
time spent inside it.
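
The command-line measurement above can also be taken from inside a test script with the standard-library `cProfile` and `pstats` modules. A minimal sketch, assuming hypothetical `format_song()` and `populate()` functions that stand in for the real formatting and ListStore-filling code (a plain list replaces the store so it runs anywhere):

```python
import cProfile
import io
import pstats


def format_song(i):
    # Hypothetical stand-in for the per-song formatting function.
    return [str(i), "title %d" % i, "artist", "1999 - album - %d" % i]


def populate(n):
    # Hypothetical stand-in for the loop that fills the model;
    # a plain list replaces Gtk.ListStore so this runs without GTK.
    store = []
    for i in range(n):
        store.append([i, *format_song(i), False])
    return store


def profile_populate(n=10_000, top=5):
    """Profile populate(n) and return the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    populate(n)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_populate())
```

Calling `profiler.dump_stats("profile.profile33")` instead would write the same kind of file as `python3 -m cProfile -o`, which pyprof2calltree can then convert for KCacheGrind.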

Possible solutions?

I tried several approaches to solve this problem:

I tried to populate the ListStore in chunks instead of all at once.
Although it no longer blocks the UI, it takes about 1 minute to complete
and the experience is not that great. I would also need more code to
handle the case where the content of the playlist changes while it's
still being populated from the previous set.
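
A minimal sketch of chunked loading that also handles the "playlist changed mid-load" case with a generation counter. `ChunkedLoader` is hypothetical; in a GTK application `append` would wrap the model's `append` (building the row for each song), `clear` would be the model's `clear`, and `step` would be scheduled with `GLib.idle_add(loader.step, generation)`, which keeps calling it as long as it returns True. This only addresses UI blocking and stale loads, not the total time reported above.

```python
from typing import Any, Callable, List, Sequence


class ChunkedLoader:
    """Fill a model chunk by chunk and drop loads that have become stale."""

    def __init__(self, append: Callable[[Any], Any], clear: Callable[[], Any],
                 chunk_size: int = 500):
        self._append = append
        self._clear = clear
        self._chunk_size = chunk_size
        self._generation = 0
        self._items: List[Any] = []
        self._pos = 0

    def start(self, items: Sequence[Any]) -> int:
        """Begin a new load; any load still in progress becomes stale."""
        self._generation += 1
        self._clear()
        self._items = list(items)
        self._pos = 0
        return self._generation

    def step(self, generation: int) -> bool:
        """Append the next chunk; return True while more work remains."""
        if generation != self._generation:
            return False  # the playlist changed since this load started
        end = min(self._pos + self._chunk_size, len(self._items))
        for item in self._items[self._pos:end]:
            self._append(item)
        self._pos = end
        return self._pos < len(self._items)
```

Because each scheduled callback carries the generation it was started with, a stale callback stops itself on its next run instead of appending old songs to the new playlist.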

I tried to minimize the size of the ListStore by only adding the Song
object. The goal here was to reduce the cost of the formatting + the
cost of adding "so many" columns to the ListStore. The formatting was
then done using cell_data_func functions on each of the view's
columns. Appending is much faster, but displaying and scrolling are
noticeably slower.
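
The single-column variant can be sketched like this. `Gtk.TreeViewColumn.set_cell_data_func(renderer, func)` makes GTK call `func(column, cell, model, tree_iter, data)` before rendering a cell; the callback below formats the Song stored in column 0 on demand. `make_cell_data_func` and the `FakeCell` stand-in (used so the sketch runs without GTK) are hypothetical.

```python
from typing import Any, Callable


def make_cell_data_func(formatter: Callable[[Any], str]):
    """Build a callback with the signature expected by
    Gtk.TreeViewColumn.set_cell_data_func: (column, cell, model, tree_iter, data)."""

    def cell_data_func(column, cell, model, tree_iter, data):
        song = model[tree_iter][0]  # column 0 holds the Song object itself
        cell.set_property("text", formatter(song))  # formatted only when drawn

    return cell_data_func


class FakeCell:
    """Hypothetical stand-in for Gtk.CellRendererText, so the sketch runs without GTK."""

    def __init__(self):
        self.props = {}

    def set_property(self, name, value):
        self.props[name] = value
```

In GTK this would be wired up with something like `column.set_cell_data_func(renderer, make_cell_data_func(formatter))`. The trade-off matches the observation above: nothing is formatted at append time, but the callback runs for every visible cell on every draw, including while scrolling.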

I tried adding some caching (either on the formatting side, or by
having a larger ListStore with the additional columns initially set to
None, then filled in and reused once the column value has been
computed), but it's still not that great. I guess calling the
cell_data_func has a high cost after all.
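
A minimal sketch of a formatting-side cache keyed on (song, column index). It assumes Song objects are hashable (the default identity-based hash is enough) and do not change while displayed; `FormatCache` is a hypothetical name. It saves the formatting work on repeated draws, but the cell_data_func call and the `set_property()` call still happen for every cell, which may be why the gain is limited.

```python
from typing import Any, Callable, Dict, Hashable, List, Sequence, Tuple


class FormatCache:
    """Memoize formatted values per (song, column index).

    Assumes songs are hashable and unchanged while displayed; call
    invalidate() when a song's tags change or the playlist is replaced.
    """

    def __init__(self, formatters: Sequence[Callable[[Any], str]]):
        self._formatters: List[Callable[[Any], str]] = list(formatters)
        self._cache: Dict[Tuple[Hashable, int], str] = {}
        self.misses = 0  # number of times a formatter actually ran

    def get(self, song: Hashable, column: int) -> str:
        key = (song, column)
        try:
            return self._cache[key]
        except KeyError:
            self.misses += 1
            value = self._formatters[column](song)
            self._cache[key] = value
            return value

    def invalidate(self, song: Any = None) -> None:
        """Drop one song's entries, or everything when song is None."""
        if song is None:
            self._cache.clear()
            return
        for column in range(len(self._formatters)):
            self._cache.pop((song, column), None)
```

Inside a cell_data_func the lookup would be `cache.get(model[tree_iter][0], column_index)`, with `column_index` bound per view column.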

I'm a bit out of ideas on what else I can try. I can provide more
information about those numbers, especially if you need some profiling
data. I'm all for better PyGObject performance too; although I don't
know where to start in the code base, I'm willing to try stuff."

In Quod Libet we use an optimized single-column ListStore subclass [0]
with a few fast paths and hacks to remove the overhead of the overrides.
We only use cell_data_func and do the formatting and the detection of the
current song in there. I think append_many() is about 3-4 times faster
than Gtk.ListStore.append: still not as fast as the old pygtk append (or
reverse() + insert(0, x), which was even faster in pygtk), but more
bearable.
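
One way to read "detecting the current song in there": instead of storing a bold flag per row, the cell_data_func compares the row's song with the one currently playing and sets the renderer's `weight` property. A minimal sketch, not Quod Libet's actual code; the constants mirror `Pango.Weight.NORMAL` (400) and `Pango.Weight.BOLD` (700) so it runs without GTK, and `make_title_func`, `Song` and `FakeCell` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Integer values of Pango.Weight.NORMAL and Pango.Weight.BOLD.
WEIGHT_NORMAL = 400
WEIGHT_BOLD = 700


@dataclass(eq=False)
class Song:
    # Hypothetical stand-in; eq=False keeps identity comparison semantics.
    title: str


class FakeCell:
    """Hypothetical stand-in for Gtk.CellRendererText."""

    def __init__(self):
        self.props = {}

    def set_property(self, name, value):
        self.props[name] = value


def make_title_func(get_current: Callable[[], Optional[Song]]):
    def cell_data_func(column, cell, model, tree_iter, data):
        song = model[tree_iter][0]  # the single stored column
        cell.set_property("text", song.title)
        # Bold only for the song playing right now; no per-row flag is stored.
        is_current = song is get_current()
        cell.set_property("weight", WEIGHT_BOLD if is_current else WEIGHT_NORMAL)

    return cell_data_func
```

With this approach, a change of the current song requires no row data to be rewritten; emitting `model.row_changed(path, iter)` for the previous and the new current row should be enough to get both redrawn.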

Since I'm the sole copyright owner of that file, I could multi-license
it if needed.

Regarding scrolling: in addition to caching the formatting, I also
save the last result for each cell_data_func and don't update the cell
renderer if there is no new data. Depending on the type of formatting
you do, you can skip the formatting or the set_property() call; the cell
renderer then happily draws the same thing in a different row. This
helps, for example, when the same cell is redrawn multiple times, or for
an album name column where many entries are the same. This helped a bit
in GTK+ 2 times, at least.
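
The "don't touch the renderer if nothing changed" trick can be sketched as a wrapper around the cell_data_func. It relies on GTK reusing a single renderer for every row of a column, so skipping `set_property()` leaves the previous value in place. A minimal sketch, assuming each callback owns its renderer property (nothing else sets it); `make_last_value_func`, `key`, `render` and `FakeCell` are hypothetical names.

```python
from typing import Any, Callable


def make_last_value_func(prop: str, key: Callable[[Any], Any],
                         render: Callable[[Any], Any]):
    """Wrap a cell_data_func so unchanged input skips formatting and set_property()."""
    sentinel = object()
    last_key = sentinel

    def cell_data_func(column, cell, model, tree_iter, data):
        nonlocal last_key
        k = key(model[tree_iter][0])  # cheap raw value, e.g. song.album
        if k == last_key:
            return  # the renderer still holds render(k) from the previous call
        last_key = k
        cell.set_property(prop, render(k))

    return cell_data_func


class FakeCell:
    """Hypothetical stand-in for Gtk.CellRendererText that counts property updates."""

    def __init__(self):
        self.props = {}
        self.updates = 0

    def set_property(self, name, value):
        self.props[name] = value
        self.updates += 1
```

For an album column where consecutive rows share the same album, only the first row of each run pays for formatting and `set_property()`.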