GabrielFM / prettytable

Automatically exported from code.google.com/p/prettytable

Support large tables in output #64

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Generate data (e.g. 2.9 million rows)
2. Load into PrettyTable using add_row()
3. Print the table

Example:

from prettytable import PrettyTable

table_columns = ['file id', 'parent directory (directory id)', 'file name',
                 'type', 'extra info']
display_table = PrettyTable(table_columns)
for file_data in file_query():  # this could also be the db_cursor variant
    # copy the five columns of each record into the table
    display_table.add_row([file_data[0], file_data[1], file_data[2], file_data[3], file_data[4]])

print(display_table)

What is the expected output? What do you see instead?

Expected: the table gets displayed.
Instead, it crashes with a memory complaint, because PrettyTable tries to 
convert the entire table into one big string.

What version of the product are you using? On what operating system?

Linux (Kubuntu 14.10), Python 2.7, prettytable 0.7.2

Please provide any additional information below.

Original issue reported on code.google.com by gpcl...@gmail.com on 15 Dec 2014 at 5:02

GoogleCodeExporter commented 8 years ago
If you were using GitHub or Git I'd submit a PR, but since you're using SVN 
I've attached an updated copy of prettytable.py. This version does two 
things:

1. Enables my use case above by introducing a new function - print_table() - 
that prints the lines to a file (default sys.stdout) instead of building them 
into a list.
Instead of:

print(myprettytable)

You do:

myprettytable.print_table()

It also takes file and end parameters, like print() does, so callers can 
redirect the output as desired.

2. Reduces memory significantly by using a generator - my test went from 
11-12 GB of RAM down to just under 7 GB of RAM usage.

The original get_string() was split into a few more functions to re-use the 
code between the get_string() and print_table().
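
To illustrate the split described above, here is a minimal sketch of how a single line generator could back both get_string() and print_table(). This is not the attached prettytable.py: iter_table_lines() is a hypothetical helper, the functions take the header and rows as plain arguments instead of being PrettyTable methods, cells are simply left-aligned, and it is written for Python 3.

```python
import sys


def iter_table_lines(header, rows):
    """Yield the table one text line at a time (hypothetical helper)."""
    # Column widths are unknown until every row has been seen, so the
    # formatted cells are kept in memory here.
    formatted = [[str(cell) for cell in row] for row in rows]
    widths = [len(str(name)) for name in header]
    for row in formatted:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def render(cells):
        padded = (" " + c.ljust(w) + " " for c, w in zip(cells, widths))
        return "|" + "|".join(padded) + "|"

    yield rule
    yield render([str(name) for name in header])
    yield rule
    for row in formatted:
        yield render(row)
    yield rule


def get_string(header, rows):
    # Builds the whole table as one string, as print(table) would need.
    return "\n".join(iter_table_lines(header, rows))


def print_table(header, rows, file=None, end="\n"):
    # Writes each line as soon as it is produced; file and end mirror
    # the parameters of the built-in print().
    out = sys.stdout if file is None else file
    for line in iter_table_lines(header, rows):
        out.write(line + end)
```

Calling print_table(columns, rows, file=some_open_file) would then write the table line by line without ever joining the lines into one string, although the formatted cells are still held in memory.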

While this version works and does a great job for really big tables, it 
could be further improved if the formatted data did not have to be saved.

prettytable_alternate.py is an attempt to use more generators to reduce memory. 
It did work - peak usage was down to just over 5 GB, and typical usage was 
around 4.7 GB - but it also took a lot longer to output the data, because the 
row generator meant the data had to be formatted twice. However, in both cases 
output starts earlier than in the original implementation, since lines can be 
written before all the data is completely built up.
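
The trade-off described above (less memory, but formatting everything twice) can be sketched roughly as follows. This is not the attached prettytable_alternate.py, only an illustration under assumptions: print_table_two_pass() and make_rows are hypothetical names, make_rows must return a fresh iterator of rows on every call (for example by re-running the query), and cells are left-aligned. Python 3 is assumed.

```python
import sys


def print_table_two_pass(header, make_rows, file=None, end="\n"):
    """Stream a table using two passes over the data (hypothetical sketch)."""
    out = sys.stdout if file is None else file
    widths = [len(str(name)) for name in header]

    # Pass 1: format each cell only to measure it, then discard the string.
    for row in make_rows():
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(str(cell)))

    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def render(cells):
        padded = (" " + str(c).ljust(w) + " " for c, w in zip(cells, widths))
        return "|" + "|".join(padded) + "|"

    out.write(rule + end)
    out.write(render(header) + end)
    out.write(rule + end)

    # Pass 2: format each cell again and write the line immediately.
    for row in make_rows():
        out.write(render(row) + end)
    out.write(rule + end)
```

Only the column widths are kept between the two passes, which is why memory stays low, and the second str() call on every cell is where the extra running time comes from.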

Perhaps you have other ideas on how to speed this all up and reduce memory 
consumption for the very large table variants.

Original comment by gpcl...@gmail.com on 15 Dec 2014 at 11:35

Attachments: