pjvandehaar opened this issue 7 years ago (Open)
On Wed, Nov 30 2016, Peter VandeHaar wrote:
> I'd like for stdin to only be read as needed. This could probably also let tabview work with iterators when used from Python, but I don't use that so I don't know. This is hard with the current code.
> (Do you know of another tool that does this? I haven't found one.)
Never found one, even though that's something I'd like as well.
You could actually cheat with a buffering program that, besides buffering, sends EOF at regular intervals (so that you could just reload the file live in tabview), but the ones I know of don't do exactly that.
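For illustration, here's a minimal sketch of such a buffering helper; the script name, output path, and interval are all invented (this is not an existing tool). It copies stdin into a regular file and flushes periodically, so each time you re-open the file in tabview you get a clean EOF at wherever the buffer has reached:

```python
#!/usr/bin/env python
"""cheatbuffer.py: hypothetical stdin buffer, sketched for illustration.

Copies stdin to a regular file and flushes at fixed intervals, so a
viewer like tabview can repeatedly re-open the file and see a clean
EOF at whatever point the buffer has reached.
"""
import sys
import time

OUT_PATH = "/tmp/stream.csv"  # invented path, not part of tabview
FLUSH_EVERY = 1.0             # seconds between flushes

def main():
    last_flush = time.monotonic()
    with open(OUT_PATH, "w") as out:
        for line in sys.stdin:
            out.write(line)
            if time.monotonic() - last_flush >= FLUSH_EVERY:
                out.flush()   # make the new rows visible to readers
                last_flush = time.monotonic()

if __name__ == "__main__":
    main()
```

Something like `... | python cheatbuffer.py` in one terminal, and `tabview /tmp/stream.csv` re-opened in another as the file grows.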
> The problems this introduces are:
>
> - When `self.column_width_mode` is `max` or `mode`, the width won't reflect rows that haven't been read yet.
This wouldn't really be a problem if you show what's going on. Triggering a recalculation is generally more user-friendly than auto-sizing the columns randomly.
> If I start work on a PR, do you have recommendations?
Godspeed? ;)
I'm not sure I fully understood the implementation details. As far as I understand, your plan is just to keep appending to `csv_data` directly.
In that case, I would keep an initial buffer in the generator to perform the encoding detection and padding, which is unrelated to what the viewer is doing. The viewer shouldn't be concerned with any part of the reading process: just provide it with a matrix to show. This way, as the data comes in, you can append to `csv_data` in chunks and update the internal state as little as possible.
Seen the other way around: if you already have a data structure you want to show in tabview when used as a module, you'd like to skip all of this processing entirely.
Something like `tabview myhugefile.csv`, `c` to get mode-widths, `c` again.
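A rough sketch of that split, with invented names (this is not tabview's actual code): the generator keeps an initial buffer to do the encoding detection up front, and the viewer only ever receives finished rows to show.

```python
import csv
import sys

def detect_encoding(sample):
    # Stand-in for tabview's detect_encoding(); a real version might run
    # chardet over the sampled bytes instead of assuming UTF-8.
    return "utf-8"

def row_source(stream=sys.stdin.buffer, probe=1000):
    """Buffer the first `probe` lines for encoding detection, then keep
    yielding CSV rows lazily. The viewer never touches the reading side."""
    head = []
    for _ in range(probe):
        line = stream.readline()
        if not line:
            break
        head.append(line)
    enc = detect_encoding(b"".join(head))

    def lines():
        for raw in head:
            yield raw.decode(enc, errors="replace")
        for raw in stream:  # the rest of stdin is read only on demand
            yield raw.decode(enc, errors="replace")

    yield from csv.reader(lines())
```

The viewer then only calls `next()` on this iterator and appends to its own matrix, so an iterable you already have in Python could be passed in the same shape, skipping the reading step entirely. For the updating-`enc` idea below, chardet's incremental `UniversalDetector` (fed line by line) could replace the one-shot detection.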
Currently, tabview doesn't work well when used with large or unending files. For example, `cat /dev/urandom | tr -cd "fish,\n" | tabview -` doesn't work.

I'd like for stdin to only be read as needed. Maybe this will also let tabview display iterators when used from Python.

(Do you know of any other commandline csv-viewer that does streaming to handle large files? I haven't found one.)
Changes that will be needed:

- `process_data` needs to be a generator. Then `view()` will do `data_processor = process_data(...)`. `Viewer` will do `csv_data.append(next(data_processor))` when it reaches the end of `csv_data`.
- `detect_encoding()` will be run on the first 1000 lines to determine `enc`. After those lines are exhausted, `detect_encoding()` will be run on each new line, updating `enc` if needed.
- `pad_data()` can't happen in `process_data`. `Viewer` will run `csv_data = pad_data(csv_data)` if a new line from `data_processor` is longer than `self.num_data_columns`.
- `Viewer` needs to have a few minor changes (see the sketch after this list):
  - Forward searching will still work, and will just rapidly consume lines from `data_processor`.
  - When the user tries to sort, `Viewer` will do `csv_data.extend(data_processor)`, which might take too long or possibly forever. User's problem.
  - Later on, it'd be fun to make mode and max column widths update as new data is read in, by storing the `collections.Counter()`, updating it for each new line, and updating `self.column_width` as needed.
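A rough sketch of the `Viewer`-side changes above; everything except the names quoted in the list is invented for illustration, including the `Counter`-based incremental widths:

```python
import collections

class Viewer:
    def __init__(self, data_processor):
        self.data_processor = data_processor  # the process_data generator
        self.csv_data = []
        self.num_data_columns = 0
        self.width_counters = []              # one Counter of cell widths per column

    def fetch_row(self):
        """Append one more row when the cursor hits the end of csv_data."""
        try:
            row = next(self.data_processor)
        except StopIteration:
            return False                      # stream exhausted
        if len(row) > self.num_data_columns:
            # A longer row arrived: re-pad everything read so far.
            self.num_data_columns = len(row)
            self.csv_data = pad_data(self.csv_data, self.num_data_columns)
        row = row + [''] * (self.num_data_columns - len(row))
        self.csv_data.append(row)
        self._update_widths(row)
        return True

    def _update_widths(self, row):
        # Incremental max/mode widths: count cell lengths per column
        # instead of rescanning all of csv_data on every new row.
        while len(self.width_counters) < len(row):
            self.width_counters.append(collections.Counter())
        for i, cell in enumerate(row):
            self.width_counters[i][len(cell)] += 1

def pad_data(rows, width):
    # Stand-in for tabview's pad_data(): right-pad short rows with ''.
    return [r + [''] * (width - len(r)) for r in rows]
```

From `width_counters[i]`, the `max` width is `max(counter)` and the `mode` width is `counter.most_common(1)[0][0]`, so `self.column_width` can be refreshed cheaply after each new row.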
The problems this introduces are:

- When `self.column_width_mode` is `max` or `mode`, the width won't reflect rows that haven't been read yet.

If you consider these drawbacks quite bad, I'd be happy with a flag `--stream`.

If I start work on a PR, do you have any recommendations?