chrisbra / csv.vim

A Filetype plugin for csv files
http://www.vim.org/scripts/script.php?script_id=2830
Vim License

Dealing with large CSVs #42

Closed (joyofdata closed this issue 4 years ago)

joyofdata commented 9 years ago

Hey Chris,

First of all, thanks for crafting this awesome Vim plugin!

I tested your plugin with a fairly large CSV and tried the nifty :%ArrangeColumn on it, and it takes very long to finish. Is there perhaps a trick for dealing with large CSVs? Something like partial or streamed loading of a file, or lazy evaluation of commands?

I checked the help file (Ctrl-F) but couldn't find anything that helps in that regard.

Kind regards

Raffael

chrisbra commented 9 years ago

Oh wow, that file just killed my Vim. I'll check whether there is some way to make :ArrangeCol faster.

ViViDboarder commented 9 years ago

I may try it myself, but this could be a good use case for JobControl on Neovim. Just planting the seed.

chrisbra commented 9 years ago

Isn't JobControl only for external commands, like system() or :! or so? I don't think this can make internal VimL functions faster...

majkinetor commented 8 years ago

I used it on a 400 MB CSV. It works very well for a few minutes, then Vim blocks.

chrisbra commented 8 years ago

This probably depends on what you do. Certain operations, such as :ArrangeCol, will make Vim slow.

majkinetor commented 8 years ago

I just try to move a full screen, and Vim is frozen for 30 seconds. I have 2 million records with 18 columns.

protist commented 8 years ago

I just tried to open a 3 MB file with 1600 rows and four columns. I tried to page down a few times and vim freezes for about a minute. Works perfectly on small files, though.

protist commented 8 years ago

Oops, my mistake; a few fields were 200 000 characters long. Sorry for the noise!

chrisbra commented 8 years ago

Yeah, this is a known problem. I intend to rewrite part of the :ArrangeCol command using Python (which would only work on a Vim compiled with +python), but I haven't found the time for that, and I don't know Python well enough to do it quickly. So it might take some time until I get to it, and there is always something else to do :(

If slow cursor movement bothers you, you can disable syntax highlighting, and you might want to experiment with different 're' settings. The output of :syntime report would also be interesting for such slow CSV files. I also think there is a help entry on how to switch to a simpler syntax highlighting, which would be worth a try.
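
A minimal sketch of these suggestions as Vim commands, assuming a Vim built with the +profile feature (which :syntime requires). Whether a different 're' value helps at all depends on the file, so treat it as something to try rather than a fix:

```vim
" Option 1: turn syntax highlighting off entirely for this session.
:syntax off

" Option 2: keep highlighting, but try the old backtracking regexp engine.
" 're' is short for 'regexpengine': 0 = automatic, 1 = old engine, 2 = NFA.
:set regexpengine=1

" To see which syntax patterns are slow, leave highlighting ON,
" enable timing, move around in the slow buffer, then print the report.
:syntime on
" ... scroll through the CSV file for a while ...
:syntime report
```

The report lists each syntax pattern with the total time spent matching it, which helps tell slow highlighting apart from slow plugin functions.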

Michel-Haber commented 5 years ago

Hi Chris,

I also wanna thank you for the plugin. It makes life much easier when handling CSV files. However, I'm having lots of trouble dealing with a 5.3 MB file (52242 rows x 7 cols). Each time I try moving up or down, the plugin complains that it will only use the first 10000 lines to compute column width. Is there some configuration I can apply to fix this?

Also, the initial loading time is quite long. Are you planning any updates that can improve this?

chrisbra commented 5 years ago

I can only do so much, and depending on the CSV file, creating the regular expressions can be expensive. What configuration are you using for the plugin? I suppose some kind of autocommand like the one mentioned in the documentation? https://github.com/chrisbra/csv.vim/blob/master/doc/ft-csv.txt#L1648-L1678

Michel-Haber commented 5 years ago

My config is as follows:

let g:csv_no_progress = 1
let g:csv_strict_columns = 1
let g:csv_start = 1
let g:csv_end = 100

Is there something I'm missing that would allow it to run faster?

chrisbra commented 5 years ago

I wonder where this:

> Each time I try moving up or down, the plugin complains that it will only use the first 10000 lines to

comes from. Is there any CursorMoved autocommand installed?

> Is there something I'm missing that would allow it to run faster?

I am not sure. Some profiling information would be good to see what exactly is slow.
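
One way to collect such profiling information is Vim's built-in :profile command. This is only a sketch: it assumes a Vim compiled with +profile, and the log file name is an arbitrary example:

```vim
" Start profiling; results are written to the named file when Vim exits.
:profile start profile.log
" Profile every function and every sourced script file.
:profile func *
:profile file *
" ... now open the CSV file and reproduce the slow cursor movement ...
" Quit Vim so that the log gets written.
:qall!
```

The resulting log shows, per function, how often it was called and how much time it took in total, which is usually enough to spot the expensive calls.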

Michel-Haber commented 5 years ago

No, I don't have any autocommand installed that works on CursorMoved. This is my plugin list. Is it possible that csv.vim is incompatible with one of them?

vim-airline/vim-airline
taglist.vim
airblade/vim-gitgutter
scrooloose/nerdtree
w0rp/ale
ntpeters/vim-better-whitespace
pseewald/vim-anyfold
neovimhaskell/haskell-vim
tpope/vim-commentary
cespare/vim-toml
junegunn/fzf
junegunn/fzf.vim

In any case, I have profiled the problem and noticed the following:

  1. My tendency to use the arrow keys instead of hjkl :( is the main reason for the huge delays (with hjkl I have almost no delay). I do not understand the difference, though...
  2. csv#MoveCol seems to be the most expensive function call. (~ 10s total run time)

Note that I profiled with a file of ~3000 lines, so I didn't get the message I mentioned before, but got huge delays nonetheless.

I have attached the profiling logs so that you can have a look :) profile.log

chrisbra commented 5 years ago

Hm, it looks suspicious that csvColWidth() is called so many times (and recalculates the current width over and over again). I am not using this plugin much anymore, since I don't have to handle CSV files a lot nowadays. For now, try to disable the mapping of <Up> and <Down> using

:let g:csv_nomap_up=1
:let g:csv_nomap_down=1

That should make <Up> and <Down> work faster.

the42 commented 4 years ago

@chrisbra @Michel-Haber A great relief, and maybe a relatively easy fix, for the admittedly annoying "CSV file to large only checking the ..." message would be to show it only once per buffer?

BTW, switching to the old regex engine with set re=1, which has better performance on edge cases, causes an error.

chrisbra commented 4 years ago

Thanks, I'll have a look.

Grueslayer commented 4 years ago

Hi Chris, with a 27000-line CSV the message about the 10000-line limit appears TWICE for each cursor up or down. Disabling the mappings (as you mentioned) has fixed it for me so far.

chrisbra commented 4 years ago

Ah, I wanted to have a look but forgot about it. Anyhow, it should be fixed now.

albfan commented 7 months ago

Sorry for suggesting another plugin here. csv.vim served me for a long time, but it fails with large files (20 MB or above).

https://github.com/mechatroner/rainbow_csv this just works, you can filter, etc...