BdR76 / CSVLint

CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files.
GNU General Public License v3.0
151 stars 8 forks

Highlight update and multiline column value #53

Closed Setzer3 closed 1 year ago

Setzer3 commented 1 year ago

When editing a CSV file that contains multi-line text values (enclosed in " quotes), if I edit the second or a later line of such a value, the highlighting update treats the edited line as the start of a new record and colors it as the first column, beginning at its first character.

Before editing: image

After editing the 3rd line: image

The only workaround I know of is to save the file after editing and reload it.

BdR76 commented 1 year ago

Thanks for posting the issue. You can fix the colors by triggering a refresh of the syntax highlight buffer, i.e. make Notepad++ re-evaluate the syntax colors for the entire file. As you point out, this can be done by closing and reopening the file, but you can also switch to the tab of another file and back, or select Language > None (normal text) and then go back to Language > CSVLint.

This is indeed a bug. When you edit somewhere in the middle of the file, the lexer (which determines the syntax highlighting) only re-evaluates the colors starting from the position where you edited the text. Its state variables are then not initialised correctly: it always starts with color 1, whereas in this case it should start with color 3.
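To make the cause concrete, here is a minimal Python sketch of the behaviour described above. It is not the CSVLint lexer: `lex_columns` is a hypothetical, simplified stand-in that assigns a 1-based column (color) index to each character and, like the description above, always resets its state when it is asked to start from an arbitrary position.

```python
def lex_columns(text, start=0, sep=",", quote='"'):
    """Return a 1-based column (color) index for each character from `start`.

    Hypothetical simplification: the state is always reset to column 1,
    outside quotes, which is only correct when `start` is a record start.
    """
    column, in_quotes = 1, False
    colors = []
    for ch in text[start:]:
        colors.append(column)
        if ch == quote:
            in_quotes = not in_quotes      # entering or leaving a quoted value
        elif ch == sep and not in_quotes:
            column += 1                    # next column, next color
        elif ch == "\n" and not in_quotes:
            column = 1                     # a real record break resets the color
    return colors


# Third column is a quoted value that spans two lines.
text = 'a,b,"x\ny"\n'
edit_pos = text.index("y")                 # user edits the second line of the value

full = lex_columns(text)                   # lexing the whole file
partial = lex_columns(text, start=edit_pos)  # lexing only from the edit position

print(full[edit_pos], partial[0])
```

Lexing the whole text gives the character `y` color 3, while lexing from the edit position gives it color 1, which matches the symptom in the screenshots.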

The code for syntax highlighting the multiline text values with quotes is already tricky, but I'll see if I can fix it.

BdR76 commented 1 year ago

I've updated the lexer, which only slightly improves things: in some cases it now handles a two-line quoted string correctly, but in most cases editing a quoted multi-line string on its second, third, fourth, etc. line will still result in incorrect syntax highlighting.

The problem is that when the user edits the text, the lexer is triggered with the start position of the edited part as a parameter. When the opening quote " character is located one or more lines before that edit point, the lexer assumes the edited line is just a new line, and therefore a new record, and it initialises with the first color index. There's no way to tell whether the user was editing in the middle of a quoted string, and always scanning backwards to find an opening quote would in most cases go all the way back to the start of the file, which could lead to performance issues.
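One possible direction, sketched here purely as an assumption and not as the fix used in CSVLint, is to remember the lexer state at the end of every line, so that a re-lex can resume from the previous line's cached state instead of scanning backwards for an opening quote. Scintilla offers per-line state storage (`SCI_SETLINESTATE` / `SCI_GETLINESTATE`) that lexers commonly use for multi-line constructs; the Python below only models the idea, and the names `scan_line` and `LineStateCache` are hypothetical.

```python
def scan_line(line, column, in_quotes, sep=",", quote='"'):
    """Color one line; return (colors, column, in_quotes) at the end of the line."""
    colors = []
    for ch in line:
        colors.append(column)
        if ch == quote:
            in_quotes = not in_quotes
        elif ch == sep and not in_quotes:
            column += 1
    if not in_quotes:
        column = 1                      # the line break ended the record
    return colors, column, in_quotes


class LineStateCache:
    """Keeps the (column, in_quotes) state reached at the end of each line."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.colors = [None] * len(self.lines)
        self.end_state = [None] * len(self.lines)
        self.relex(0)

    def relex(self, first_line):
        # Resume from the cached state of the previous line, no backwards scan.
        if first_line == 0:
            column, in_quotes = 1, False
        else:
            column, in_quotes = self.end_state[first_line - 1]
        for i in range(first_line, len(self.lines)):
            self.colors[i], column, in_quotes = scan_line(
                self.lines[i], column, in_quotes)
            unchanged = self.end_state[i] == (column, in_quotes)
            self.end_state[i] = (column, in_quotes)
            if unchanged:
                break                   # later lines cannot be affected

    def edit(self, line_no, new_text):
        self.lines[line_no] = new_text
        self.relex(line_no)


doc = LineStateCache(['1,Ann,"line one', 'line two', 'line three"', '2,Bob,plain'])
doc.edit(1, 'line 2 edited')            # edit inside the quoted value
print(doc.colors[1][0], doc.end_state[0])
```

In this model the edited second line keeps color 3, because the re-lex starts from the state cached for the first line. The cost is one small state value per line, and an edit that adds or removes a quote still forces the following lines to be re-lexed until the cached state matches again.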

You can download the beta DLL here, which contains this "update". I'm not sure how to fix this properly in all cases, so I'll return to it at a later time.