BdR76 / CSVLint

CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files.
GNU General Public License v3.0

ANSI files with corner separator don't highlight columns #66

Closed (favasa closed 1 year ago)

favasa commented 1 year ago

In ANSI files with a corner separator, the program detects columns and validates data, but does not highlight the columns.

When pressing Reformat, the colours appear, but a strange character also appears around the separator.

Sample files and screenshots attached.

Corner_UTF8.csv Corner_ANSI.csv ANSI_Before_Reformat ANSI_After_Reformat UTF8

Notepad++ debug info:

Notepad++_DebugInfo.txt

rdipardo commented 1 year ago

Looks like a duplicate of #52

Try a newer version: https://github.com/BdR76/CSVLint/releases

favasa commented 1 year ago

I've tried the latest version today. Now the strange char doesn't appear, but highlight is not working after formatting:

image

rdipardo commented 1 year ago

> highlight is not working after formatting

That's a separate issue; nothing to do with the column separator character.

The same thing happens every time a file is reloaded (File > Reload from Disk, or press CTRL+R).

Here's a demonstration where ; is the column separator:

reloaded-bufr-not-lexed

favasa commented 1 year ago

Ok... Close that issue then. Thanks!

BdR76 commented 1 year ago

Actually no, this issue is not solved yet in the current version. The UTF-8 and ANSI example files are still displayed as in the screenshot below.

csvlint_corner_separator

The problem is that the separator character is passed to the Lexer through the ScintillaGateway.SetProperty function and received by the Lexer.PropertySet function.

It is passed to SetProperty as ¬ (char 172), but it is received in PropertySet as the two-character string Â¬ (char 194 + char 172), and the Lexer just took value[0], the first character of that string. So under the hood it incorrectly takes Â as the separator character. This is why the CSV Lint lexer makes the file all one color, just blue: it doesn't find any separator.
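The byte values quoted above (194 followed by 172) are the UTF-8 encoding of ¬. Here is a minimal sketch in Python (not the plugin's C#) of how such a two-character string can arise. It assumes the value is encoded as UTF-8 on the way in and read back one byte per character on the way out, which is a guess at the mechanism and is not confirmed in this thread.

```python
# Assumption: the property value is UTF-8 encoded when it is set and then
# interpreted one byte per character when it is read back.
separator = "\u00ac"  # the corner character (char 172)

sent_bytes = separator.encode("utf-8")    # two bytes: 194, 172
received = sent_bytes.decode("latin-1")   # one character per byte

first_char = received[0]    # char 194, not the separator
last_char = received[-1]    # char 172, the real separator

print([ord(c) for c in received])
print(first_char == separator, last_char == separator)
```

Under that assumption, taking the first character yields char 194, so a lexer looking for it would never match the separator in an ANSI file.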

I've just now changed this line so it takes the last character:

-- old:   separatorChar = value[0];
++ new:   separatorChar = value[value.Length-1]; //hack to fix syntax highlighting for corner ¬ separator

You can try it by downloading the development DLL v0.4.6.5ẞ; it gives the correct syntax highlighting for both the UTF-8 and ANSI files. But it feels a bit like a hack. The separator character shouldn't be converted to a two-byte value in the first place.
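To illustrate why taking the last character feels like a hack, here is a rough Python sketch (not the plugin's C#) of an alternative that reverses the conversion. The helper name `recover_separator` is hypothetical, and the sketch assumes the received value is UTF-8 bytes read as one character per byte. It does not describe what CSV Lint actually does.

```python
def recover_separator(value: str) -> str:
    """Hypothetical helper: undo a UTF-8-read-as-Latin-1 round trip."""
    try:
        # Turn each character back into its byte, then decode as UTF-8.
        decoded = value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The value was not mangled in the assumed way; keep it unchanged.
        decoded = value
    return decoded[0] if decoded else ""


# The two-character string from this issue collapses back to one character.
print(recover_separator("\u00c2\u00ac") == "\u00ac")

# A separator that needs three UTF-8 bytes (the arrow U+2192) shows the
# limit of the last-character approach: the last character is only the
# final byte of the sequence, not the separator itself.
mangled_arrow = "\u2192".encode("utf-8").decode("latin-1")
print(mangled_arrow[-1] == "\u2192")
print(recover_separator(mangled_arrow) == "\u2192")
```

The last-character approach works for ¬ because the second byte of its UTF-8 encoding happens to equal its single-byte code, 172. That coincidence does not hold for every character.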

BdR76 commented 1 year ago

This issue is fixed in the latest version v0.4.6.5, see the releases page. You can download it manually and it will be available in the next Notepad++ update in the Plugin Manager.