gobo-eiffel / gobo

The Gobo Eiffel Project provides the Eiffel community with free and portable Eiffel tools and libraries.
https://sourceforge.net/projects/gobo-eiffel/

%R into last cell of read_file #37

Closed phgachoud closed 5 years ago

phgachoud commented 5 years ago

[Screenshots: Screenshot_20190704_165858, Screenshot_20190704_170136]

Is it the expected behaviour to have a "%R" at the end of the last cell of a CSV row (DS_ARRAYED_LIST [STRING_8])? If so, what is the reason? Or is this about CRLF versus single-character line endings?

There seems to be something here that I wasn't expecting. Do you see it too?

In fact, why is line 182 in UT_CSV_HANDLER there? If I remove that line and subtract 1 from l_cell.count, it works as I expect:

elseif c = '%N' then
    if has_quote then
        l_cell.append_character (c) -- This is line 182: why append the %N character to the last cell? I removed it.
    else
        l_cells.force_last (l_cell.substring (1, l_cell.count - 1)) -- And here I subtract 1 from the count.
        STRING_.wipe_out (l_cell)
        a_action.call ([l_cells])
        l_cells.wipe_out
    end
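
Note that `l_cell.substring (1, l_cell.count - 1)` drops the last character of every row's final cell, including rows from LF-only files that have no "%R" to drop. A minimal sketch of a conditional variant follows. The class CELL_TRIMMER and the feature without_trailing_cr are hypothetical names used only for illustration, not part of Gobo; the sketch relies only on standard STRING_8 features.

```eiffel
note
	description: "Hypothetical helper, not part of Gobo."

class
	CELL_TRIMMER

feature -- Conversion

	without_trailing_cr (a_cell: STRING_8): STRING_8
			-- Copy of `a_cell' without its final '%R' if it ends with one;
			-- otherwise an unchanged copy of `a_cell'.
		do
			if not a_cell.is_empty and then a_cell.item (a_cell.count) = '%R' then
					-- Only the carriage return is dropped.
				Result := a_cell.substring (1, a_cell.count - 1)
			else
					-- LF-only input: nothing to strip.
				Result := a_cell.twin
			end
		end

end
```

With such a helper, the same code path would behave identically for CRLF and LF input, at the cost of also removing a "%R" that was meant to be data.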

See the attached file. It's a CSV, but I changed the extension to .txt because GitHub doesn't support attaching CSV files: 20190330_30-45-11-A9-32-70_192.168.0.70_inverter_data.py.csv.txt

ebezault commented 5 years ago

It will be difficult for me to reproduce the problem because I'm on Windows, not Linux.

I'll try to answer questions:

Now I wonder where the %R characters are coming from. Are you reading Windows files from Linux? If this is the case I would suggest the following:

You should not use KL_TEXT_INPUT_FILE, which expects the file to come from the same operating system. So on Linux, it will consider %N as the end-of-line and will not discard the %R. To read Windows files from Linux, I would suggest trying KL_WINDOWS_INPUT_FILE.read_line, which expects %R%N to be the end-of-line delimiter. Just loop through the file:

create l_last_string.make (l_file.count)
from
     l_file.read_line
until
     l_file.end_of_file
loop
    l_last_string.append (l_file.last_string)
    l_last_string.append_character ('%N')
    l_file.read_line
end
create l_is.make (l_last_string)

phgachoud commented 5 years ago

Thx, Eric,

Yes, I work on Linux (Debian), and the file is generated by a Python script on a Raspberry Pi (Raspbian, which is Debian-based).

My point is that the library should work with both line-ending conventions, CRLF and LF. Or is there a reason to keep them separate?

So my workaround was to redefine this feature. Maybe it makes sense to fix the library, but whoever developed it should know why line 182 is there, since it seems deliberate.

If you want to test it, I attached the csv file

ebezault commented 5 years ago

I guess I'm the one who wrote this class :-) I thought I had already answered the question of why line 182 is there: I have cases at work where we have %N in the middle of a cell.

This class, like other classes in Gobo, was designed with the idea that the knowledge about %R%N vs. %N vs. other possible conventions on other operating systems should not appear in many classes but be centralized in the KL_*_FILE and KL_*_FILE_SYSTEM classes. That way, if we want to support a new operating system which has a different end-of-line convention, we just have to add a descendant of these classes and that's it. No other classes of Gobo will need to be modified to work with this new convention. So in the case of UT_CSV_HANDLER, it's through the type of the object that you pass as argument of read_file that the different end-of-line conventions (even those not known yet today) are supported.

As to why the %R character is not discarded automatically by UT_CSV_HANDLER even when the object passed as argument did not discard it itself: I have cases at work where we wanted to keep this %R character and treat it as a character, not as part of the end-of-line. So I'm reluctant to change this design, which lets the user decide whether or not to keep the %R through the type of the object passed as argument.
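
For a caller who cannot change the type of the input object, one option that leaves UT_CSV_HANDLER untouched is to drop the %R in the action that receives each row. The following is only a sketch under assumptions: CSV_ROW_CLEANER and process_row are hypothetical names, and it assumes the action receives the row as a DS_ARRAYED_LIST [STRING_8], as the `a_action.call ([l_cells])` line quoted earlier in this thread suggests.

```eiffel
note
	description: "Hypothetical caller-side helper, not part of Gobo."

class
	CSV_ROW_CLEANER

feature -- Basic operations

	process_row (a_row: DS_ARRAYED_LIST [STRING_8])
			-- Remove a trailing '%R' from the last cell of `a_row', if any,
			-- then handle the row.
		local
			l_last: STRING_8
		do
			if not a_row.is_empty then
				l_last := a_row.last
				if not l_last.is_empty and then l_last.item (l_last.count) = '%R' then
						-- Modify the cell in place; other cells are untouched.
					l_last.remove_tail (1)
				end
			end
				-- Application-specific handling of `a_row' goes here.
				-- The quoted snippet wipes out the list after the call,
				-- so copy anything that must outlive this call.
		end

end
```

Such a procedure could then be passed as the action, for example as `agent process_row`, assuming read_file accepts a procedure taking the row as its only argument. Callers who want to keep the %R as data simply would not use it.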

ebezault commented 5 years ago

"If you want to test it, I attached the csv file"

As I already mentioned, there is no point in my trying it myself: I'm on Windows, where the class KL_TEXT_INPUT_FILE discards the %R characters, which it does not do on Linux. Hence my suggestion to use KL_WINDOWS_TEXT_FILE and its read_line feature instead when reading Windows files from Linux.

phgachoud commented 5 years ago

Thx, so I think we can close this issue if that makes sense to you...