austenpma / csvfix

Automatically exported from code.google.com/p/csvfix
MIT License
0 stars 0 forks source link

eval command crashing on mac #47

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Compile csvfix on MacOS using "make lin" (compiles, no errors)
2. csvfix eval -e '$5/100' -o ./tmp/tmp2.csv ./tmp/tmp1.csv

What is the expected output? What do you see instead?

divide the 5th column by 100. Segmentation Fault. 

What version of the product are you using? On what operating system?
Current version "csvfix_src120.zip" 
uname -a 
Darwin traverse.ib.lan 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 
PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386

Please provide any additional information below.

Program compiled without any errors. 

Tried same command, noted above, on Linux and it worked OK. 

 g++ --version 
i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Original issue reported on code.google.com by Daniel.H...@gmail.com on 30 Jan 2012 at 7:40

GoogleCodeExporter commented 8 years ago
Unfortunately, I don't have a Mac, and as you say, I can't reproduce this on 
Linux or Windows. Is it sensitive to the input, or does it seg fault no matter 
what CSV data you feed it?

Original comment by nbutterworth1953@gmail.com on 31 Jan 2012 at 11:27

GoogleCodeExporter commented 8 years ago
Here's a bit more data. This is the CSV I'm trying to work with: 

"foo","foo","45000","2","85.9"

These are the commands I've tried, all create the segfault: 

csvfix eval -e '$5/100' -o ./tmp/tmp2.csv ./tmp/tmp1.csv
csvfix eval -e '.00*$5' -o ./tmp/tmp2.csv ./tmp/tmp1.csv
csvfix eval -e '.00*$4' -o ./tmp/tmp2.csv ./tmp/tmp1.csv
csvfix eval -e '$4/100' -o ./tmp/tmp2.csv ./tmp/tmp1.csv
csvfix eval -e \$4/100 -o ./tmp/tmp2.csv ./tmp/tmp1.csv

I've tried a couple different permutations of the CSV, as well, all fail 
similarly. 
This computation does work, though obviously isn't what I'm looking for: 

csvfix eval -e '5/100' -o ./tmp/tmp2.csv ./tmp/tmp1.csv

Original comment by Daniel.H...@gmail.com on 31 Jan 2012 at 3:03

GoogleCodeExporter commented 8 years ago
whups, that should say: 

csvfix eval -e '.01*$5' -o ./tmp/tmp2.csv ./tmp/tmp1.csv
csvfix eval -e '.01*$4' -o ./tmp/tmp2.csv ./tmp/tmp1.csv

Original comment by Daniel.H...@gmail.com on 31 Jan 2012 at 3:04

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Which shell are you using on the mac? Could it be something to do with 
different quote handling on that platform? Try putting the expression to be 
evaluated in double-quotes. It looks as if the $5 is being subjected to shell 
variable expansion.

Original comment by nbutterworth1953@gmail.com on 31 Jan 2012 at 5:05

GoogleCodeExporter commented 8 years ago

I've been using bash. /bin/sh is also bash on the Mac. Here's the version:
GNU bash, version 3.2.48(1)-release (x86_64-apple-darwin10.0)
Copyright (C) 2007 Free Software Foundation, Inc.

Tried running the shell script using tcsh, same segfault. 

Original comment by Daniel.H...@gmail.com on 31 Jan 2012 at 5:47

GoogleCodeExporter commented 8 years ago
I still feel it may be variable expansion problem, as I've had this before. If 
you don't mind rebuilding csvfix, can I suggest you try this:

1) Edit the a_expr.cpp file in the alib/src subdirectory, changing this line:

     const char CHAR_VAR = '$'; // variables are preceded by these

to  this:

     const char CHAR_VAR = '@'; // variables are preceded by these

2) Rebuild using "make lin" from the project root.

3) Try using the modified csvfix executable with command lines like this:

      csvfix eval -e '@5/100' -o ./tmp/tmp2.csv ./tmp/tmp1.csv

If that doesn't work, I am a bit stuck for suggestions :-(

Original comment by nbutterworth1953@gmail.com on 31 Jan 2012 at 6:09

GoogleCodeExporter commented 8 years ago
Couple things - first, I did edit the specified file, and did a "make lin" but 
the binary doesn't appear to have rebuilt. Doing an rm csvfix/bin/csvfix fixed 
that. Is a makefile thing, I'm sure. 

Next, changing the variable indicator over to an @ made no change. I got the 
same results as before. One thing this does tell me - it probably eliminates 
the shell as an issue. It is either in the code, something the code is linking 
against, or the compiler. 

Original comment by Daniel.H...@gmail.com on 31 Jan 2012 at 7:00

GoogleCodeExporter commented 8 years ago
OK, something else that occurs to me - the CSV file that you are processing on 
the mac would not happen to be in UTF-8 or similar format? OK, clutching at 
straws here...

Original comment by nbutterworth1953@gmail.com on 31 Jan 2012 at 8:04

GoogleCodeExporter commented 8 years ago
Oh, and can you really make sure that you recompiled the '@' change - do a 
"make clean" from the root (the very top level of the whole csvfix project) and 
then a "make lin".

Original comment by nbutterworth1953@gmail.com on 31 Jan 2012 at 8:25

GoogleCodeExporter commented 8 years ago
Been chasing on this. Have gotten it down to something in the "ExecOp" routine 
in  a_expr.cpp, I think. One thing I do note, you have an else if in there for 
"*" twice. Don't think that is related though. 

The file shouldn't be "utf-8." 

Original comment by Daniel.H...@gmail.com on 31 Jan 2012 at 9:11

GoogleCodeExporter commented 8 years ago
Yes, ExecOp is where a problem would evidence itself, although I suspect the 
problem is that the evaluation stack is not being set up properly. But I will 
have to sleep on it.

BTW, yes I did know about the multiple "*" op problem and it is fixed in the 
source tree. If you have access to Mercurial, you might want to checkout the 
latest version of the code from the repository, rather than using the zip. I 
don't *think* there are any relevant changes/fixes , but I could be wrong.

Original comment by nbutterworth1953@gmail.com on 31 Jan 2012 at 10:00

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Sleep has not brought any answers :-( However, could you try adding this debug 
code to the a_expr.cpp file. Look for the function called ExprCompiler :: 
Compile and at the very end if it add this:

    DumpRP( std::cout, rep );   //add this
    return "";                  // same as before

this will dump the reverse polish representation of the expression, which will 
look something like this:

    [VAR  [1]][OP   [RV]][NUM  [2]][OP   [+]][OP   [;]]

If you could try your failing code and post the output that may be a help.

You were right about the makefile by the way, I have raised another issue about 
it.

Original comment by nbutterworth1953@gmail.com on 1 Feb 2012 at 11:40

GoogleCodeExporter commented 8 years ago

Original comment by nbutterworth1953@gmail.com on 1 Feb 2012 at 12:43

GoogleCodeExporter commented 8 years ago
Here's what I get when I add the DumpRP: 

[VAR  [5]][OP   [RV]][NUM  [100]][OP   [/]][OP   [;]]

Where is that "RV" coming from? 

Original comment by Daniel.H...@gmail.com on 1 Feb 2012 at 2:54

GoogleCodeExporter commented 8 years ago
The RV is correct - it means retrieve value from the preceding thing on the 
stack, which in this case is the index (5) of a positional parameter. In fact, 
as 5 is showing up as the value for the VAR operation, it finally completely 
knocks on the head the possibility that shell expansion is causing the problem. 
What it doesn't do is point out what the problem is, as the RP is exactly what 
you should get. I will go away and think some more - thanks for running these 
tests, BTW. 

Original comment by nbutterworth1953@gmail.com on 1 Feb 2012 at 3:00

GoogleCodeExporter commented 8 years ago
If you are still up for it, can you change these two functions in a_expr.cpp to 
these instrumented versions - just do a copy and paste:

void Expression :: AddPosParam( const string & s ) {
    std::cout << "AddPosParam: " << s << std::endl;
    mPosParams.push_back( s );
}

string Expression :: GetVar( const string & var ) const {
    std::cout << "GetVar called: " << var << std::endl;
    if ( IsInteger( var ) ) {
        int n = ToInteger( var ) - 1;
        if ( n < 0 ) {
            ATHROW( "Invalid positional parameter " << n );
        }
        if ( n >= (int) mPosParams.size() ) {
            std::cout << "GetVar no pos param: " << "" << std::endl;
            return "";
        }
        else {
            std::cout << "GetVar pos param: " << mPosParams[ n ] << std::endl;
            return mPosParams[ n ];
        }
    }
    else {
        const string * val = mVars.GetPtr( var );
        if ( val == 0 ) {
            ATHROW( "Unknown variable: " << var );
        }
        return * val;
    }
}

And post the output. This should at least narrow down where the problem is.

Original comment by nbutterworth1953@gmail.com on 1 Feb 2012 at 6:23

GoogleCodeExporter commented 8 years ago

Here you go. The "execop2" is a bit of debugging output I had added yesterday. 

[VAR  [5]][OP   [RV]][NUM  [100]][OP   [/]][OP   [;]]
AddPosParam: foo
AddPosParam: foo
AddPosParam: 45000
AddPosParam: 2
AddPosParam: 85.9
ExecOp2 >RV<
./test.sh: line 5: 40999 Segmentation fault      csvfix eval -e '$5 / 100' -o 
./tmp/tmp2.csv ./tmp/tmp1.csv

Original comment by Daniel.H...@gmail.com on 1 Feb 2012 at 8:00

GoogleCodeExporter commented 8 years ago
Could you just confirm the bit of code that your debug statement is embedded 
in? If it is in:

    else if ( op == RDVAR_STR ) {
        string s = PopStr();
        mStack.push( GetVar( s ) );

then I really am flummoxed, as I can't see how PopStr() can fail without 
throwing and GetVar() appears never to be entered. If this really is the 
failure point, then either I'm not seeing something, or things have got 
corrupted much earlier on, which is possible, but not likely. 

Original comment by nbutterworth1953@gmail.com on 1 Feb 2012 at 8:37

GoogleCodeExporter commented 8 years ago
my bit of stuff is at about the 2nd line of the ExecOp function: 

void Expression :: ExecOp( const ExprToken & tok ) {^M
        string op = tok.Value();^M

        std::cerr << "ExecOp2 >";
        std::cerr << op;
        std::cerr <<  "<\n";

....

Original comment by Daniel.H...@gmail.com on 1 Feb 2012 at 9:06

GoogleCodeExporter commented 8 years ago
Hmm. I must admit I am stuck at this point.

Original comment by nbutterworth1953@gmail.com on 2 Feb 2012 at 8:04

GoogleCodeExporter commented 8 years ago
I have attached a patch that fixes this issue.  Tested on Mac OS X 10.6.8.

Clive Hayward
haywardc@chayward.com

Original comment by haywardc...@gmail.com on 8 Feb 2012 at 11:27

Attachments:

GoogleCodeExporter commented 8 years ago
Thanks, but in C++ NULL and zero are exactly the same thing. Unless you can 
come up with an explanation of why this would make a difference, I'm not 
accepting it.

Original comment by nbutterworth1953@gmail.com on 8 Feb 2012 at 11:30

GoogleCodeExporter commented 8 years ago
From 
http://stackoverflow.com/questions/8844719/how-to-set-a-default-value-when-no-ex
tra-arguments-are-present-using-va-list-in

"Note that the terminator must be the same size as the argument that it'll be 
read as with va_arg - eg on a 64 bit machine, using va_arg(args,char *), the 
terminator must be NULL and not 0. – Timothy Jones Jan 13 at 1:33"

See this debug code and resulting output.

bool In( const string & s, CaseSensitive cs, const char * list, ... ) {
        const char * val = list;
        va_list begin;
        va_start( begin, list );

        std::cout << "Null size: " << sizeof(NULL) << std::endl;
        std::cout <<  "Zero size: " << sizeof(0) <<  std::endl;
        while( val != 0) {
                std::cout << val << std::endl;
                if ( Cmp( s, val, cs ) == 0 ) {
                        va_end( begin );
                        return true;
                }
                // All arguments are assumed to be the same 64 bit size.  But 0 is 32 bits whereas NULL is 64 bits.
                val = va_arg( begin, const char * );
        }
        va_end( begin );
        return false;

 >> cat ~/tc.csv
"foo","foo","45000","2","85.9"
>> ./csvfix eval -e '$5/100' ~/tc.csv 
Null size: 8
Zero size: 4
==
<>
<
>
<=
>=
????
????
Segmentation fault

Maybe a better fix would be to use the "bool In( const string & s, const vector 
<string> & list, CaseSensitive cs ) " function instead.

Clive

Original comment by haywardc...@gmail.com on 8 Feb 2012 at 1:16

GoogleCodeExporter commented 8 years ago
Aha - thanks very much! I was beginning to look at 32 versus 64 bit issues 
myself, but was not looking at this specific bit of code. There may be more or 
these lurking around :-( Thanks again.

Original comment by nbutterworth1953@gmail.com on 8 Feb 2012 at 3:37

GoogleCodeExporter commented 8 years ago
Fixed (hopefully)  in source tree. Thanks again to Clive Hayward.

Original comment by nbutterworth1953@gmail.com on 8 Feb 2012 at 5:42