bblfsh / cpp-driver

C++ AST extractor / driver for the bblfsh project
GNU General Public License v3.0
11 stars 12 forks source link

Error parsing `setproctitle.c` from Redis #27

Closed jjhenkel closed 5 years ago

jjhenkel commented 5 years ago

When trying to parse ./src/setproctitle.c from Redis I get the following error:

Error: could not stream: rpc error: code = Unknown desc = could not parse: driver failure: tech.sourced.babelfish.DriverResponse.ResponseSendException
Error serializing the AST to JSON: com.fasterxml.jackson.databind.JsonMappingException: (was java.lang.NullPointerException) (through reference chain: tech.sourced.babelfish.DriverResponse["ast"])
tech.sourced.babelfish.DriverResponse$ResponseSendException: com.fasterxml.jackson.databind.JsonMappingException: (was java.lang.NullPointerException) (through reference chain: tech.sourced.babelfish.DriverResponse["ast"])
    at tech.sourced.babelfish.DriverResponse.send(DriverResponse.java:49)
    at tech.sourced.babelfish.Main.process(Main.java:53)
    at tech.sourced.babelfish.Main.main(Main.java:13)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: (was java.lang.NullPointerException) (through reference chain: tech.sourced.babelfish.DriverResponse["ast"])
    at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:388)
    at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:348)
    at com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:343)
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:697)
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:155)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:292)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:2509)
    at tech.sourced.babelfish.TranslationUnitJSONMapper.writeValue(TranslationUnitJSONMapper.java:40)
    at tech.sourced.babelfish.DriverResponse.send(DriverResponse.java:44)
    ... 2 more
Caused by: java.lang.NullPointerException
    at tech.sourced.babelfish.JsonASTVisitor.serializePreproStatements(JsonASTVisitor.java:1352)
    at tech.sourced.babelfish.JsonASTVisitor.visit(JsonASTVisitor.java:1407)
    at org.eclipse.cdt.internal.core.dom.parser.ASTTranslationUnit.accept(ASTTranslationUnit.java:267)
    at tech.sourced.babelfish.TranslationUnitSerializer.serialize(TranslationUnitSerializer.java:69)
    at tech.sourced.babelfish.TranslationUnitSerializer.serialize(TranslationUnitSerializer.java:44)
    at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:704)
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:689)
    ... 7 more

Command used:

 ~/mount/workspace/srcd parse uast -m native -l cpp ~/redis/src/setproctitle.c

This also happens trying to parse ./src/debug.c and ./src/config.h.

Here are the exact files that caused errors during parsing:

setproctitle.c config.h debug.c

creachadair commented 5 years ago

This looks like a bug in the Java client, rather than the C++ driver (though there could also be a bug in the latter that triggers this).

jjhenkel commented 5 years ago

Probably the latter: just checked and I can replicate using the python client library.

bzz commented 5 years ago

Indeed, this looks suspicious, thank you for detailed report @jjhenkel !

The simplest thing I can think of, in order to check if this has to do with Cpp driver is to start the only the driver and try parsing with bblfsh-cli binary.

docker run --rm -it -p 9432:9432 bblfsh/cpp-driver:v1.2.0 --log-level=debug
wget -O setproctitle.c https://github.com/bblfsh/cpp-driver/files/2916974/setproctitle.c.txt
bblfsh-cli setproctitle.c

And as you mention, this does produce very similar logs (in details)

``` driver failure: tech.sourced.babelfish.DriverResponse.ResponseSendException Error serializing the AST to JSON: com.fasterxml.jackson.databind.JsonMappingException: (was java.lang.NullPointerException) (through reference chain: tech.sourced.babelfish.DriverResponse["ast"]) tech.sourced.babelfish.DriverResponse$ResponseSendException: com.fasterxml.jackson.databind.JsonMappingException: (was java.lang.NullPointerException) (through reference chain: tech.sourced.babelfish.DriverResponse["ast"]) at tech.sourced.babelfish.DriverResponse.send(DriverResponse.java:53) at tech.sourced.babelfish.Main.process(Main.java:53) at tech.sourced.babelfish.Main.main(Main.java:13) Caused by: com.fasterxml.jackson.databind.JsonMappingException: (was java.lang.NullPointerException) (through reference chain: tech.sourced.babelfish.DriverResponse["ast"]) at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:388) at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:348) at com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:343) at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:697) at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:155) at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:292) at com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:2509) at tech.sourced.babelfish.TranslationUnitJSONMapper.writeValue(TranslationUnitJSONMapper.java:41) at tech.sourced.babelfish.DriverResponse.send(DriverResponse.java:48) ... 2 more Caused by: java.lang.NullPointerException at tech.sourced.babelfish.JsonASTVisitor.serializePreproStatements(JsonASTVisitor.java:1296) at tech.sourced.babelfish.JsonASTVisitor.lambda$visit$13(JsonASTVisitor.java:1348) at tech.sourced.babelfish.JsonASTVisitor.visitWrapper(JsonASTVisitor.java:154) at tech.sourced.babelfish.JsonASTVisitor.visit(JsonASTVisitor.java:1346) at org.eclipse.cdt.internal.core.dom.parser.ASTTranslationUnit.accept(ASTTranslationUnit.java:267) at tech.sourced.babelfish.TranslationUnitSerializer.serialize(TranslationUnitSerializer.java:69) at tech.sourced.babelfish.TranslationUnitSerializer.serialize(TranslationUnitSerializer.java:44) at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:704) at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:689) ... 7 more ```

The file seems to be syntactically sound, according to Clang gcc setproctitle.c -fsyntax-only

I'm going to take a closer look and attempt a fix, but realistically if no one steps up before that - will only have time early next week.

Will re-assign back as soon as it's WIP to avoid the confusion.

bzz commented 5 years ago

Seems to be caused by an issue in the native driver while serializing a MacroReference of IASTPreprocessorIfndefStatement node, that according to CDT docs

represent a preprocessor #ifndef statement

According to .getMacroReference() docs it

Returns the macro reference, or null if the macro does not exist.

but the driver does not check for null, which seems to be a root cause of NPE.

If @juanjux does not have other suggestions, I'm going to localize minimal code snippet that triggers this, add it to the driver fixtures for reproducibility and then ~add a null check to~ wrap that case in new String(...).

bzz commented 5 years ago

Interestingly enough, distilling minimum reproducible examples is not that simple!

Right now there are 2 places in the setproctitle.c that hit this: L216 \w #ifndef SPT_MAXTITLE and L70 \w #ifndef SPT_MIN - both get null in the MacroReference, but if any of those ifdefs are just moved to some existing fixture, like preprocessor_ifdef.cpp - it all gets parsed nice and clean, (MacroReference gets set to SPT_MAXTITLE) and it's not reproducible any more :/

And just adding the whole setproctitle.c file would complicate bblfsh org lives, as it's part of the BSD-3-Clause project and all drivers are GPLv3 and so we would need to keep the NOTICE as part of the source code distribution artifacts for drivers that we release which seems like unnecessary legal risk of getting it wrong.

@creachadair as you know most probably know the language much better, would you be able to tell if I'm missing something here and how come that the same

#ifndef SPT_MAXTITLE
#define SPT_MAXTITLE 255
#endif

in two different files, when parsed, produces different AST?

jjhenkel commented 5 years ago

I might be able to delta-debug that example. It'll take a bit to setup a script but once I have that I can send you a reduce file.

bzz commented 5 years ago

@jjhenkel nice idea! Thank you for suggestion, indeed it's a smallest example file so far.

There is a simple way to do that locally:

that would build the native driver inside the container and run all test over the ./fixtures/*.

I also just realized that to speed every iteration out, one can also skip building the container every time and run the tests manually with

docker run --rm -u 501:20 -e BBLFSH_TEST_LOCAL=true -v "${PWD}/fixtures":/opt/fixtures --workdir /opt/driver/bin --entrypoint ./fixtures.test sha256:<latest image sha>
jjhenkel commented 5 years ago

This seems to be a minimal example (now I used a delta-debugging script and a simple grep-based tester so this may be another, unrelated, null-reference error; but my hope is it is the same one):

#if (defined __linux || defined __APPLE__)
#ifndef SPT_MIN
jjhenkel commented 5 years ago

Also, in case it helps in solving issues like this in the future, here was my setup:

  1. In a fresh directory I setup engine v0.11.0 (downloaded the .tar for linux from the GitHub releases page).
  2. I wrote the following small script to parse a C file using engine and check for the error we were seeing:
    
    #!/bin/bash

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

${DIR}/engine_linux_amd64/srcd \ parse uast \ --mode=native \ --lang=cpp \ "${1}" 2>&1 \ | grep -q 'Error serializing the AST to JSON: com.fasterxml.jackson.databind.JsonMappingException: (was java.lang.NullPointerException)'

exit $?

3. I applied `https://github.com/renatahodovan/picire` which has an implementation of the Delta Debugging algorithm from `A. Zeller: "Yesterday, my program worked", ESEC/FSE 1999`. 
4. I ran the above tool using the following flags (where `setproctitle.c` was our starting input): 
```bash
picire --input=setproctitle.c --test=wrapper.sh 
  1. It reduced the file, iteratively, to a minimal example using the ddmin algorithm. That file was the wonderfully small 2-line example:
    #if (defined __linux || defined __APPLE__)
    #ifndef SPT_MIN

All of that without having to guess and check which lines are causing the issue myself! (That's the real advantage of ddmin.) And picire seems to be a solid implementation.

jjhenkel commented 5 years ago

Whoops---my tester script doesn't ensure the minimized file is syntactically valid. Here's an example with that constraint applied:

#if E
#if
#ifdef
#endif
#endif
#endif
jjhenkel commented 5 years ago

And now here is the updated tester script and command line invocation for future reference:

tester.sh:

#!/bin/bash

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

${DIR}/engine_linux_amd64/srcd \
  parse uast \
    --mode=native \
    --lang=cpp \
    "${1}" 2>&1 \
  | grep -q 'Error serializing the AST to JSON: com.fasterxml.jackson.databind.JsonMappingException: (was java.lang.NullPointerException)'

RC1=$?

gcc -fsyntax-only "${1}" &> /dev/null

RC2=$?

if [ "${RC1}" -eq "0" ]; then
  if [ "${RC2}" -eq "0" ]; then
    # Interseting! File failed to parse in the correct way 
    # but GCC says it is still snytactically valid
    exit 0
  fi
fi

# Not interesting (either parse worked/has different error/or file is 
# no longer snytactically valid)
exit 1

Invocation:

# --atom=both will reduced on lines and then chars
picire --atom=both --input=setproctitle.c --test=wrapper.sh
bzz commented 5 years ago

Very nice! 👏

Was able to manually boil down to a very similar

#if (defined __linux || defined __APPLE__)

#ifndef SPT_MIN
#define SPT_MIN(a, b) (((a) < (b))? (a) : (b))
#endif

#endif /* __linux || __APPLE__ */

Ok, let me submit a fix

bzz commented 5 years ago

@jjhenkel thank you very much for help!

v1.2.2 is released, is available on docker hub for using and is going to be included in :latest-drives image of the next bblfshd release 🎉