github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License

can codeql analyse java rt.jar source code? #4304

Open leveryd opened 4 years ago

leveryd commented 4 years ago

I want to analyse part of the JDK source code, so I unzipped the JDK's src.zip, which contains packages such as java.io.* and java.lang.*.

Then I created a pom.xml and ran codeql database create jdk_src --language=java --command="mvn clean package".

The log file database-create-20200630.203801.550.log indicates that CodeQL did not find any source code:

[2020-06-30 20:38:50] [ERROR] database finalize> No source code was seen during the build.
                              This can occur if the specified build commands failed to compile or process any code.
                               - Confirm that there is some source code for the specified language in the project.
                               - For codebases written in Go, JavaScript, TypeScript, and Python, do not specify 
                                 an explicit --command.
                               - For other languages, the --command must specify a "clean" build which compiles 
                                 all the source code files without reusing existing build artefacts.

p0 commented 4 years ago

There isn't anything special about those types, so if you process them from source they should be analyzed like other code. Are you confident that the Maven configuration file you created is compiling the sources? (Could you share the maven logs here?)

aibaars commented 4 years ago

The src.zip does not contain a pom.xml file to build the code. Did you write one yourself? If so, could you share a copy? Note that the pom.xml file must be written in such a way that it compiles all the code you'd like to analyze.

aibaars commented 4 years ago

You may also try a build command like find . -name '*.java' | xargs javac
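
A minimal sketch of that build command as a small script, assuming a POSIX shell and javac on the PATH. The function names collect_sources and compile_sources and the file name sources.txt are hypothetical. It writes the file list to a javac argument file and compiles it in one invocation, because xargs may split a very long list across several javac runs, which can break references between files.

```shell
# collect_sources SRC_DIR LIST_FILE
# Write the path of every .java file under SRC_DIR to LIST_FILE, one per line.
# Assumption: no path contains spaces (javac argument files would need quoting).
collect_sources() {
    find "$1" -type f -name '*.java' | sort > "$2"
}

# compile_sources LIST_FILE OUT_DIR
# Compile all listed files in a single javac invocation via an @argfile.
compile_sources() {
    mkdir -p "$2"
    javac -d "$2" @"$1"
}
```

For example, collect_sources . sources.txt followed by compile_sources sources.txt build/classes. The package and module restrictions of the JDK sources may still cause compile errors.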

aibaars commented 4 years ago

Also you are likely to run into errors due to package and module restrictions. It might be worth looking into how the build scripts of the JDK deal with bootstrapping issues. See also https://github.com/openjdk/jdk

leveryd commented 4 years ago

> There isn't anything special about those types, so if you process them from source they should be analyzed like other code. Are you confident that the Maven configuration file you created is compiling the sources? (Could you share the maven logs here?)

Maybe Maven is not compiling the sources.

In the end I used a command like find . -name '*.java' | xargs javac to compile them, and it failed.

As aibaars said, I ran into errors due to package and module restrictions, and now I am stuck.

aibaars commented 4 years ago

@leveryd Something like this should work (haven't tried running it though)

# checkout
git clone https://github.com/AdoptOpenJDK/openjdk-jdk14u
cd openjdk-jdk14u

# install dependencies
sudo apt install libx11-dev libxext-dev libxrender-dev libxrandr-dev libxtst-dev \
                 libxt-dev libcups2-dev libfontconfig1-dev libasound2-dev autoconf 

# setup JDKs
JAVA_HOME_13=/usr/lib/jvm/java-13-openjdk-amd64
JAVA_HOME_14=/usr/lib/jvm/java-14-openjdk-amd64

# run build
bash configure --disable-javac-server --disable-warnings-as-errors --disable-cds-archive --with-num-cores=4 \
       --with-boot-jdk="${JAVA_HOME_13}" \
       --with-build-jdk="${JAVA_HOME_14}"
make images

To make the CodeQL database you need to wrap the last command with codeql like this:

codeql database create jdk_src --language=java --command="make images"

and I'd recommend calling make clean (or similar) before that to clear up any intermediate results from previous attempts.
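
The two steps can be combined in a small helper, sketched here under the assumption that it is run from the already configured JDK source directory. The function name build_codeql_db is hypothetical. It uses only the codeql options shown above.

```shell
# build_codeql_db DB_NAME BUILD_CMD
# Clear intermediate results, then let CodeQL trace a fresh build.
build_codeql_db() {
    db_name=$1
    build_cmd=$2
    # Stop if the clean step fails, so a stale build is never traced.
    make clean || return 1
    codeql database create "$db_name" --language=java --command="$build_cmd"
}
```

For example: build_codeql_db jdk_src "make images"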

xsser commented 4 years ago

so..can codeql analyse java .jar or class file source code?

hmakholm commented 4 years ago

> so..can codeql analyse java .jar or class file source code?

The general answer is: Yes, CodeQL can be used to analyze the Java source code for the JRE.

As with any CodeQL analysis, you need to have a build system in place that actually builds the runtime .jar from the source code you want to analyze. Most of the content in this thread is an attempt to help the original poster with that part.

V-E-O commented 4 years ago

In my case, doing data flow taint tracking, the tracking is lost when the data flows into and out of a function inside a dependency jar. Can the CodeQL extractor directly process compiled class files?

Alternatively: I know that CodeQL analyzes the output after the Java project is built, with an extractor invocation like this:

~/codeql/java/tools/linux/jdk-extractor-java/bin/java -Dfile.encoding=UTF-8 -Xmx1024M -Xms256M -classpath ~/codeql/java/tools/semmle-extractor-java.jar com.semmle.extractor.java.JavaExtractor --jdk-version -1 --javac-args @@@~/db/log/ext/javac.args

Could an option be provided to use "-sourcepath" in log/ext/javac.args with the resolved source jars, instead of "-classpath" with the compiled jars? Source jars can be resolved with "mvn dependency:sources".
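
As a rough sketch of the source-jar side of this idea: after "mvn dependency:sources", the downloaded jars carry the -sources classifier in the local Maven repository (by default ~/.m2/repository). The function names below are hypothetical and the unzip tool is assumed to be installed. This only locates and unpacks the sources; it does not by itself make the CodeQL extractor use them.

```shell
# list_source_jars REPO_DIR
# Print every *-sources.jar found under REPO_DIR, one path per line.
list_source_jars() {
    find "$1" -type f -name '*-sources.jar' | sort
}

# unpack_source_jars REPO_DIR DEST_DIR
# Unpack all source jars into DEST_DIR.
# Caveat: files with the same path in different jars overwrite each other.
unpack_source_jars() {
    mkdir -p "$2"
    list_source_jars "$1" | while IFS= read -r jar; do
        unzip -q -o "$jar" -d "$2"
    done
}
```

For example: unpack_source_jars ~/.m2/repository dep-sources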

hmakholm commented 4 years ago

Moving to github/codeql since it seems to be language specific. @github/codeql-java should probably be the ones to decide what further to do with the issue.