AppThreat / cpggen

Generate CPG for multiple languages for code and threat analysis
https://discord.gg/tmmtjCEHNV
Apache License 2.0

Error generating CPG #51

Open anonymitycoder2 opened 9 months ago

anonymitycoder2 commented 9 months ago

Why does the progress stop at 20% when I generate a CPG? [screenshot]

prabhu commented 9 months ago

@anonymitycoder2 The joern project has some known bugs and performance issues with C/C++. Can you use our forks, atom and chen, instead?

https://github.com/AppThreat/atom (comparable to joern cli v2+)

# Download from releases https://github.com/AppThreat/atom/releases - java 21 is better
atom.sh -o app.atom -l c .
# Create an atom with data flows
atom.sh -o app.atom -l c . --with-data-deps
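
For scripted runs, the two invocations above can be wrapped in a small helper. This is a minimal Python sketch, not part of atom itself: it assumes `atom.sh` is on the PATH and uses only the flags quoted in this thread (`-o`, `-l`, `--with-data-deps`). The function names `build_atom_command` and `run_atom` are hypothetical.

```python
import subprocess
from typing import List


def build_atom_command(output: str, language: str, source_dir: str,
                       with_data_deps: bool = False,
                       executable: str = "atom.sh") -> List[str]:
    """Build the argument list for one atom run.

    Only the flags shown in this thread are used; the executable name
    is an assumption and can be overridden.
    """
    cmd = [executable, "-o", output, "-l", language, source_dir]
    if with_data_deps:
        # Ask atom to include data flows, as in the second command above.
        cmd.append("--with-data-deps")
    return cmd


def run_atom(cmd: List[str]) -> None:
    """Run atom; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

Calling `run_atom(build_atom_command("app.atom", "c", ".", with_data_deps=True))` would launch the second command above, provided the executable can be found.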

To query the atom you can use chen (fork of joern with enhancements)

https://github.com/AppThreat/chen

importAtom("/path/to/atom file")

Let me know how it goes.

anonymitycoder2 commented 9 months ago

Thank you for your reply. I seem to have generated some data using cpggen:

cpggen -i dataset -o cpggen-out

[screenshot]

How can I use this data to generate the various graph structures and slices for each C file under the dataset directory, just like joern did? The file I generated using the relevant command was empty. If you can help me solve this problem, I would be immensely grateful.

prabhu commented 9 months ago

atom has dedicated commands for slices generation.

https://github.com/AppThreat/atom/tree/main#create-usages-slice-for-a-java-project

The repotests in atom have invocations for other languages.

anonymitycoder2 commented 9 months ago

Thank you for your reply! I generated some slice data using the atom you provided. It seems that all C files in the dataset directory are sliced into a single JSON file. [screenshot] Is this the result of slicing? Can atom generate the corresponding graph structure and slice data for each C file in the dataset directory, just like joern did? Looking forward to your reply.

prabhu commented 9 months ago

@anonymitycoder2, yes, slices are a single file for all source code. I didn't know joern was generating one file per source for slicing. If you mean export of CPG and DDG, we're happy to add that command.
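
Since the slices arrive as one file, a per-source view can be rebuilt afterwards by grouping entries on their file name. The Python sketch below is only an illustration under stated assumptions: it assumes the slice JSON has a top-level list (here called `objectSlices`) whose entries carry a `fileName` field. Neither key is confirmed in this thread, so both are parameters; check them against an actual slice file. The function names are hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Any, Dict, List


def group_slices_by_file(slices: Dict[str, Any],
                         list_key: str = "objectSlices",
                         file_key: str = "fileName") -> Dict[str, List[dict]]:
    """Group slice entries by source file.

    list_key and file_key are assumptions about the JSON layout.
    Entries without a file name are collected under "unknown".
    """
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for entry in slices.get(list_key, []):
        grouped[entry.get(file_key, "unknown")].append(entry)
    return dict(grouped)


def write_per_file(grouped: Dict[str, List[dict]], out_dir: str) -> None:
    """Write one JSON file per source file into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for file_name, entries in grouped.items():
        # Flatten path separators so every source maps to one flat file name.
        safe = file_name.replace("/", "__").replace("\\", "__")
        (out / f"{safe}.json").write_text(
            json.dumps(entries, indent=2), encoding="utf-8")
```

A typical use would be to load the slice file with `json.load`, pass the result to `group_slices_by_file`, and hand the grouped dictionary to `write_per_file`.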

anonymitycoder2 commented 9 months ago

It would be great if you could do this. Thank you for your patient reply. It really helped me a lot.

prabhu commented 9 months ago

@anonymitycoder2 could you kindly review the below PR, which adds export to graphml?

https://github.com/AppThreat/atom/pull/101

atom -o app.atom -l java --export-atom --export-dir <export dir> <path to application>

anonymitycoder2 commented 9 months ago

Thank you for your reply. I ran the program according to the PR you provided, but the graphml file was not exported successfully. Some errors occurred. Are they caused by the versions of Python and the JDK? [screenshot] Some other errors were reported as well. [screenshot] [screenshot] Looking forward to your reply.

prabhu commented 9 months ago

@anonymitycoder2, interesting! Please share the full exception trace since I want to know which line is looking for Python, which must remain an optional dependency for non-ml users.

prabhu commented 9 months ago

@anonymitycoder2 Could you retest with the latest from that branch?

anonymitycoder2 commented 9 months ago

Thanks for your reply. After testing with the latest version of the branch, I generated some graphml files. [screenshot] However, the number and names of the generated graphml files do not correspond to the Java files in the dataset.

Can atom generate a graphml file, named after the source file, for each Java file? It would be even better if atom could let me specify the type of graph to output, such as AST, CPG, or PDG.
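
To check what an exported file actually contains, GraphML can be inspected with the Python standard library alone, since it is plain XML. The sketch below counts nodes and edges. It assumes the export uses the standard GraphML namespace, which this thread does not confirm; a file without that namespace would report zero for both. The function name `summarize_graphml` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

# Standard GraphML namespace; assumed to be used by the exported files.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def summarize_graphml(data: Union[str, bytes]) -> Dict[str, int]:
    """Return the number of <node> and <edge> elements in a GraphML document."""
    root = ET.fromstring(data)
    nodes = root.findall(f".//{GRAPHML_NS}node")
    edges = root.findall(f".//{GRAPHML_NS}edge")
    return {"nodes": len(nodes), "edges": len(edges)}
```

For a file on disk, the contents can be passed in as bytes, for example via `pathlib.Path(...).read_bytes()`.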

prabhu commented 9 months ago

@anonymitycoder2, thank you for trying the branch. I have pushed an update to atom:

  • to include the filename
  • add support for dot format
  • not include DDG by default (can be included with --with-data-deps)

Regarding support for all individual representations, we do not have any enterprise users with such a request, so it is not a priority yet. We also aim to keep atom lightweight for easy CI/CD use cases. Hope this helps.

anonymitycoder2 commented 9 months ago

Thanks for your help, it really helped me a lot. If I get the chance in the future, I will introduce atom in my paper to help promote it. Thanks again from the bottom of my heart.

prabhu commented 9 months ago

@anonymitycoder2, you used the magic word paper. Let me find a way to do this without affecting the size.

prabhu commented 8 months ago

@anonymitycoder2 atom 1.8.0 was released with three individual representations exported automatically in dot format: AST, CDG, and CFG. Four files are created per method in total, with the fourth comparable to a CPG since it includes all representations, including DDG and PDG. I hope this helps.

https://github.com/AppThreat/atom/releases/tag/v1.8.0
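
When comparing the per-method dot files with each other, or with exports from another tool, a quick edge count is a useful first check. The Python sketch below counts directed edges (`->`) in DOT text. Quoted strings are blanked out first, because node labels for C code may themselves contain `->` (as in `p->x`). It is a rough heuristic rather than a DOT parser: HTML-like labels are not handled, and nothing about atom's exact output layout is assumed. The function name `count_dot_edges` is hypothetical.

```python
import re

# Matches a double-quoted DOT string, including escaped quotes inside it.
_QUOTED = re.compile(r'"(?:\\.|[^"\\])*"')


def count_dot_edges(dot_text: str) -> int:
    """Count directed edges in DOT text, ignoring '->' inside quoted strings."""
    stripped = _QUOTED.sub('""', dot_text)
    return stripped.count("->")
```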

anonymitycoder2 commented 8 months ago

Thank you so much! This is really helpful for my work. I used the latest version of atom to generate a comprehensive dot file along with dot files for the AST, CFG, and DFG. However, the comprehensive dot file differs from the cpg14 export produced by joern: atom seems to include additional information about edges and nodes. The cpg14 graph structure exported by joern is very popular in the field of code representation learning, and it is what I want to generate. Thank you again for your help. atom is excellent, and I will recommend it to my friends who are engaged in related research.

prabhu commented 8 months ago

Thanks, @anonymitycoder2, for your kind words! As you figured, we call atom version 2 since we need that additional information to perform type inference and package URL inference.

Below are a couple of screenshots that show these inferences in action. Not only do we know the type, but we even know the precise dependency they must have come from for a few languages.

[screenshots]

I am looking forward to the new generation of research unlocked by atom and chen.