OSUSecLab / SymLM

Implementation of CCS'2022 paper "SymLM: Predicting Function Names in Stripped Binaries via Context-Sensitive Execution-Aware Code Embeddings"
MIT License

Question about X86-O1 dataset #13

Status: Closed (Xiaolinger-Z closed this issue 11 months ago)

Xiaolinger-Z commented 1 year ago

Thank you very much for making your dataset public. However, when reproducing the experiment, I found that many binaries, such as those from the coreutils project, are missing from the X86-O1 dataset. In addition, many files do not conform to the ELF file format, for example files in the curl and binutils projects. I hope you can resolve these issues. Thank you again.

xinjin95 commented 11 months ago

Hi @Xiaolinger-Z,

Thanks for your interest in our dataset. To identify the ELF files, you can use the following Python script:

>>> import magic
>>> file_path = "/home/xin/Documents/projects/function_summarization/function_summarization/gnu_projects/elf_binaries/x86_64/O0/adns-1.6.0/addrtext_s"
>>> ret = magic.from_file(file_path)
>>> ret
'ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=43fed94e73321997b3de3ea6227eef60e1467d09, with debug_info, not stripped'
>>> if ret.startswith('ELF'):
...     print("ELF file:", file_path)
ELF file: /home/xin/Documents/projects/function_summarization/function_summarization/gnu_projects/elf_binaries/x86_64/O0/adns-1.6.0/addrtext_s
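
The session above relies on the third-party `magic` module. As a dependency-free alternative (a sketch, not the authors' method), the same check can be done by reading the first four bytes of each file, since every ELF file begins with `\x7fELF`. The helper names `is_elf` and `split_by_elf` are hypothetical, and this check only tests the magic number, so it does not validate the rest of the ELF header.

```python
from pathlib import Path

# Every ELF file starts with these four bytes (0x7F followed by "ELF").
ELF_MAGIC = b"\x7fELF"


def is_elf(path):
    """Return True if the file at `path` starts with the ELF magic bytes."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == ELF_MAGIC
    except OSError:
        # Unreadable or missing files are treated as non-ELF.
        return False


def split_by_elf(root):
    """Walk `root` recursively and return (elf_files, other_files).

    Both results are sorted lists of paths relative to `root`.
    """
    root = Path(root)
    elf, other = [], []
    for p in sorted(root.rglob("*")):
        if p.is_file():
            rel = p.relative_to(root).as_posix()
            (elf if is_elf(p) else other).append(rel)
    return elf, other
```

Calling `split_by_elf` on a dataset directory returns the ELF files and the remaining files separately, which makes it easy to see which entries (for example shell scripts or libtool `.la` files) are not ELF binaries.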

Xiaolinger-Z commented 11 months ago

Hi @xinjin95

Thank you for your response. Additionally, I found that some binaries are missing from your x86-O1 dataset, so its binaries cannot be matched with the corresponding binaries at other optimization levels. Could you update the binaries in the x86-O1 dataset?

xinjin95 commented 11 months ago

Could you list the missing binaries and their corresponding projects?
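
One way to produce such a list is a set difference over relative paths. The sketch below assumes each optimization level has its own directory containing `<project>/<binary>` entries with identical names across levels, as the `x86_64/O0/adns-1.6.0/addrtext_s` path earlier in this thread suggests. The function names and the example directory names are hypothetical.

```python
from pathlib import Path


def list_binaries(level_dir):
    """Return the set of file paths under `level_dir`, relative to it."""
    level_dir = Path(level_dir)
    return {
        p.relative_to(level_dir).as_posix()
        for p in level_dir.rglob("*")
        if p.is_file()
    }


def missing_from(reference_dir, target_dir):
    """Return sorted paths present in `reference_dir` but absent in `target_dir`."""
    return sorted(list_binaries(reference_dir) - list_binaries(target_dir))


if __name__ == "__main__":
    # Hypothetical locations; adjust to wherever the dataset is unpacked.
    for rel_path in missing_from("x86_64/O0", "x86_64/O1"):
        print(rel_path)
```

Comparing relative paths rather than bare file names keeps binaries with the same name in different projects distinct.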

Xiaolinger-Z commented 11 months ago

'libgmp10/libgmpxx.so.4.5.2', 'libgmp10/libgmp.la', 'libgmp10/libgmpxx.la', 'libgmp10/i386-linux-gnu/libgmp.so.10', 'libgmp10/i386-linux-gnu/libgmp.so.10.3.2', 'libgmp10/libgmp.so.10.3.2', 'findutils/locate.findutils', 'findutils/xargs', 'findutils/find', 'findutils/locate', 'findutils/updatedb.findutils', 'findutils/updatedb', 'coreutils/ls', 'coreutils/dirname', 'coreutils/runcon', 'coreutils/readlink', 'coreutils/chmod', 'coreutils/od', 'coreutils/csplit', 'coreutils/id', 'coreutils/chcon', 'coreutils/groups', 'coreutils/mv', 'coreutils/paste', 'coreutils/df', 'coreutils/realpath', 'coreutils/false', 'coreutils/head', 'coreutils/dircolors', 'coreutils/md5sum', 'coreutils/stty', 'coreutils/tr', 'coreutils/sum', 'coreutils/mknod', 'coreutils/numfmt', 'coreutils/unlink', 'coreutils/b2sum', 'coreutils/fmt', 'coreutils/cut', 'coreutils/ptx', 'coreutils/tail', 'coreutils/true', 'coreutils/date', 'coreutils/basename', 'coreutils/rmdir', 'coreutils/cp', 'coreutils/uname', 'coreutils/sleep', 'coreutils/fold', 'coreutils/[', 'coreutils/yes', 'coreutils/expand', 'coreutils/shuf', 'coreutils/factor', 'coreutils/mkfifo', 'coreutils/mkdir', 'coreutils/tty', 'coreutils/cksum', 'coreutils/stdbuf', 'coreutils/pathchk', 'coreutils/seq', 'coreutils/sha512sum', 'coreutils/timeout', 'coreutils/shred', 'coreutils/split', 'coreutils/users', 'coreutils/install', 'coreutils/tsort', 'coreutils/sha1sum', 'coreutils/tac', 'coreutils/uniq', 'coreutils/mktemp', 'coreutils/nice', 'coreutils/nproc', 'coreutils/arch', 'coreutils/stat', 'coreutils/env', 'coreutils/sort', 'coreutils/sha256sum', 'coreutils/nohup', 'coreutils/who', 'coreutils/join', 'coreutils/cat', 'coreutils/comm', 'coreutils/du', 'coreutils/pinky', 'coreutils/hostid', 'coreutils/base64', 'coreutils/link', 'coreutils/sha384sum', 'coreutils/sha224sum', 'coreutils/printf', 'coreutils/nl', 'coreutils/sync', 'coreutils/vdir', 'coreutils/expr', 'coreutils/dir', 'coreutils/echo', 'coreutils/touch', 'coreutils/pwd', 'coreutils/chgrp', 
'coreutils/base32', 'coreutils/logname', 'coreutils/unexpand', 'coreutils/pr', 'coreutils/printenv', 'coreutils/rm', 'coreutils/chown', 'coreutils/truncate', 'coreutils/ln', 'coreutils/tee', 'coreutils/test', 'coreutils/whoami', 'coreutils/md5sum.textutils', 'coreutils/dd', 'coreutils/wc', 'bison/bison.yacc', 'bison/bison', 'gzip/zfgrep', 'gzip/zmore', 'gzip/zforce', 'gzip/zdiff', 'gzip/gunzip', 'gzip/zcat', 'gzip/gzexe', 'gzip/zcmp', 'gzip/zless', 'gzip/zgrep', 'gzip/config.status', 'gzip/gzip', 'gzip/znew', 'gzip/zegrep', 'gzip/configure.lineno', 'putty/psftp', 'putty/pageant', 'putty/pterm', 'putty/puttytel', 'putty/pscp', 'putty/plink', 'putty/putty', 'putty/puttygen'

xinjin95 commented 11 months ago

Hi @Xiaolinger-Z,

I checked these files on my end. I was able to find most of them and have shared the binaries on Google Drive: https://drive.google.com/drive/folders/18u25aryVAEJiw3ieFTOSB6c4_Ok0QgL3?usp=sharing

Xiaolinger-Z commented 11 months ago

Thanks for sharing.