Hi @Xiaolinger-Z,
Thanks for your interest in our dataset. To identify the ELF files, you can use the following Python snippet, which relies on the python-magic package:
>>> import magic
>>> file_path = "/home/xin/Documents/projects/function_summarization/function_summarization/gnu_projects/elf_binaries/x86_64/O0/adns-1.6.0/addrtext_s"
>>> ret = magic.from_file(file_path)
>>> ret
'ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=43fed94e73321997b3de3ea6227eef60e1467d09, with debug_info, not stripped'
>>> if ret.startswith('ELF'):
...     print("ELF file:", file_path)
...
ELF file: /home/xin/Documents/projects/function_summarization/function_summarization/gnu_projects/elf_binaries/x86_64/O0/adns-1.6.0/addrtext_s
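If python-magic is not available, a rough equivalent can be sketched with the standard library alone by checking the 4-byte ELF magic number (`\x7fELF`) at the start of each file. This is only a sketch, not the script used to build the dataset: the helper names `is_elf` and `find_elf_files` are made up for illustration, and the check does not validate the rest of the ELF header.

```python
import os

# Every ELF file starts with these four bytes (0x7f followed by "ELF").
ELF_MAGIC = b"\x7fELF"


def is_elf(path):
    """Return True if the file at `path` starts with the ELF magic bytes."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == ELF_MAGIC
    except OSError:
        # Unreadable files are treated as non-ELF.
        return False


def find_elf_files(root):
    """Walk `root` recursively and return a sorted list of ELF file paths."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip FIFOs, sockets, and other non-regular files.
            if os.path.isfile(path) and is_elf(path):
                found.append(path)
    return sorted(found)
```

Calling `find_elf_files` on a project directory (for example one of the `elf_binaries/x86_64/O0/...` folders) returns only the files that begin with the ELF magic, so shell scripts and other text files in the same folder are skipped.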
Hi @xinjin95
Thank you for your response. Additionally, I found that some binaries are missing from your x86-O1 dataset, so the x86-O1 binaries cannot be matched one-to-one with the binaries at the other optimization levels. Could you update the x86-O1 binaries?
Could you list the missing binaries and their corresponding projects?
'libgmp10/libgmpxx.so.4.5.2', 'libgmp10/libgmp.la', 'libgmp10/libgmpxx.la', 'libgmp10/i386-linux-gnu/libgmp.so.10', 'libgmp10/i386-linux-gnu/libgmp.so.10.3.2', 'libgmp10/libgmp.so.10.3.2', 'findutils/locate.findutils', 'findutils/xargs', 'findutils/find', 'findutils/locate', 'findutils/updatedb.findutils', 'findutils/updatedb', 'coreutils/ls', 'coreutils/dirname', 'coreutils/runcon', 'coreutils/readlink', 'coreutils/chmod', 'coreutils/od', 'coreutils/csplit', 'coreutils/id', 'coreutils/chcon', 'coreutils/groups', 'coreutils/mv', 'coreutils/paste', 'coreutils/df', 'coreutils/realpath', 'coreutils/false', 'coreutils/head', 'coreutils/dircolors', 'coreutils/md5sum', 'coreutils/stty', 'coreutils/tr', 'coreutils/sum', 'coreutils/mknod', 'coreutils/numfmt', 'coreutils/unlink', 'coreutils/b2sum', 'coreutils/fmt', 'coreutils/cut', 'coreutils/ptx', 'coreutils/tail', 'coreutils/true', 'coreutils/date', 'coreutils/basename', 'coreutils/rmdir', 'coreutils/cp', 'coreutils/uname', 'coreutils/sleep', 'coreutils/fold', 'coreutils/[', 'coreutils/yes', 'coreutils/expand', 'coreutils/shuf', 'coreutils/factor', 'coreutils/mkfifo', 'coreutils/mkdir', 'coreutils/tty', 'coreutils/cksum', 'coreutils/stdbuf', 'coreutils/pathchk', 'coreutils/seq', 'coreutils/sha512sum', 'coreutils/timeout', 'coreutils/shred', 'coreutils/split', 'coreutils/users', 'coreutils/install', 'coreutils/tsort', 'coreutils/sha1sum', 'coreutils/tac', 'coreutils/uniq', 'coreutils/mktemp', 'coreutils/nice', 'coreutils/nproc', 'coreutils/arch', 'coreutils/stat', 'coreutils/env', 'coreutils/sort', 'coreutils/sha256sum', 'coreutils/nohup', 'coreutils/who', 'coreutils/join', 'coreutils/cat', 'coreutils/comm', 'coreutils/du', 'coreutils/pinky', 'coreutils/hostid', 'coreutils/base64', 'coreutils/link', 'coreutils/sha384sum', 'coreutils/sha224sum', 'coreutils/printf', 'coreutils/nl', 'coreutils/sync', 'coreutils/vdir', 'coreutils/expr', 'coreutils/dir', 'coreutils/echo', 'coreutils/touch', 'coreutils/pwd', 'coreutils/chgrp', 
'coreutils/base32', 'coreutils/logname', 'coreutils/unexpand', 'coreutils/pr', 'coreutils/printenv', 'coreutils/rm', 'coreutils/chown', 'coreutils/truncate', 'coreutils/ln', 'coreutils/tee', 'coreutils/test', 'coreutils/whoami', 'coreutils/md5sum.textutils', 'coreutils/dd', 'coreutils/wc', 'bison/bison.yacc', 'bison/bison', 'gzip/zfgrep', 'gzip/zmore', 'gzip/zforce', 'gzip/zdiff', 'gzip/gunzip', 'gzip/zcat', 'gzip/gzexe', 'gzip/zcmp', 'gzip/zless', 'gzip/zgrep', 'gzip/config.status', 'gzip/gzip', 'gzip/znew', 'gzip/zegrep', 'gzip/configure.lineno', 'putty/psftp', 'putty/pageant', 'putty/pterm', 'putty/puttytel', 'putty/pscp', 'putty/plink', 'putty/putty', 'putty/puttygen'
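For reference, a list like the one above can be produced by comparing the relative file paths under two optimization-level directories. The sketch below assumes both directories use the same `<project>/<binary>` layout with identical project folder names; the function names and the example paths in the usage note are hypothetical, not part of the dataset tooling.

```python
import os


def list_relative_files(root):
    """Return the set of file paths under `root`, relative to `root`."""
    rel_paths = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel_paths.add(os.path.relpath(full, root))
    return rel_paths


def missing_in(target_root, reference_root):
    """Return sorted relative paths present in `reference_root`
    but absent from `target_root`."""
    reference = list_relative_files(reference_root)
    target = list_relative_files(target_root)
    return sorted(reference - target)
```

For example, `missing_in("x86_64/O1", "x86_64/O0")` would list every file that exists at O0 but not at O1. Filtering the result with an ELF check (such as the `ret.startswith('ELF')` test shown earlier in this thread) would drop entries like libtool `.la` archives, which are text files rather than binaries.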
Hi @Xiaolinger-Z,
I checked these files on my end. I was able to find most of them and have shared the binaries on Google Drive: https://drive.google.com/drive/folders/18u25aryVAEJiw3ieFTOSB6c4_Ok0QgL3?usp=sharing.
Thanks for sharing.
Thank you very much for making your dataset public. However, when I tried to reproduce the experiment, I found that many binaries, such as those from the coreutils project, are missing from the x86-O1 dataset. In addition, many files are not in the ELF format, for example files in the curl and binutils projects. I hope you can resolve these problems. Thank you again.