SoftSec-KAIST / BinKit

Binary Code Similarity Analysis (BCSA) Benchmark
MIT License

Dataset size #13

Open khanwa opened 12 months ago

khanwa commented 12 months ago

Hello. You mention that BinKit 2.0 has 371,928 binaries, but the zip file downloaded from the drive contains only ~213K files. Could you please clarify?

Thank you

topcue commented 12 months ago

Hello @khanwa.

The BinKit 2.0 dataset is 10G in size. I checked again, and there seems to be no problem with the BinKit 2.0 dataset link in the README.

Could you check to see if there was an interruption while downloading the BinKit dataset? Thank you.
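One way to run the check suggested above is to verify the archive and count what it contains. The following is a minimal sketch, not part of BinKit: the function names and the assumption that each top-level directory of the extracted dataset corresponds to one project are hypothetical, and the paths are supplied by the user.

```python
import os
import sys
import zipfile
from collections import Counter


def check_archive(zip_path):
    """Return (file_count, first_bad_name) for a zip archive.

    first_bad_name is None when every entry passes its CRC check.
    A truncated download usually fails earlier: opening the file raises
    zipfile.BadZipFile because the central directory at the end is missing.
    """
    with zipfile.ZipFile(zip_path) as zf:
        files = [info for info in zf.infolist() if not info.is_dir()]
        return len(files), zf.testzip()


def count_files_per_top_dir(root):
    """Count regular files under each top-level directory of root.

    Assumption: one top-level directory per project. Files that sit
    directly in root are counted under the key ".".
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        counts[top] += len(filenames)
    return counts


if __name__ == "__main__":
    # Usage (paths are placeholders): python check.py <archive.zip> <extracted_dir>
    n_files, bad = check_archive(sys.argv[1])
    print(f"archive entries: {n_files}, first corrupt entry: {bad}")
    per_dir = count_files_per_top_dir(sys.argv[2])
    print(f"top-level directories: {len(per_dir)}, files: {sum(per_dir.values())}")
```

If the archive opens, `testzip()` returns `None`, and the file count is still around 213K, the download itself is probably intact and the difference lies in what the archive contains.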

Rroscha commented 11 months ago

Hello, I have the same problem: my copy of BinKit 2.0 contains only 213K binary files, and only 50 projects (not 51).

Thank you.

topcue commented 11 months ago

Hello. We will check again and respond as quickly as possible.

Thank you

Rroscha commented 11 months ago

Thank you very much.

topcue commented 11 months ago

BinKit 1.0 provided precompiled Normal (O0, O1, O2, O3), SizeOpt (Os), Noinline, PIE, LTO, and Obfus datasets. BinKit 2.0, however, provides precompiled binaries only for the extended compiler versions and optimization levels (O0, O1, O2, O3, Os, Ofast). The Noinline, PIE or NOPIE, LTO, and Obfus datasets can be built directly with a script, but we do not provide them precompiled.

The '371K binary files' figure in the README is the total number of distinct binaries across BinKit 2.0's optimization-level datasets (O0-O3, Os, Ofast) and BinKit 1.0's Noinline, PIE, LTO, and Obfus datasets.

To reduce this confusion, we plan to provide additional precompiled datasets for options such as NOPIE and LTO for the expanded compiler versions of BinKit 2.0.
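To see which optimization levels a downloaded copy actually covers, the file names can be tallied by level. This is only a sketch under an assumption: it supposes each binary's file name contains its optimization level as a separate token (for example `_O2_`), which should be checked against the actual file names. The pattern and function name are hypothetical.

```python
import re
from collections import Counter

# Assumption: the optimization level appears in the file name as its own token,
# delimited by non-alphanumeric characters such as "_" or "-".
OPT_PATTERN = re.compile(r"(?<![A-Za-z0-9])(Ofast|O[0-3s])(?![A-Za-z0-9])")


def tally_opt_levels(file_names):
    """Count file names per optimization level; unmatched names go to 'unknown'."""
    counts = Counter()
    for name in file_names:
        match = OPT_PATTERN.search(name)
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

Feeding it the base names collected with `os.walk` gives a per-level breakdown that can be compared with the six levels listed above (O0, O1, O2, O3, Os, Ofast).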

Thank you

khanwa commented 11 months ago

Thank you very much.