arthurherbout / crypto_code_detection

Automatic Detection of Custom Cryptographic C Code

Codejam dataset #11

Closed arthurherbout closed 4 years ago

arthurherbout commented 4 years ago

Selected 5200 submissions from the Code Jam dataset (more than 1 million submissions). There are 725 distinct problems, so I decided, with the group's approval, to limit the number of selected submissions to 20 per problem to avoid getting too many duplicates.
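For reference, the per-problem cap could be sketched roughly as follows. This is only an illustration, not the script actually used: the `(problem_id, path)` input layout, the function name `cap_per_problem`, and the use of seeded random sampling are all assumptions, since the comment above does not say how the 20 submissions per problem were picked.

```python
import random
from collections import defaultdict


def cap_per_problem(submissions, max_per_problem=20, seed=0):
    """Keep at most `max_per_problem` submissions for each problem.

    `submissions` is an iterable of (problem_id, path) pairs.
    This layout is hypothetical; adapt it to the real dataset.
    """
    # Group submission paths by the problem they solve.
    by_problem = defaultdict(list)
    for problem_id, path in submissions:
        by_problem[problem_id].append(path)

    # Seeded generator so the same selection can be reproduced (assumption:
    # random sampling; the original selection rule is not documented).
    rng = random.Random(seed)

    selected = []
    for problem_id in sorted(by_problem):
        paths = sorted(by_problem[problem_id])
        if len(paths) > max_per_problem:
            paths = rng.sample(paths, max_per_problem)
        selected.extend((problem_id, p) for p in paths)
    return selected
```

Problems with fewer than 20 submissions keep all of them, so the total can be well below 20 times the number of problems.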

That way I was able to select 5200 code samples, located in /data/code-jam/files/. Once that step was done, I created the JSON file as mentioned by Redouane. Each file_name has the prefix code-jam so that the origin of the file is easy to find.
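A minimal sketch of how such a JSON file might be produced is below. The exact schema agreed with Redouane is not given here, so everything except the `file_name` key and the `code-jam` prefix is an assumption: the `-` separator, the `origin` field, and the helper names `build_manifest` and `write_manifest` are hypothetical.

```python
import json
from pathlib import Path


def build_manifest(source_names, prefix="code-jam"):
    """Return one entry per file, with the dataset prefix in `file_name`.

    The "-" separator and the "origin" field are assumptions made for
    illustration; only `file_name` and the prefix come from the comment.
    """
    entries = []
    for name in sorted(source_names):
        entries.append({
            "file_name": f"{prefix}-{name}",
            "origin": prefix,  # hypothetical field
        })
    return entries


def write_manifest(entries, out_path):
    """Write the entries to `out_path` as pretty-printed JSON."""
    Path(out_path).write_text(json.dumps(entries, indent=2), encoding="utf-8")
```

With the prefix in place, entries from this dataset can be filtered out of a merged file with a simple `entry["file_name"].startswith("code-jam")` check.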