Selected 5200 submissions from the Code Jam dataset (more than 1 million submissions).
There are 725 distinct problems, so, with the group's approval, I capped the number of selected submissions at 20 per problem to avoid getting too many duplicates.
That way, I selected 5200 code samples, which are stored in /data/code-jam/files/.
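A minimal sketch of a per-problem cap like the one described above, assuming each submission is a dict with a hypothetical "problem_id" key (the real dataset schema may differ) and that the submissions kept for an over-represented problem are drawn at random with a fixed seed. This is an illustration, not the exact script used for the selection.

```python
import random
from collections import defaultdict


def sample_per_problem(submissions, cap=20, seed=0):
    """Return at most `cap` submissions for each problem.

    `submissions` is assumed to be an iterable of dicts carrying a
    hypothetical "problem_id" key; adapt it to the actual schema.
    """
    # Group submissions by the problem they solve.
    by_problem = defaultdict(list)
    for sub in submissions:
        by_problem[sub["problem_id"]].append(sub)

    # A fixed seed makes the selection reproducible between runs.
    rng = random.Random(seed)
    selected = []
    for problem_id in sorted(by_problem):
        group = by_problem[problem_id]
        if len(group) <= cap:
            # Problems with few submissions are kept in full.
            selected.extend(group)
        else:
            # Otherwise draw `cap` distinct submissions without replacement.
            selected.extend(rng.sample(group, cap))
    return selected
```

With a cap of 20, a problem with 50 submissions contributes 20 samples and a problem with 5 contributes all 5, so the total can be well below 20 times the number of problems.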
Once that step was done, I created the JSON file as described by Redouane.
Each file_name has the prefix code-jam, which makes the origin of each file easy to identify.
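One possible way to keep the prefixed file names and the JSON file consistent is sketched below. The record layout ([{"file_name": ...}, ...]), the function name export_with_manifest and the "-" separator after code-jam are all hypothetical; the actual format is the one Redouane specified.

```python
import json
import shutil
from pathlib import Path

# "code-jam" prefix from the note; the trailing "-" separator is an assumption.
PREFIX = "code-jam-"


def export_with_manifest(sources, files_dir, manifest_path):
    """Copy each source file into `files_dir` under a prefixed name and
    write a JSON manifest listing the new names.

    Assumes source basenames are unique; duplicates would overwrite each other.
    """
    files_dir = Path(files_dir)
    files_dir.mkdir(parents=True, exist_ok=True)
    records = []
    for src in map(Path, sources):
        name = PREFIX + src.name
        # Copy under the prefixed name so disk and manifest agree.
        shutil.copyfile(src, files_dir / name)
        records.append({"file_name": name})
    Path(manifest_path).write_text(json.dumps(records, indent=2))
    return records
```

Because the prefix is applied when the file is copied, every file_name in the manifest points at a file that actually exists in the output directory (for example /data/code-jam/files/).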