corentinllorca closed this 4 years ago
I started this task. The dataset I got is quite big (16 GB), so we need to find a way to take a subsample.
Issues with the dataset:
I am currently exploring the dataset to get a feel for it.
Let's get the .c, .cpp, .h and .hpp files, if there are any! We decided to fetch 20 samples from each problem to get started. Hopefully there won't be too many duplicates and redundancies.
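A minimal sketch of the subsampling step, for illustration only. It assumes (hypothetically) that the dataset has one subdirectory per problem under a root folder; `collect_from_dir` and `sample_per_problem` are made-up helper names. It keeps only .c, .cpp, .h and .hpp files, picks up to 20 per problem with a fixed seed, and skips files whose bytes exactly match a file already kept. Near-duplicates are not caught.

```python
import hashlib
import random
from pathlib import Path

# Extensions discussed in this thread.
C_EXTENSIONS = {".c", ".cpp", ".h", ".hpp"}


def collect_from_dir(root):
    """Map each problem (assumed: one subdirectory of `root`) to its C/C++ files."""
    files_by_problem = {}
    for problem_dir in sorted(Path(root).iterdir()):
        if not problem_dir.is_dir():
            continue
        files = sorted(
            p for p in problem_dir.rglob("*")
            if p.is_file() and p.suffix in C_EXTENSIONS
        )
        files_by_problem[problem_dir.name] = files
    return files_by_problem


def sample_per_problem(files_by_problem, k=20, seed=0,
                       read=lambda p: Path(p).read_bytes()):
    """Pick up to k files per problem, skipping exact duplicates (same bytes)."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    seen_hashes = set()        # shared across problems: a file is kept at most once
    sample = {}
    for problem, files in sorted(files_by_problem.items()):
        candidates = [f for f in files if Path(f).suffix in C_EXTENSIONS]
        rng.shuffle(candidates)
        picked = []
        for f in candidates:
            if len(picked) == k:
                break
            digest = hashlib.sha256(read(f)).hexdigest()
            if digest in seen_hashes:
                continue  # exact duplicate of a file already kept
            seen_hashes.add(digest)
            picked.append(f)
        sample[problem] = picked
    return sample
```

Usage would look like `sample = sample_per_problem(collect_from_dir("path/to/dataset"), k=20)`, where the path is a placeholder. Hashing file contents only removes byte-identical copies. Catching redundant solutions that differ in whitespace or variable names would need a normalization step before hashing.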
Merged and done
See #1