Collected solutions from the Google Code Jam programming competition, years 2008-2021.
Some files are missing because of special characters and encoding problems (mainly those of some Chinese contestants).
Years 2018 to 2021 have slightly different file names (in the CSVs) because Google changed
the structure of the contest pages.
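To see how the file names differ between years, it can help to print a few values from each CSV. The following is a minimal sketch rather than part of the dataset tooling: the helper `sample_column`, the path `gcj2008.csv` and the column name `file` are assumptions for illustration, so check the actual CSV headers first. Decoding with `errors="replace"` is one way to keep reading past rows with unusual encodings.

```python
import csv

# Source files stored in a CSV cell can exceed the default field size limit.
csv.field_size_limit(2**31 - 1)


def sample_column(csv_path, column, limit=5):
    """Return up to `limit` values of `column` from the CSV at `csv_path`.

    Hypothetical helper: `column` must match a header that really exists
    in the file, otherwise a KeyError is raised.
    """
    values = []
    # errors="replace" substitutes U+FFFD for bytes that are not valid UTF-8
    # instead of raising UnicodeDecodeError.
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as handle:
        for row in csv.DictReader(handle):
            if len(values) >= limit:
                break
            values.append(row[column])
    return values
```

For example, `sample_column("gcj2008.csv", "file")` and the same call on a 2018 or later CSV can be compared side by side, assuming those paths and that column exist in your copy of the data.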
See also: https://github.com/Jur1cek/codeforces-dataset
BibTeX entry (consider citing):
@inproceedings{10.1145/3472410.3472445,
  author = {Petrik, Juraj and Chuda, Daniela},
  title = {The effect of time drift in source code authorship attribution: Time drifting in source code - stylochronometry},
  year = {2021},
  isbn = {9781450389822},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3472410.3472445},
  doi = {10.1145/3472410.3472445},
  abstract = {Stylochronometry deals with the influence of time in an author's style, specifically how it changes stylometric features. Analysis of time drift occurrence is important especially for a dataset creation process of other works in this area. In this paper, we performed experiments using the Google Code Jam dataset to show the influence of time drift in the area of source code authorship attribution. Our experiments revealed that there is significant time drift in stylometric features in one year difference, which is enlargening as the difference of time increases. Another interesting result is that when training our authorship attribution method on data from the future and testing on data from the past, the time drift is lower than in opposite direction. Also, we found the relation between the length of source code and the accuracy of our authorship attribution method.},
  booktitle = {Proceedings of the 22nd International Conference on Computer Systems and Technologies},
  pages = {87--92},
  numpages = {6},
  keywords = {authorship attribution, google code jam, source code, stylochronometry, stylometry, time drift},
  location = {Ruse, Bulgaria},
  series = {CompSysTech '21}
}
Thank you!