Open pratt0007 opened 1 year ago
Hi @pratt0007, thank you very much for your suggestion. I genuinely appreciate your input.
I did consider this feature, but at this time we can't implement it as effectively as the LeetCode platform can. For now, we don't save users' submission code; we only have the submission datetime. The LeetCode platform has much more informative data, such as users' IP addresses (many cheating incidents happened within the same school or involved someone using two accounts).
Thank you so much, @baoliay2008, for going through my suggestions. I think we can scrape the data from LeetCode by writing a Python package or something similar. Once we have the database, we can apply specific algorithms to check for code plagiarism.
@pratt0007 Aside from your solution, I can think of one other way to check plagiarism: scrape the user's submission, then remove all whitespace, comments, unused variables and functions (this sounds complex because it is), print statements, and any other statements that don't contribute to solving the problem. Hash the result with Python's hash(), store the hashes in a dictionary, and check whether each new hash is already present. The main problem you will come across is identifying keywords in each particular language. Another problem arises when a question can only be solved in a restricted number of ways, since honest solutions will then look alike. I'll be happy to work on it.
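A minimal sketch of the normalize-then-hash idea, assuming the submissions are Python source and are already available as (user, source) pairs. The function names are hypothetical. It only strips comments and layout; removing unused variables or print statements would need real per-language parsing and is not attempted here. It also swaps the built-in hash() for hashlib.sha256, because hash() on strings is salted per process by default and so would not give stable values for a stored database.

```python
import hashlib
import io
import tokenize

# Token types that carry no meaning for comparison purposes.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalize(source: str) -> str:
    """Drop comments and layout tokens, join the rest with single spaces."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return " ".join(tok.string for tok in tokens if tok.type not in _SKIP)


def fingerprint(source: str) -> str:
    """Stable hash of the normalized source (hypothetical helper)."""
    return hashlib.sha256(normalize(source).encode("utf-8")).hexdigest()


def find_duplicates(submissions):
    """Map each fingerprint seen more than once to the users who sent it.

    `submissions` is an iterable of (user, source) pairs.
    """
    seen = {}
    for user, source in submissions:
        seen.setdefault(fingerprint(source), []).append(user)
    return {fp: users for fp, users in seen.items() if len(users) > 1}
```

With this sketch, two submissions that differ only in comments and spacing get the same fingerprint, while renaming a single variable is enough to change it, so it catches only the most literal copies.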
Every piece of code written by a user, together with its submission time, is stored in our database. If this website provided a feature to count how many times a specific piece of code has been submitted, we could label it red once that count surpasses a predetermined threshold, to indicate that the code has been copied.
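The counting idea could look roughly like this. It is a sketch under assumptions: each record is a (user, submitted_at, code_key) tuple, where code_key is whatever identifies "the same code" (for example a hash of the source), and the function name and default threshold are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def flag_copied(records, threshold=2):
    """Return code keys submitted more than `threshold` times.

    Each flagged key maps to its (submitted_at, user) entries sorted by
    time, so the earliest submitter comes first.
    """
    groups = defaultdict(list)
    for user, submitted_at, code_key in records:
        groups[code_key].append((submitted_at, user))
    return {
        key: sorted(entries)
        for key, entries in groups.items()
        if len(entries) > threshold
    }


records = [
    ("alice", datetime(2023, 1, 1, 10, 5), "k1"),
    ("bob", datetime(2023, 1, 1, 10, 1), "k1"),
    ("carol", datetime(2023, 1, 1, 10, 9), "k1"),
    ("dave", datetime(2023, 1, 1, 10, 3), "k2"),
]
flagged = flag_copied(records, threshold=2)
```

Sorting by submission time does not prove who copied from whom, but it gives a reviewer a reasonable place to start.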
We can simply use plagiarism detection techniques:
ML integration: Train machine learning models to identify code plagiarism. You can use techniques like natural language processing (NLP) and deep learning to analyze and compare code submissions.
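Before training any model, a non-learned baseline may be useful for comparison. The sketch below is not machine learning: it is a plain bag-of-tokens cosine similarity, a common NLP starting point, and the tokenizer regex and function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Rough language-agnostic tokenizer: identifiers, integers, or single symbols.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def token_counts(source: str) -> Counter:
    """Count how often each token occurs in the source text."""
    return Counter(TOKEN.findall(source))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between token-count vectors, in the range 0.0 to 1.0."""
    ca, cb = token_counts(a), token_counts(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0
```

Unlike an exact hash, this score degrades gradually when variables are renamed, which also means short problems with only one natural solution will score high for honest submissions.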