baoliay2008 / lccn_predictor

LeetCode Contest Rating Prediction
https://lccn.lbao.site
MIT License
672 stars, 24 forks

Plag checker #29

Open pratt0007 opened 1 year ago

pratt0007 commented 1 year ago

Every piece of code a user submits, together with its submission time, is stored in our database. If this website offered a feature that counts how many times a specific piece of code has been submitted, we could flag a submission in red once that count exceeds a predetermined threshold, to indicate that the code has likely been copied.

We could use existing plagiarism detection techniques:
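A minimal sketch of the counting idea, assuming submissions are available as `(username, code)` pairs. The function name `flag_repeated`, the input shape, and the default threshold are all hypothetical and not part of lccn_predictor:

```python
from collections import Counter


def flag_repeated(submissions, threshold=3):
    """Return the set of code strings submitted by at least `threshold` users.

    `submissions` is an iterable of (username, code) pairs. Both this shape
    and the default threshold are assumptions made for illustration.
    """
    # Deduplicate (user, code) pairs first so that one user resubmitting
    # the same code does not inflate the count.
    unique_pairs = {(user, code) for user, code in submissions}
    counts = Counter(code for _, code in unique_pairs)
    return {code for code, n in counts.items() if n >= threshold}
```

This only catches byte-for-byte identical code, so any renamed variable or extra space would slip past it.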

  1. Text-Based Plagiarism Detection - Some popular tools include Turnitin, Copyscape, and MOSS (Measure Of Software Similarity).
  2. Code Similarity Algorithms - Tools such as Simian and JPlag are examples.

ML integration: Train machine learning models to identify code plagiarism. You can use techniques like natural language processing (NLP) and deep learning to analyze and compare code submissions.
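To give a rough feel for what a similarity score looks like, here is a small sketch that compares two submissions by the overlap of their character k-grams (Jaccard similarity). This is a deliberately simple stand-in and is not the algorithm used by MOSS, Simian, or JPlag; the function names and the choice of `k=5` are assumptions:

```python
import re


def shingles(code, k=5):
    """Return the set of k-character substrings of the whitespace-stripped code."""
    compact = re.sub(r"\s+", "", code)
    if len(compact) < k:
        # Too short to form a full k-gram: use the whole text as one shingle.
        return {compact} if compact else set()
    return {compact[i:i + k] for i in range(len(compact) - k + 1)}


def jaccard_similarity(code_a, code_b, k=5):
    """Return |A & B| / |A | B| over the two shingle sets, in the range [0, 1]."""
    a, b = shingles(code_a, k), shingles(code_b, k)
    if not a and not b:
        return 1.0  # two empty submissions are treated as identical
    return len(a & b) / len(a | b)
```

A pair scoring above some cutoff could then be queued for manual review. Where to set that cutoff would need tuning against real contest data.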

baoliay2008 commented 1 year ago

Hi @pratt0007, thank you very much for your suggestion. I genuinely appreciate your input.

I did consider this feature, but at this time we don't have the capability to implement it as effectively as the LeetCode platform could. For now, we don't save users' submission code; we only store the submission datetime. The LeetCode platform has much more informative data, such as users' IP addresses (many cheating incidents involve people at the same school, or one person using two accounts).

pratt0007 commented 1 year ago

Thank you so much, @baoliay2008, for going through my suggestions. I think we could scrape the data from LeetCode by writing a Python package or something similar, and then, once we have the database, apply specific algorithms to check for code plagiarism.

Kaushik-sss commented 1 year ago

@pratt0007 Aside from your solution, I can think of one other way to check for plagiarism: scrape the user's submission, then remove all whitespace, comments, unused variables and functions (this sounds complex because it is complex), print statements, and any other statements that don't contribute to solving the problem.

Then compute a hash of the result with Python's hash(), store it in a dictionary, and check whether each new hash is already present or is being seen for the first time. The main difficulty will be identifying keywords in each particular language. I'll be happy to work on it. Problems also arise when a particular question can only be solved in a restricted number of ways, since honest solutions will then look alike.
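A rough sketch of the normalize-then-hash idea, under several assumptions: submissions are Python source, they are available as `(username, code)` pairs, and only comments and whitespace are stripped (removing unused variables, functions, and print statements is not attempted here). It swaps in `hashlib.sha256` for the built-in `hash()`, because `hash()` on strings is randomized per process by default and so cannot be compared across runs. All function names are hypothetical:

```python
import hashlib
import io
import tokenize

# Token types that carry no meaning for the fingerprint: comments and layout.
SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalize(code):
    """Return the code's tokens joined by single spaces, without comments or layout."""
    tokens = tokenize.generate_tokens(io.StringIO(code).readline)
    return " ".join(tok.string for tok in tokens if tok.type not in SKIP)


def fingerprint(code):
    """Return a SHA-256 hex digest of the normalized code, stable across runs."""
    return hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()


def group_by_fingerprint(submissions):
    """Map each fingerprint shared by more than one submission to its users."""
    seen = {}
    for user, code in submissions:
        seen.setdefault(fingerprint(code), []).append(user)
    return {fp: users for fp, users in seen.items() if len(users) > 1}
```

Using the language's own tokenizer sidesteps part of the keyword-identification problem for Python, but each additional language would need its own tokenizer, and renamed variables would still produce a different fingerprint.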