dodona-edu / dolos

:detective: Source code plagiarism detection
https://dolos.ugent.be
MIT License

Comparing sets of files, one per student #1613

Closed bestchai closed 1 month ago

bestchai commented 1 month ago

Which component(s) is your question about? Dolos CLI

What is your question? Each student submits a set of files for their solution. Is it possible to compare sets of files with the CLI?

For example, student A submits a1.java and a2.java. And student B submits b1.java and b2.java.

I don't want to compare a1.java against a2.java or b1.java against b2.java. But I do want to compare every file from student A against every file from student B.

The docs mention that info.csv can include a 'label' field that could be used to group files. But does Dolos actually skip computing comparisons for files in the same group? My concern is scalability.

rien commented 1 month ago

To respond to your question: labels are only used for visualization and filtering. Submissions with the same label will still be compared with each other, but labels make it easier to visually distinguish submissions from different groups.

Analyzing multiple files per submission is indeed a feature that we want to support in Dolos. We have an issue describing an approach to tackle it: https://github.com/dodona-edu/dolos/issues/1121. Nobody is currently working on it, and we do not expect that to change in the near future.

As a workaround, you could concatenate all the files of one student into one big file. That is how we handle projects where students submit multiple files.
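A minimal sketch of that concatenation step, in case it is useful. It assumes a hypothetical layout with one directory per student (for example `submissions/A/a1.java`) and writes one merged file per student into a separate output directory. The function name, directory names, and default pattern are illustrative and not part of Dolos.

```python
from pathlib import Path


def concatenate_submissions(src_root, out_dir, pattern="*.java"):
    """Merge all files matching `pattern` in each student directory into one file.

    Assumes `src_root` contains one subdirectory per student and that
    `out_dir` is NOT located inside `src_root` (hypothetical layout).
    Returns the list of merged files that were written.
    """
    src_root, out_dir = Path(src_root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for student_dir in sorted(p for p in src_root.iterdir() if p.is_dir()):
        # Sort so that reruns always produce the same merged file.
        files = sorted(student_dir.rglob(pattern))
        if not files:
            continue
        # Name the merged file after the student, keeping the source extension.
        target = out_dir / f"{student_dir.name}{files[0].suffix}"
        parts = [f.read_text(encoding="utf-8", errors="replace") for f in files]
        target.write_text("\n".join(parts), encoding="utf-8")
        written.append(target)
    return written
```

The merged files (here `combined/A.java`, `combined/B.java`, and so on) can then be passed to the CLI as ordinary single-file submissions. Files of the same student are no longer compared with each other, because they are now part of the same submission.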

If scalability is still a problem, you can make the analysis less fine-grained by tweaking the k and w parameters (see https://dolos.ugent.be/docs/running.html#modifying-plagiarism-detection-parameters).
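To give some intuition for why these two parameters affect granularity, here is a generic winnowing sketch. It assumes that k is the k-gram length in tokens and w is the number of consecutive k-grams per window, as in the standard winnowing technique. It is not Dolos's actual implementation, and the function names and the CRC32 hash are illustrative choices.

```python
import zlib


def kgram_hashes(tokens, k):
    """Hash every run of k consecutive tokens (a k-gram)."""
    return [
        zlib.crc32(" ".join(tokens[i:i + k]).encode("utf-8"))
        for i in range(len(tokens) - k + 1)
    ]


def winnow(hashes, w):
    """Keep the minimum hash of every window of w consecutive hashes.

    Returns sorted (position, hash) pairs; a position that is selected by
    several overlapping windows is only recorded once.
    """
    if not hashes:
        return []
    selected = {}
    # If there are fewer than w hashes, treat them as a single short window.
    for start in range(max(1, len(hashes) - w + 1)):
        window = hashes[start:start + w]
        smallest = min(window)
        # On ties, pick the rightmost occurrence of the minimum.
        offset = len(window) - 1 - window[::-1].index(smallest)
        selected[start + offset] = smallest
    return sorted(selected.items())


tokens = "a b c d e f g h i j k l".split()
hashes = kgram_hashes(tokens, k=3)
fingerprints = winnow(hashes, w=4)
```

In this scheme a larger w typically keeps fewer fingerprints per file, which means less to index and compare, and a larger k means only longer matching fragments can be detected.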

Closing because this is a duplicate, but feel free to continue the discussion here.

bestchai commented 1 month ago

Thank you for the detailed reply, I appreciate it. Indeed, #1121 is the right issue.