dodona-edu / dolos

:detective: Source code plagiarism detection
https://dolos.ugent.be
MIT License

Implement serializing to duckdb #1578

Open rien opened 1 month ago

rien commented 1 month ago

This PR allows serializing a Dolos analysis to a single DuckDB file instead of CSV files.

Performance Benchmark

| Dataset | DuckDB write-out | DuckDB parse | CSV write-out | CSV parse |
| --- | --- | --- | --- | --- |
| Pyramidal Constants - Exercise | 942 ms | 4729 ms | 622 ms | 3202 ms |
| Pyramidal Constants - Evaluation | 481 ms | 4654 ms | 106 ms | 2378 ms |
| Plutokiller | 21 259 ms | 11 187 ms | 17 557 ms | 16 078 ms |

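Currently, DuckDB is only faster when parsing large datasets: Plutokiller parses in 11 187 ms with DuckDB versus 16 078 ms with CSV. For both Pyramidal Constants datasets, and for write-out on every dataset, CSV is still faster.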
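
To make the comparison easier to read, here is a small sketch that turns the timings from the benchmark table into DuckDB-to-CSV ratios. The `Timing` interface and the `duckdbToCsvRatio` and `summarize` helpers are hypothetical names used only for this illustration; they are not part of Dolos or of this PR. Only the millisecond values come from the table.

```typescript
// Timings in milliseconds, copied from the benchmark table in this PR.
// The type and helper names below are illustrative, not Dolos APIs.
interface Timing {
  dataset: string;
  duckdbWrite: number;
  duckdbParse: number;
  csvWrite: number;
  csvParse: number;
}

const timings: Timing[] = [
  { dataset: "Pyramidal Constants - Exercise", duckdbWrite: 942, duckdbParse: 4729, csvWrite: 622, csvParse: 3202 },
  { dataset: "Pyramidal Constants - Evaluation", duckdbWrite: 481, duckdbParse: 4654, csvWrite: 106, csvParse: 2378 },
  { dataset: "Plutokiller", duckdbWrite: 21259, duckdbParse: 11187, csvWrite: 17557, csvParse: 16078 },
];

// A ratio above 1 means DuckDB took longer than CSV; below 1 means DuckDB was faster.
function duckdbToCsvRatio(duckdbMs: number, csvMs: number): number {
  return duckdbMs / csvMs;
}

interface Summary {
  dataset: string;
  writeRatio: number;
  parseRatio: number;
}

// Compute the write-out and parse ratios for every dataset.
function summarize(rows: Timing[]): Summary[] {
  return rows.map((row) => ({
    dataset: row.dataset,
    writeRatio: duckdbToCsvRatio(row.duckdbWrite, row.csvWrite),
    parseRatio: duckdbToCsvRatio(row.duckdbParse, row.csvParse),
  }));
}

for (const s of summarize(timings)) {
  console.log(`${s.dataset}: write x${s.writeRatio.toFixed(2)}, parse x${s.parseRatio.toFixed(2)}`);
}
```

In this sketch, the only ratio below 1 is the parse ratio for Plutokiller, which matches the observation that DuckDB currently only pays off when parsing large datasets.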