Open rien opened 1 year ago
Adding support for HTML would be great, because the HTML judge allows for exercises with different solutions (e.g. add a title (doesn't matter which text), add at least 3 items to a list, ...).
@BTWS2 there is a tree-sitter parser for HTML, but it wouldn't work well for plagiarism detection because the parser ignores tag names, attribute names, the exact content itself, ...
This is how tree-sitter converts an example HTML page:
Especially if the underlying structure of the analyzed HTML files is expected to be very similar, using this parser would result in very high similarities. Using this parser, Dolos reports that the homepage of GitHub and the homepage of Dodona have a similarity of 88%.
I would prefer to stick to languages that work well with Dolos. However, if you want, you can try it out yourself by installing `tree-sitter-html` using npm or yarn. Dolos automatically detects and uses this parser if it is available.
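To illustrate why a parser that ignores tag names, attribute names, and content produces such high similarities, here is a minimal sketch in Python. It uses the standard library's `html.parser` rather than tree-sitter, and the token names (`start_tag`, `attribute`, `text`, `end_tag`) are made up for this example; it is not how Dolos or tree-sitter-html tokenizes internally.

```python
from html.parser import HTMLParser


class StructureTokenizer(HTMLParser):
    """Records only the kind of each node, dropping names and content."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # The tag name and attribute names/values are deliberately ignored.
        self.tokens.append("start_tag")
        self.tokens.extend("attribute" for _ in attrs)

    def handle_endtag(self, tag):
        self.tokens.append("end_tag")

    def handle_data(self, data):
        # Non-blank text becomes a single anonymous token.
        if data.strip():
            self.tokens.append("text")


def structure_tokens(html):
    parser = StructureTokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


page_a = '<ul class="menu"><li>Home</li><li>About</li></ul>'
page_b = '<div id="steps"><p>Install</p><p>Run</p></div>'

# Different tags, attributes and text, yet the token streams are identical.
print(structure_tokens(page_a) == structure_tokens(page_b))  # True
```

Two pages that share only their nesting structure become indistinguishable once names and text are dropped, which is consistent with the 88% figure reported above for two unrelated homepages.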
@rien No problem, thank you for the insight.
@rien can there be support for simple text?
Regular assignments (like essays) are the use case I'm thinking of for this. Would the tokenizer be as simple as splitting on spaces? Or on sentences, split by "."?
@anilgulecha Dolos is specifically made for plagiarism detection on source code. There are tools that should perform better on just text than Dolos.
That said, we do indeed have a tokenizer that splits on spaces, which you can use by passing `--language char`. However, in that case you might be better off (and faster) using the `diff` command or something else that does string matching.
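For plain text, the kind of string matching mentioned above can be approximated by splitting on whitespace and comparing overlapping word sequences. The sketch below is a generic k-gram overlap (Jaccard) measure written for illustration; the function names and the choice of `k=3` are assumptions, and it is not Dolos's own algorithm.

```python
def word_kgrams(text, k=3):
    """Return the set of k consecutive words, after splitting on whitespace."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(text_a, text_b, k=3):
    """Share of word k-grams the two texts have in common (0.0 to 1.0)."""
    grams_a = word_kgrams(text_a, k)
    grams_b = word_kgrams(text_b, k)
    if not grams_a and not grams_b:
        return 0.0  # both texts are shorter than k words
    return len(grams_a & grams_b) / len(grams_a | grams_b)


essay_a = "the quick brown fox jumps over the lazy dog"
essay_b = "the quick brown fox leaps over the lazy dog"

print(jaccard(essay_a, essay_b))  # 0.4
```

A single changed word breaks every k-gram that contains it, so larger values of `k` make the measure stricter.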
Thanks for the `char` recommendation. Will try it out.
I need support for the language Modelica. It is not supported by tree-sitter, but there are two parsers on GitHub: https://github.com/OpenModelica/tree-sitter-modelica https://github.com/mtiller/modelica-tree-sitter
I managed to get them running using tree-sitter directly, but I have had no luck adding them to Dolos yet. Do you have any tips? I am on Linux, if that's important.
@yafuerst Dolos will try to find the parser with a fitting name (`tree-sitter-${name}`) if you add the language with the `-l` option. It will look in the `node_modules` accessible to Dolos (local, per user, global).
If you've managed to get the parsers working on their own but Dolos still doesn't pick them up, you can try "installing" the parser for your user or globally with `npm link` or `npm link --global`. For the `modelica-tree-sitter` parser to be detected by Dolos, you will have to change its name to `tree-sitter-modelica`.
Let me know if it doesn't work and we'll figure it out.
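The naming convention described above can be sketched as follows. This is only an illustration of the lookup idea: the helper names `parser_package_name` and `find_parser`, the `package.json` check, and the explicit list of search roots are all hypothetical, and Dolos's real resolution goes through Node's module loading rather than code like this.

```python
from pathlib import Path


def parser_package_name(language):
    """Package name expected for a language, following tree-sitter-${name}."""
    return f"tree-sitter-{language}"


def find_parser(language, search_roots):
    """Return the first matching package directory, or None if not found.

    search_roots is a hypothetical list of directories that each contain a
    node_modules folder (for example: project, user, and global locations).
    """
    name = parser_package_name(language)
    for root in search_roots:
        candidate = Path(root) / "node_modules" / name
        if (candidate / "package.json").is_file():
            return candidate
    return None


print(parser_package_name("modelica"))  # tree-sitter-modelica
```

Under this convention a package named `modelica-tree-sitter` would never match a lookup for `modelica`, which is why the rename is needed.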
what about vue, react?
@alexey-sh since those frameworks combine multiple languages (template syntax, CSS, HTML, ...), tree-sitter does not handle them out of the box, so some additional work is required for them to work.
In addition, since HTML and CSS often have a lot of common code fragments between submissions, Dolos isn't very good at detecting plagiarism in them (you get a lot of false positives).
However, we do plan on changing Dolos under the hood to support these kinds of languages in the future!
Hi, I am running Dolos with the following versions:
Dolos v2.3.0
Node v18.16.0
Tree-sitter v0.20.1
`npm` only has `tree-sitter-java@0.19.1`, and it seems `dolos` cannot find it because of this. Any workaround? I have tried installing it locally and globally with `pnpm` and `npm`.
@DhruvDh with the way we currently integrate tree-sitter languages, we will have to wait for `tree-sitter-java` to publish a new release. Someone recently opened an issue asking the maintainers of that parser to create a new release, so let's hope they publish it soon: https://github.com/tree-sitter/tree-sitter-java/issues/163
As an alternative, you can try cloning this repository and updating the base tree-sitter version manually. However, that can be cumbersome.
We already have some ideas on how to avoid this problem with Dolos in the future (see #1028), but we have not started implementing that solution yet.
I was able to solve this with the following. I am not confident I understand it correctly, so I won't attempt an explanation.
# a fork with package.json version set to 0.20.1
pnpm install DhruvDh/tree-sitter-java
pnpm rebuild
pnpm install @dodona/dolos
pnpm exec dolos run info.csv
Would you support Verilog? Thanks.
Hi @nachiket, thanks for your suggestion! There is an official Verilog parser available, so this is definitely possible.
We'll put it on our schedule and will let you know when support for Verilog has landed.
Would you support Go and Rust? Thanks
@rhz1949 working on that right now. If everything goes well, support for Go and Rust will be added in the new release I will make today.
If you want to use a programming language that Dolos does not support yet, please ask here! This helps us with prioritizing which programming languages we should focus on first.
We currently ship Dolos with the following programming languages:
If your programming language is not in the list of languages supported out of the box, there is a good chance that a tree-sitter parser already exists for that language. If that is the case, it should be easy to add support for your language.
In any case, let us know which languages you want to use with Dolos!