dodona-edu / dolos

🕵️ Source code plagiarism detection
https://dolos.ugent.be
MIT License

Export Language and descendants in dolos-lib #1521

Closed: ksrnnb closed this issue 2 months ago

ksrnnb commented 3 months ago

Which component(s) is your question about?

Dolos JavaScript library

What is your question?

Why are ProgrammingLanguage, CustomTreeSitterLanguage, and CustomTokenizerLanguage not exported from the library's index? See https://github.com/dodona-edu/dolos/blob/267203ff506b84f2aac1e7806c56e59c8085d19a/lib/src/index.ts#L4

I would like to use the Dolos library as shown in the sample code below.

import { File, FingerprintIndex, ProgrammingLanguage, TokenizedFile } from "@dodona/dolos-lib";

// Fingerprinting parameters: tokens per k-gram and k-grams per window.
const defaultKgramLength = 23;
const defaultKgramsInWindow = 17;
const index = new FingerprintIndex(defaultKgramLength, defaultKgramsInWindow);

type Runtime = "cpp" | "python";
type Code = {
  code: string;
  path: string;
  runtime: Runtime;
};

const codes: Code[] = [
  {
    code: "xxx",
    path: "foo",
    runtime: "cpp",
  },
  {
    code: "yyy",
    path: "bar",
    runtime: "python",
  },
];

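// Tokenize each snippet with a tokenizer for its language.
// Note: top-level `await` requires an ES module context.
const tokenizedFiles: TokenizedFile[] = [];
for (const code of codes) {
  const language = new ProgrammingLanguage(code.runtime, []);
  const tokenizer = await language.createTokenizer();

  // Wrap the in-memory source in a File so it can be tokenized.
  const file = new File(code.path, code.code);
  const tokenizedFile = tokenizer.tokenizeFile(file);

  tokenizedFiles.push(tokenizedFile);
}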

index.addFiles(tokenizedFiles);

console.log(index.allPairs());

rien commented 3 months ago

Hi @ksrnnb, there is no particular reason; we simply did not think of it. I will add an export for these classes in a future release.
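
For readers wondering what the two numbers passed to FingerprintIndex in the question's sample control, here is a rough, self-contained sketch of winnowing-style fingerprint selection. It assumes the first argument is the number of tokens per k-gram and the second is the number of k-gram hashes per window, as the variable names suggest. The names hashKgram, winnow, and Fingerprint are hypothetical and exist only for this illustration; this is not the Dolos implementation, and the hash function is a simple stand-in.

```typescript
// Illustrative only: simplified winnowing over a token sequence.
// All names below are hypothetical and not part of @dodona/dolos-lib.

interface Fingerprint {
  hash: number;
  position: number; // index of the k-gram's first token
}

/** Simple polynomial hash of one k-gram of tokens (a stand-in hash). */
function hashKgram(tokens: string[]): number {
  let hash = 0;
  for (const token of tokens) {
    for (let i = 0; i < token.length; i++) {
      hash = (hash * 31 + token.charCodeAt(i)) >>> 0;
    }
    // Mix in a separator so ["ab", "c"] and ["a", "bc"] hash differently.
    hash = (hash * 31 + 0x1f) >>> 0;
  }
  return hash;
}

function winnow(
  tokens: string[],
  kgramLength: number,
  kgramsInWindow: number,
): Fingerprint[] {
  // Hash every run of `kgramLength` consecutive tokens.
  const hashes: number[] = [];
  for (let i = 0; i + kgramLength <= tokens.length; i++) {
    hashes.push(hashKgram(tokens.slice(i, i + kgramLength)));
  }

  // In each window of `kgramsInWindow` hashes, keep the minimum
  // (rightmost on ties) and record each selected position only once.
  const selected: Fingerprint[] = [];
  let lastPosition = -1;
  for (let start = 0; start + kgramsInWindow <= hashes.length; start++) {
    let minIndex = start;
    for (let j = start + 1; j < start + kgramsInWindow; j++) {
      if (hashes[j] <= hashes[minIndex]) minIndex = j;
    }
    if (minIndex !== lastPosition) {
      selected.push({ hash: hashes[minIndex], position: minIndex });
      lastPosition = minIndex;
    }
  }
  return selected;
}

// Usage: small parameters so the effect is visible on a short input.
const sampleTokens = "def f ( x ) : return x + 1".split(" ");
console.log(winnow(sampleTokens, 3, 4));
```

Under this scheme, two token sequences that share a run of at least kgramsInWindow + kgramLength - 1 tokens are guaranteed to share at least one selected hash. Larger values therefore ignore short coincidental overlaps, while smaller values make the comparison more sensitive.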