JetBrains-Research / astminer

A library for mining of path-based representations of code (and more)
MIT License

Output format code2vec #215

Open adkonr opened 2 years ago

adkonr commented 2 years ago

Hello everyone,

I am trying to convert C++ files to the code2vec format for further processing. However, when I run the CLI with the following config, the output is saved as .c2s instead of the desired .c2v format. Is this a bug? If not, how do I get the code2vec format?

Thanks for your help!

inputDir: dataset/input/
outputDir: output

parser:
  name: fuzzy
  languages: [cpp]

label:
  name: file name

storage:
  name: code2vec
  maxPathLength: 1000
  maxPathWidth: 1000
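The file extension alone may not say which format a file actually contains. In the original code2vec preprocessing, each path in a context is stored as a hashed integer, while code2seq's `.c2s` format writes the path out as AST node names joined by `|`. The sketch below is a hypothetical helper (the names `classify_context`, `classify_line`, and `classify_file` are illustrative, not part of astminer) that guesses the style of each `<label> <ctx> <ctx> ...` line, assuming contexts are comma-separated triples and labels contain no spaces. It is a heuristic, not a description of how astminer names or writes its files.

```python
from collections import Counter


def classify_context(context: str) -> str:
    """Guess the style of one 'start,path,end' path-context.

    Assumption: code2vec-style contexts store the path as a (possibly
    negative) integer id or hash; code2seq-style contexts spell the path
    out as node names, typically joined by '|'.
    """
    parts = context.split(",")
    if len(parts) != 3:
        return "unknown"
    path = parts[1]
    if path.lstrip("-").isdigit():
        return "code2vec"
    if path:
        return "code2seq"
    return "unknown"


def classify_line(line: str) -> str:
    """Classify one '<label> <ctx> <ctx> ...' line by majority vote."""
    fields = line.split()
    if len(fields) < 2:
        return "unknown"
    votes = Counter(classify_context(ctx) for ctx in fields[1:])
    style, _ = votes.most_common(1)[0]
    return style


def classify_file(path: str, max_lines: int = 100) -> Counter:
    """Count the guessed style of the first `max_lines` non-empty lines."""
    counts: Counter = Counter()
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            if line.strip():
                counts[classify_line(line)] += 1
    return counts
```

Running `classify_file` on the generated `.c2s` file and getting mostly `code2vec` votes would suggest code2vec-style content despite the extension. The heuristic cannot tell which tool or version produced the file.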

zunairazaman2021 commented 1 year ago

@adkonr were you able to solve this issue? I am facing the same problem.

zunairazaman2021 commented 1 year ago

@vovak I am using this YAML file to extract JS code into the code2vec format, but it still produces output in the code2seq format. Can you help me here?

```yaml
# input directory (path to project)
inputDir: /Users/zunaira/Desktop/JScode2vec/testerinput

# output directory
outputDir: /Users/zunaira/Desktop/JScode2vec/res3

# parse Java & JavaScript files with ANTLR parser
parser:
  name: antlr
  languages: [js]

filters:

# use file names as labels
# this selects the file level granularity
label:
  name: file name

# save to disk in the Code2Vec format
storage:
  name: code2vec
  maxPathLength: 8
  maxPathWidth: 2
  maxPaths: 1000000
  maxTokens: 100000
  maxPathContextsPerEntity: 5

# number of threads used for parsing
# the default is one thread
numOfThreads: 4
```
