danhper / suplearn-clone-detection

Cross language clone detection using supervised learning

InvalidArgumentError while training the model #8

Open nagaraj-bahubali opened 3 years ago

nagaraj-bahubali commented 3 years ago

Hi @danhper

I am trying to reproduce the project, and this is what I have done so far:

1) Generated the Java and Python vocabularies with the commands below (from bigcode-tools), using the AST files given here:

```sh
docker-bigcode bigcode-ast-tools generate-vocabulary -s 10000 --include-types workspace/java-asts.json -o workspace/java-vocab.tsv
docker-bigcode bigcode-ast-tools generate-vocabulary -s 10000 --include-types workspace/python-asts.json -o workspace/python-vocab.tsv
```

2) Modified the config file in suplearn-clone-detection to point to the generated data files.

3) Generated the dataset:

```sh
./bin/suplearn-clone generate-dataset -c config.yml
```

4) Trained the model:

```sh
./bin/suplearn-clone train -c /path/to/config.yml
```

After reaching 202 steps out of 4467 in the first epoch, training fails with the error below:

```
InvalidArgumentError (see above for traceback): indices[112,180] = 9998 is not in [0, 9994)
	 [[node encoder_java_1/embedding_java_1/GatherV2 (defined at /opt/anaconda3/envs/msr_project/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:1193) ]]
```
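For context, this kind of error usually means that some token index in the input data is larger than what the embedding layer was sized for: 9998 falls outside the valid range [0, 9994). A minimal sketch (plain Keras, not code from this project) that reproduces the same failure mode:

```python
import numpy as np
from keras.layers import Input, Embedding
from keras.models import Model

# Embedding sized for 9994 token ids, i.e. valid indices are 0..9993.
inputs = Input(shape=(200,), dtype="int32")
outputs = Embedding(input_dim=9994, output_dim=100)(inputs)
model = Model(inputs, outputs)

# A token id of 9998 (e.g. produced from a larger vocabulary file) is
# outside [0, 9994) and makes the underlying GatherV2 op fail with the
# same InvalidArgumentError (at least on CPU).
bad_batch = np.full((1, 200), 9998, dtype="int32")
model.predict(bad_batch)  # raises InvalidArgumentError
```

If that is what is happening here, the vocabulary used to build the embeddings and the indices produced by the data pipeline would be out of sync by a few entries.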

I have attached my config and the error log file (lines 121 to 163): config.txt, training_error.txt

danhper commented 3 years ago

Hi @nagaraj-bahubali, sorry for the delay and the trouble! It seems to be an issue with the embedding layer, but I am not sure why. I will try to look into it.
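One thing that might be worth checking (a guess, not a confirmed cause): whether the number of entries in the generated vocabulary TSVs matches the embedding size the model was built with (9994 according to the error). A quick, hypothetical check, assuming one entry per line in the TSV:

```python
# Count vocabulary entries and compare them to the embedding input size
# reported in the error message (9994).
def count_entries(path):
    with open(path) as f:
        # assumes one token per line; adjust if the TSV has a header row
        return sum(1 for line in f if line.strip())

for vocab in ("workspace/java-vocab.tsv", "workspace/python-vocab.tsv"):
    print(vocab, count_entries(vocab))
```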