danhper / suplearn-clone-detection

Cross language clone detection using supervised learning

Not able to run suplearn-clone command #7

Open nagaraj-bahubali opened 3 years ago

nagaraj-bahubali commented 3 years ago

Hello @danhper

I am trying to reproduce this project as part of my MSR course, and I am new to this. I ran the commands below:

pip install -r requirements.txt
python setup.py develop
cp config.yml.example config.yml

I am stuck. When I run the next command, suplearn-clone generate-dataset, I get this error: zsh: command not found: suplearn-clone

Can you please help me?

danhper commented 3 years ago

Hi, it seems I did not include the script in the setup.py. Try ./bin/suplearn-clone generate-dataset from the project directory instead.
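For readers wondering what including the script in setup.py could look like, here is a minimal sketch. It assumes the launcher lives at bin/suplearn-clone (the path mentioned above) and uses the setuptools `scripts` keyword; the package name and the `main` wrapper are illustrative assumptions, not the repository's actual setup.py.

```python
# Hypothetical setup.py excerpt, NOT the repository's real file.
# The "name" value is a placeholder chosen for illustration.
SETUP_KWARGS = {
    "name": "suplearn-clone-detection",
    # Files listed under "scripts" are installed onto the environment's PATH,
    # which is what makes a bare `suplearn-clone` command resolvable.
    "scripts": ["bin/suplearn-clone"],
}


def main() -> None:
    """Run the setuptools build with the keyword arguments above."""
    # Imported lazily so this sketch can be read and loaded without setuptools.
    from setuptools import find_packages, setup

    setup(packages=find_packages(), **SETUP_KWARGS)


# A real setup.py would call main() at module level; it is left uncalled here
# so the sketch has no side effects when loaded.
```

With something along these lines in place, re-running python setup.py develop should expose the command, although running ./bin/suplearn-clone directly remains the simpler workaround.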

nagaraj-bahubali commented 3 years ago

Hello @danhper

Thank you so much for such a swift reply 🙏🙏. I am able to run the command now.

But I am facing another issue after running the command.

[image: screenshot attached to the original comment, not preserved]

Since I am running the project in a virtual environment, I believe my local environment should not be the cause (though I could be wrong). Can you please help again?

danhper commented 3 years ago

This repo is already a bit old, so it is very likely because of the keras version used. If you plan on using this, I would suggest spending a bit of time making it compatible with a newer version of keras/tensorflow. If you just want to try to run it, here are the versions that I used for my paper:

I hope that helps.

Note that if you only want the datasets, you can download them directly. There is a link in the README and here: https://daniel.perez.sh/research/2019/cross-language-clones/

nagaraj-bahubali commented 3 years ago

Hi @danhper thanks for sharing the versions. I installed the versions of tensorflow and keras as mentioned above and the compatibility issue is solved.

I am finding it hard to understand which data must be available for the project to run. In other words, do we have to generate all the data files (listed below) mentioned in the config.yml file with the help of bigcode-tools?

  1. vocabulary: $HOME/workspaces/research/results/java/data/no-id.tsv
  2. embeddings: $HOME/workspaces/research/results/python/embeddings/noid-ch1-anc2-nosib-50d-lr001.npy
  3. vocabulary: $HOME/workspaces/research/results/python/vocabulary/no-id.tsv
  4. embeddings: $HOME/workspaces/research/results/python/embeddings/noid-ch1-anc2-nosib-50d-lr001.npy
  5. submissions_path: $HOME/workspaces/research/dataset/atcoder/submissions.json
  6. asts_path: $HOME/workspaces/research/dataset/atcoder/asts/asts.json
  7. db_path: sqlite:///$HOME/workspaces/research/dataset/atcoder/atcoder.db

So far, I have downloaded the suplearn-clone-detection code and java-python-clones.db.gz (and unzipped it). If I have to generate each of the files listed above with bigcode-tools, can you please provide the mapping between those file names and the ones bigcode-tools produces? Also, does atcoder.db in point 7 above refer to java-python-clones.db?
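While waiting for an answer, one way to see which of the configured files are already present is to expand each path and test whether it exists. The sketch below is only an illustration: `resolve` and `missing_paths` are hypothetical helpers, the flat dictionary is an assumed stand-in for the real config.yml layout, and it does not say how the missing files should be generated.

```python
import os
from pathlib import Path
from typing import Dict, List

SQLITE_PREFIX = "sqlite:///"


def resolve(value: str) -> Path:
    """Expand $HOME-style variables and drop a leading sqlite:/// scheme."""
    expanded = os.path.expandvars(value)
    if expanded.startswith(SQLITE_PREFIX):
        expanded = expanded[len(SQLITE_PREFIX):]
    return Path(expanded)


def missing_paths(entries: Dict[str, str]) -> List[str]:
    """Return the keys whose resolved path does not exist on disk."""
    return [key for key, value in entries.items() if not resolve(value).exists()]


# Two of the entries from the list above, copied by hand for illustration.
entries = {
    "submissions_path": "$HOME/workspaces/research/dataset/atcoder/submissions.json",
    "db_path": "sqlite:///$HOME/workspaces/research/dataset/atcoder/atcoder.db",
}
for key in missing_paths(entries):
    print(f"missing: {key} -> {resolve(entries[key])}")
```

Any key reported as missing either needs to be generated or needs its path in config.yml pointed at a file that was downloaded separately.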

wangwenying1 commented 1 year ago

> Hi, it seems I did not include the script in the setup.py. Try ./bin/suplearn-clone generate-dataset from the project directory instead.

Hello @danhper, I ran this command and hit the problem below. I suspect it is caused by a version incompatibility, but I have tried for a long time and have not found a solution.

Problem:

root@3a24681908a0:/tf/suplearn-clone-detection-master# sudo ./bin/suplearn-clone generate-dataset
Traceback (most recent call last):
  File "./bin/suplearn-clone", line 3, in <module>
    from suplearn_clone_detection import cli
ModuleNotFoundError: No module named 'suplearn_clone_detection'
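A ModuleNotFoundError like this usually means the interpreter running the script cannot see the package, for example when the package was installed into one Python environment and the script is launched from another. Running under sudo can have that effect, since sudo typically resets the environment. This is a guess, not a confirmed diagnosis of this report. The sketch below checks whether the package is visible and, if not, tries adding the project root to sys.path; `ensure_importable` is a hypothetical helper, not part of the project.

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def ensure_importable(package: str, project_root: str) -> bool:
    """Return True if `package` can be found, adding `project_root` to sys.path if needed."""
    if importlib.util.find_spec(package) is not None:
        return True
    root = str(Path(project_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    # Clear cached directory listings so the new sys.path entry is rescanned.
    importlib.invalidate_caches()
    return importlib.util.find_spec(package) is not None


# Run from the project directory: "." is assumed to be the repository root.
print("interpreter:", sys.executable)
print("importable:", ensure_importable("suplearn_clone_detection", "."))
```

If the interpreter printed here differs from the one used for python setup.py develop, running the command without sudo, or from inside the same environment, is worth trying first.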