Mondego / SourcererCC

Sourcerer's Code Clone project
GNU General Public License v3.0

Where can I get the clone pairs' details? #18

Closed huashijiacuo closed 6 years ago

huashijiacuo commented 6 years ago

I have got report.csv, blocksclones_index_WITH_FILTER.txt, and tokensclones_index_WITH_FILTER.txt.

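The report.csv looks like this:

| index_time | globalTokenPositionCreationTime | num_candidates | num_clonePairs | total_run_time | searchTime | timeSpentInSearchingCandidates | timeSpentInProcessResult | operation | sortTime_during_indexing |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 691 | 194 | 0 | 0 | 913 | 0 | 0 | 0 | index | 1 |
| 0 | 34 | 0 | 1 | 2183 | 2043 | 0 | 0 | search | |
| 812 | 166 | 0 | 0 | 987 | 0 | 0 | 0 | index | 1 |
| 995 | 315 | 0 | 0 | 1318 | 0 | 0 | 0 | index | 34 |
| 0 | 51 | 0 | 495 | 3240 | 3066 | 0 | 0 | search | |
| 0 | 154 | 0 | 495 | 3456 | 2914 | 0 | 0 | search | |
| 0 | 63 | 0 | 495 | 2941 | 2776 | 0 | 0 | search | |
| 0 | 50 | 0 | 495 | 2975 | 2825 | 0 | 0 | search | |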

num_clonePairs is 495. So where are the details? tokensclones_index_WITH_FILTER.txt is empty, and blocksclones_index_WITH_FILTER.txt looks like this:

1453,1457 1457,1458 1453,1458 1464,1465 1468,1469 1471,1472 1456,1457 1479,1480 1486,1487 1490,1491 1488,1490 1488,1491 1488,1492 1488,1493 1488,1494 1488,1495 1489,1490 1491,1492 1491,1493 1491,1494 1491,1495 1493,1494 1493,1495 1492,1493 1492,1494 1492,1495 1490,1492 1494,1495 1502,1508 1505,1506 1506,1507 1503,1509 1524,1525 1523,1524 1523,1525 1540,1541 1545,1546 1545,1547 1545,1548 1546,1547 1546,1548 1547,1548

pedromartins4 commented 6 years ago

Could you clarify where you got those files and what you are looking for?

huashijiacuo commented 6 years ago

Well, thanks for your reply. I am looking for the file that contains the clone pairs. Which one is it? I followed the steps in README.md:

  1. I put a project in the folder "input/path/src/*".
  2. Then I executed "java -jar InputBuilderClassic.jar ..." as README.md said, and got tokens.file.
  3. I copied tokens.file to the folder "input/dataset/", then executed "java -jar dist/indexbased.SearchManager.jar index 8".
  4. I copied tokens.file from the folder "input/dataset/" to the folder "input/query/".
  5. I executed the command "java -jar dist/indexbased.SearchManager.jar search 8".

Here is the output in the terminal:

Query File: /home/shi/Downloads/SourcererCC/clone-detector/input/dataset/blocks.file
shutting down QBQ, 1511797193424
shutting down QCQ, 1511797193424
shutting down VCQ, 1511797193424
shutting down RCQ, 1511797193425
Total run Time: 00h:00m:02s
number of clone pairs detected: 495

Finally, I got some folders, such as "output", "output8.0", and "gtpm". The files "report.csv", "blocksclones_index_WITH_FILTER.txt", and "tokensclones_index_WITH_FILTER.txt" are in "output8.0".

Did I do this right? Since the tokens.file in the dataset and query folders is the same, I should definitely get clone pairs. So which file contains the resulting clone pairs?

Also, sir, could you give me your email address? I have read the paper and find it very interesting, and I want to do some experiments with SourcererCC. Thank you very much.

crista commented 6 years ago

The blocksclones_index_WITH_FILTER.txt is the file with clone pairs. The numbers are block numbers. The input file to SCC maps block numbers to blocks, so you can refer back to the input.
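As a rough illustration of reading that file, here is a minimal Python sketch. It assumes each entry has the `id1,id2` form shown earlier in this thread and that entries are separated by whitespace or newlines; the helper names `parse_pairs` and `block_ids` are hypothetical and not part of SourcererCC. It only extracts the block numbers you would then look up in the input file.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[int, int]


def parse_pairs(text: str) -> List[Pair]:
    """Parse whitespace-separated 'a,b' entries into (a, b) integer tuples.

    Assumption: every entry has exactly two comma-separated block numbers,
    as in the excerpt posted in this thread.
    """
    pairs = []
    for entry in text.split():
        left, right = entry.split(",")
        pairs.append((int(left), int(right)))
    return pairs


def block_ids(pairs: Iterable[Pair]) -> Set[int]:
    """Return the distinct block numbers that take part in any clone pair."""
    return {block for pair in pairs for block in pair}


# First three entries from the excerpt posted above.
sample = "1453,1457 1457,1458 1453,1458"
pairs = parse_pairs(sample)
print(len(pairs), sorted(block_ids(pairs)))  # prints: 3 [1453, 1457, 1458]
```

The distinct block numbers are the ones to search for in the tokens file that was given to SCC as input.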

huashijiacuo commented 6 years ago

The number of clone pairs in blocksclones_index_WITH_FILTER.txt is different from the number printed in the terminal. I used the source of apache-log4j-2.9.1 to generate a tokens.file. The tokens.file in input/dataset/ and input/query/ is the same. The terminal output looks like this:

Query File: /home/shi/Downloads/SourcererCC/clone-detector/input/dataset/blocks.file
shutting down QBQ, 1511869134424
shutting down QCQ, 1511869134424
shutting down VCQ, 1511869134425
shutting down RCQ, 1511869134425
Total run Time: 00h:00m:03s
number of clone pairs detected: 1516

But there are only 3 clone pairs in the file (blocksclones_index_WITH_FILTER.txt).

When I used my own code (just 4 .java files) as input to generate tokens.file, the terminal reported 15 clone pairs, and blocksclones_index_WITH_FILTER.txt also contains 15 clone pairs.

I do not know what is wrong. I am confused.

crista commented 6 years ago

How many nodes (processes) did you run SCC with? I.e. what argument did you give to init.sh? Was it 8? Note that the clone pairs are found under each NODE folder; each process works on a separate part of the input independently. The total list of clone pairs is just the collection of all blocksclones_index_WITH_FILTER.txt files under NODEs.
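To make the "collection of all files under NODEs" concrete, here is a minimal Python sketch. It assumes only that every per-node result file is named blocksclones_index_WITH_FILTER.txt and sits somewhere below a common root folder, and that entries have the `id1,id2` form shown earlier in this thread. The function name `collect_clone_pairs` is hypothetical, and the exact NODE folder layout is not assumed, so the search is recursive.

```python
from pathlib import Path
from typing import Set, Tuple

# File name taken from this thread; its location under each NODE folder is
# an assumption, so we search recursively instead of hard-coding a path.
PAIRS_FILE = "blocksclones_index_WITH_FILTER.txt"


def collect_clone_pairs(root: str) -> Set[Tuple[int, int]]:
    """Union the clone pairs from every PAIRS_FILE found below `root`.

    Each pair is stored as (smaller_id, larger_id), so "a,b" and "b,a"
    count as the same pair. That normalization is a choice made for this
    sketch, not something SourcererCC is known to do.
    """
    pairs = set()
    for path in sorted(Path(root).rglob(PAIRS_FILE)):
        for entry in path.read_text().split():
            left, right = (int(part) for part in entry.split(","))
            pairs.add((min(left, right), max(left, right)))
    return pairs
```

Calling `len(collect_clone_pairs(root))` on the folder that contains the NODE folders gives a total that can be compared with the "number of clone pairs detected" line printed in the terminal. Because of the normalization above, the two numbers may differ if the tool reports a pair in both orders.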

huashijiacuo commented 6 years ago

I followed this README.md (https://github.com/Mondego/SourcererCC/blob/master/README.md), and left everything at the defaults it describes. I didn't use init.sh, though I know it is in the "SourcererCC/clone-detector" folder. Now I am trying to run the experiment as described in the clone-detector README.md (https://github.com/Mondego/SourcererCC/blob/master/clone-detector/README.md). It is 00:00 here, so I will continue tomorrow. Thanks.

pedromartins4 commented 6 years ago

Hi @huashijiacuo. We updated the README with fresh instructions that should help you understand this better. Given the new README and the inactivity here, I am closing this issue. If you have any problems or questions, please open a new one.