Mondego / SourcererCC

Sourcerer's Code Clone project
GNU General Public License v3.0

Failed to run block-level tokenizer #54

Closed. Coppelian closed this issue 2 years ago.

Coppelian commented 2 years ago

I set up a Python 3.7 environment using conda and installed everything from requirements.txt. I used the test-env and set config.ini to python, but I couldn't get the block-level tokenizer to run. I kept getting this warning:
join() argument must be str or bytes, not 'ZipInfo'

The output looks like this:

GO
'zipblocks'format
*** Starting priority projects...
*** Starting regular projects...
Starting new process 0
*** No more projects to process. Waiting for children to finish...
GO
[INFO] (MainThread) Process 0 starting
[INFO] (MainThread) Starting zip project <11,test-env/2Shirt-SpellBurner.zip> (process 0)
[INFO] (MainThread) Attempting to process_zip_ball test-env/2Shirt-SpellBurner.zip
[INFO] (MainThread) Successfully ran process_zip_ball test-env/2Shirt-SpellBurner.zip
[INFO] (MainThread) Project finished <11,test-env/2Shirt-SpellBurner.zip> (process 0)
[INFO] (MainThread)  (0): Total: 0:00:00.001971micros | Zip: 0 Read: 0 Separators: 0micros Tokens: 0micros Write: 0micros Hash: 0 regex: 0
[INFO] (MainThread) Starting zip project <12,test-env/2xyo-indicator-ip.zip> (process 0)
[INFO] (MainThread) Attempting to process_zip_ball test-env/2xyo-indicator-ip.zip
[INFO] (MainThread) Attempting to process_file_contents test-env/2xyo-indicator-ip.zip\indicator-ip-master/test.py
[WARNING] (MainThread) Unable to open zip on <test-env/2xyo-indicator-ip.zip> (process 0)
[WARNING] (MainThread) join() argument must be str or bytes, not 'ZipInfo'
[INFO] (MainThread) Project finished <12,test-env/2xyo-indicator-ip.zip> (process 0)
[INFO] (MainThread)  (0): Total: 0:00:00.002480micros | Zip: -1 Read: -1 Separators: -1micros Tokens: -1micros Write: -1micros Hash: -1 regex: -1
[INFO] (MainThread) Starting zip project <13,test-env/3demax-Take-a-break.zip> (process 0)
[INFO] (MainThread) Attempting to process_zip_ball test-env/3demax-Take-a-break.zip
[INFO] (MainThread) Attempting to process_file_contents test-env/3demax-Take-a-break.zip\Take-a-break-master/examples/appmenu.py
[WARNING] (MainThread) Unable to open zip on <test-env/3demax-Take-a-break.zip> (process 0)
[WARNING] (MainThread) join() argument must be str or bytes, not 'ZipInfo'
[INFO] (MainThread) Project finished <13,test-env/3demax-Take-a-break.zip> (process 0)
[INFO] (MainThread)  (0): Total: 0:00:00.002480micros | Zip: -1 Read: -1 Separators: -1micros Tokens: -1micros Write: -1micros Hash: -1 regex: -1
[INFO] (MainThread) Process 0 finished. 2 files in 0s.
Process 0 finished, 2 files processed (3000002). Current total: 2
*** All done. 2 files in 0:00:00.145329

I even tried diff.txt, though I know that's not the solution. Has anyone else run into the same problem?

crista commented 2 years ago

If you're using a Mac, you need to downgrade Python to 3.6, because the tokenizer uses multiprocessing features that don't work on Macs past Python 3.6. Better yet: don't use a Mac.
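The comment above doesn't say which multiprocessing feature breaks. One documented macOS change, which may or may not be the one involved here, is that starting with Python 3.8 the default start method on macOS is "spawn" rather than "fork". A minimal sketch for checking the default and choosing a start method explicitly with `multiprocessing.get_context`; the `square` and `run_with` names are hypothetical and not part of SourcererCC, and whether the tokenizer actually depends on "fork" is an assumption.

```python
import multiprocessing as mp


def square(n: int) -> int:
    # Top-level function so it can be pickled even under the "spawn" start method.
    return n * n


def run_with(method: str) -> list:
    """Run a tiny pool under an explicit start method (hypothetical helper)."""
    ctx = mp.get_context(method)  # "fork" is available on POSIX only
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    print("default start method:", mp.get_start_method())
    # Prefer "fork" where it exists; fall back to "spawn" (available everywhere).
    method = "fork" if "fork" in mp.get_all_start_methods() else "spawn"
    print(method, run_with(method))
```

Note that the Python documentation warns that "fork" can be unsafe on macOS because system libraries may start threads, so forcing it there is a workaround to try, not a guaranteed fix.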

Coppelian commented 2 years ago

@crista Thank you for your response! I'll set up an Ubuntu system with Python 3.6 to see if it works.

crista commented 2 years ago

@Coppelian if you use Ubuntu, there is no problem -- you can use the latest Python.

Coppelian commented 2 years ago

@crista Hi, Crista. I set up Ubuntu 18.04 LTS with its built-in Python 3.5, but I still get the same error. What should I do in this situation? And what environment do you use to run the block-level tokenizer?

crista commented 2 years ago

Please use Python 3.8 or later on Ubuntu.

Coppelian commented 2 years ago

I tried Python 3.7 and 3.8 and got the same error. I'm now using Ubuntu 16.04 LTS with Python 3.8. You can find my log here: LOG-0.log. I wonder what actually causes this problem, since the file-level code works perfectly now.

Coppelian commented 2 years ago

Could you share the environment details you use to run the block-level tokenizer? I'm stuck here and need a way out. Thank you for your help.

crista commented 2 years ago

@Coppelian the block-level tokenizer is not supported anymore.

Coppelian commented 2 years ago

Uh, okay. Thank you for letting me know.

Coppelian commented 2 years ago

Hi @crista. Can SourcererCC report the cloned lines using the function-level tokenizer, or does it have to use the block-level tokenizer?

Coppelian commented 2 years ago

Hi @crista. Sorry for tagging you so many times. I think I got some results from the block-level tokenizer: it generates output if you run it with Python 2.7. It looks like the tokenizer itself has not been updated for Python 3. Thank you for your help.

Coppelian commented 2 years ago

This is the result using Python 2.7:

GO
'zipblocks'format
*** Starting priority projects...
*** Starting regular projects...
Starting new process 0
*** No more projects to process. Waiting for children to finish...
[INFO] (MainThread) Process 0 starting
[INFO] (MainThread) Starting zip project <11,test-env/2Shirt-SpellBurner.zip> (process 0)
[INFO] (MainThread) Attempting to process_zip_ball test-env/2Shirt-SpellBurner.zip
[INFO] (MainThread) Successfully ran process_zip_ball test-env/2Shirt-SpellBurner.zip
[INFO] (MainThread) Project finished <11,test-env/2Shirt-SpellBurner.zip> (process 0)
[INFO] (MainThread)  (0): Total: 0:00:00.012767micros | Zip: 0 Read: 0 Separators: 0micros Tokens: 0micros Write: 0micros Hash: 0 regex: 0
[INFO] (MainThread) Starting zip project <12,test-env/2xyo-indicator-ip.zip> (process 0)
[INFO] (MainThread) Attempting to process_zip_ball test-env/2xyo-indicator-ip.zip
[INFO] (MainThread) Attempting to process_file_contents test-env/2xyo-indicator-ip.zip/indicator-ip-master/test.py
[INFO] (MainThread) Starting tokenize_blocks of test-env/2xyo-indicator-ip.zip/indicator-ip-master/test.py
[WARNING] (MainThread) File test-env/2xyo-indicator-ip.zip/indicator-ip-master/test.py cannot be parsed. encoding declaration in Unicode string (<unknown>, line 0)
[INFO] (MainThread) Returning None on tokenize_blocks for file test-env/2xyo-indicator-ip.zip/indicator-ip-master/test.py.
[WARNING] (MainThread) Problems tokenizing file test-env/2xyo-indicator-ip.zip/indicator-ip-master/test.py
[INFO] (MainThread) Successfully ran process_zip_ball test-env/2xyo-indicator-ip.zip
[INFO] (MainThread) Project finished <12,test-env/2xyo-indicator-ip.zip> (process 0)
[INFO] (MainThread)  (0): Total: 0:00:00.001362micros | Zip: 90 Read: 51 Separators: 0micros Tokens: 0micros Write: 0micros Hash: 0 regex: 0
[INFO] (MainThread) Starting zip project <13,test-env/3demax-Take-a-break.zip> (process 0)
[INFO] (MainThread) Attempting to process_zip_ball test-env/3demax-Take-a-break.zip
[INFO] (MainThread) Attempting to process_file_contents test-env/3demax-Take-a-break.zip/Take-a-break-master/examples/appmenu.py
[INFO] (MainThread) Starting tokenize_blocks of test-env/3demax-Take-a-break.zip/Take-a-break-master/examples/appmenu.py
[WARNING] (MainThread) Finished step1 on process_file_contents
[WARNING] (MainThread) Finished step2 on process_file_contents
[INFO] (MainThread) Successfully ran process_file_contents test-env/3demax-Take-a-break.zip/test-env/3demax-Take-a-break.zip/Take-a-break-master/examples/appmenu.py
[INFO] (MainThread) Attempting to process_file_contents test-env/3demax-Take-a-break.zip/Take-a-break-master/examples/dynamic_status_icon.py
[INFO] (MainThread) Starting tokenize_blocks of test-env/3demax-Take-a-break.zip/Take-a-break-master/examples/dynamic_status_icon.py
[WARNING] (MainThread) Finished step1 on process_file_contents
[WARNING] (MainThread) Finished step2 on process_file_contents
[INFO] (MainThread) Successfully ran process_file_contents test-env/3demax-Take-a-break.zip/test-env/3demax-Take-a-break.zip/Take-a-break-master/examples/dynamic_status_icon.py
[INFO] (MainThread) Attempting to process_file_contents test-env/3demax-Take-a-break.zip/Take-a-break-master/examples/teatime.py
[INFO] (MainThread) Starting tokenize_blocks of test-env/3demax-Take-a-break.zip/Take-a-break-master/examples/teatime.py
[WARNING] (MainThread) Finished step1 on process_file_contents
[WARNING] (MainThread) Finished step2 on process_file_contents
[INFO] (MainThread) Successfully ran process_file_contents test-env/3demax-Take-a-break.zip/test-env/3demax-Take-a-break.zip/Take-a-break-master/examples/teatime.py
[INFO] (MainThread) Attempting to process_file_contents test-env/3demax-Take-a-break.zip/Take-a-break-master/examples/timer.py
[INFO] (MainThread) Starting tokenize_blocks of test-env/3demax-Take-a-break.zip/Take-a-break-master/examples/timer.py
[WARNING] (MainThread) Finished step1 on process_file_contents
[WARNING] (MainThread) Finished step2 on process_file_contents
[INFO] (MainThread) Successfully ran process_file_contents test-env/3demax-Take-a-break.zip/test-env/3demax-Take-a-break.zip/Take-a-break-master/examples/timer.py
[INFO] (MainThread) Attempting to process_file_contents test-env/3demax-Take-a-break.zip/Take-a-break-master/take-a-break.py
[INFO] (MainThread) Starting tokenize_blocks of test-env/3demax-Take-a-break.zip/Take-a-break-master/take-a-break.py
[WARNING] (MainThread) Finished step1 on process_file_contents
[WARNING] (MainThread) Finished step2 on process_file_contents
[INFO] (MainThread) Successfully ran process_file_contents test-env/3demax-Take-a-break.zip/test-env/3demax-Take-a-break.zip/Take-a-break-master/take-a-break.py
[INFO] (MainThread) Successfully ran process_zip_ball test-env/3demax-Take-a-break.zip
[INFO] (MainThread) Project finished <13,test-env/3demax-Take-a-break.zip> (process 0)
[INFO] (MainThread)  (0): Total: 0:00:00.032250micros | Zip: 14999 Read: 254 Separators: 1666micros Tokens: 575micros Write: 251micros Hash: 29 regex: 632
[INFO] (MainThread) Process 0 finished. 6 files in 0s.
Process 0 finished, 6 files processed (3000006). Current total: 6
*** All done. 6 files in 0:00:00.069147

crista commented 2 years ago

As I said, we don't support the block-level tokenizer anymore. If it breaks, you can keep the pieces :-)

Coppelian commented 2 years ago

Okay, thank you.

crista commented 2 years ago

FWIW, I just updated the block-level tokenizer with a patch that should make it work with Python 3: https://github.com/Mondego/SourcererCC/commit/c296871e3315533563e3060476f45ed5cbfbd083
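For reference, the warning in the original report is the TypeError that Python 3's `os.path.join` raises when one of its arguments is a `zipfile.ZipInfo` object instead of a string. `ZipFile.infolist()` yields `ZipInfo` objects, while `ZipFile.namelist()` and `ZipInfo.filename` give plain strings. The sketch below reproduces the message and shows the usual fix. It assumes that this is the failing pattern; it is not the SourcererCC code or the change in the linked commit, and the helper names and archive contents are hypothetical.

```python
import io
import os
import zipfile


def make_demo_zip() -> bytes:
    """Build a tiny in-memory zip standing in for a project archive (hypothetical contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("indicator-ip-master/test.py", "print('hi')\n")
    return buf.getvalue()


def buggy_join(zip_path: str, data: bytes) -> str:
    """Pass the ZipInfo object itself to os.path.join and return the error text."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        info = zf.infolist()[0]  # a ZipInfo, not a str
        try:
            os.path.join(zip_path, info)
        except TypeError as exc:
            # Python 3.7 says "join() argument must be str or bytes, not 'ZipInfo'";
            # newer versions also mention os.PathLike.
            return str(exc)
    return ""


def member_paths(zip_path: str, data: bytes) -> list:
    """Join the archive path with each member name via ZipInfo.filename (a str)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [os.path.join(zip_path, info.filename) for info in zf.infolist()]


if __name__ == "__main__":
    data = make_demo_zip()
    print(buggy_join("test-env/2xyo-indicator-ip.zip", data))
    print(member_paths("test-env/2xyo-indicator-ip.zip", data))
```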

Coppelian commented 2 years ago

Appreciate your help!