fossology / atarashi

Atarashi scans for license statements in open source software, focusing on text statistics. Designed to work stand-alone and with FOSSology.
http://fossology.github.io/atarashi
GNU General Public License v2.0

Error in CommentPreprocessor #93

Closed by xavierfigueroav 2 years ago

xavierfigueroav commented 2 years ago

I get the following error when running python atarashii.py -a wordFrequencySimilarity <file>:

Traceback (most recent call last):
  File "atarashii.py", line 213, in <module>
    main()
  File "atarashii.py", line 167, in main
    result = run_scan(scanner_obj, inputPath)
  File "atarashii.py", line 116, in run_scan
    return scanner.scan(inputFile)
  File "/home/xavierfigueroav/Documents/atarashi-project/atarashi/atarashi/../atarashi/agents/wordFrequencySimilarity.py", line 41, in scan
    processedData = super().loadFile(filePath)
  File "/home/xavierfigueroav/Documents/atarashi-project/atarashi/atarashi/../atarashi/agents/atarashiAgent.py", line 44, in loadFile
    self.commentFile = CommentPreprocessor.extract(filePath)
  File "/home/xavierfigueroav/Documents/atarashi-project/atarashi/atarashi/../atarashi/libs/commentPreprocessor.py", line 131, in extract
    data = json.loads(data_file)
  File "/usr/lib/python3.8/json/__init__.py", line 341, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not dict

I hit this error with the wordFrequencySimilarity agent, but since commentPreprocessor is used by all the other agents as well, it may affect the whole package.

The error occurs because the result of Nirjas's extract function is passed to json.loads, but extract returns a dictionary while json.loads expects a string (or bytes). See lines 129 and 130 in commentPreprocessor.py.
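
The failure can be reproduced without atarashi or Nirjas installed. This is only a minimal sketch: fake_extract is a hypothetical stand-in for Nirjas's extract, and the keys in the dictionary it returns are assumptions made for illustration.

```python
import json


def fake_extract(path):
    # Hypothetical stand-in for Nirjas's extract: it returns a dict,
    # which is the behaviour described in this report.
    # The keys used here are assumptions, not Nirjas's documented schema.
    return {"metadata": {"filename": path}, "single_line_comment": []}


data_file = fake_extract("example.py")

try:
    # json.loads only accepts str, bytes or bytearray, so a dict raises TypeError.
    json.loads(data_file)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message should match the last line of the traceback above, since the value is already a parsed Python object and there is nothing left for json.loads to decode.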

https://github.com/fossology/atarashi/blob/f7152e1cabab4245d75d25ef67c6a04f4c9bdbc7/atarashi/libs/commentPreprocessor.py#L126-L132

So the fix is to remove line 130 and pass data_file to licenseComment instead of data.
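
A rough sketch of that change, using stand-ins rather than the real code: fake_extract and collect_comments are hypothetical replacements for Nirjas's extract and atarashi's licenseComment, and the dictionary layout is assumed. The as_dict helper is not part of the proposed fix; it is an optional defensive variant in case both return types need to be supported.

```python
import json


def fake_extract(path):
    """Hypothetical stand-in for Nirjas's extract; returns a dict (assumed layout)."""
    return {
        "single_line_comment": [
            {"line_number": 1, "comment": "SPDX-License-Identifier: MIT"}
        ],
        "multi_line_comment": [],
    }


def collect_comments(data):
    """Hypothetical stand-in for licenseComment; joins comment text from a dict."""
    parts = [c["comment"] for c in data.get("single_line_comment", [])]
    parts += [c["comment"] for c in data.get("multi_line_comment", [])]
    return "\n".join(parts)


def as_dict(result):
    """Optional defensive variant: accept either a dict or a JSON string."""
    if isinstance(result, (str, bytes, bytearray)):
        return json.loads(result)
    return result


data_file = fake_extract("example.py")

# Proposed fix: use the dict directly, with no json.loads call in between.
comments = collect_comments(data_file)

# Defensive variant: also works if the extractor returned JSON text.
comments_from_json = collect_comments(as_dict(json.dumps(data_file)))

print(comments)
```

Passing the dictionary straight through is the smaller change; the as_dict variant would only matter if atarashi has to work with Nirjas versions that return a JSON string.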