fossology / atarashi

Atarashi scans for license statements in open source software, focusing on text statistics. Designed to work stand-alone and with FOSSology.
http://fossology.github.io/atarashi
GNU General Public License v2.0

Ability to scan directories #77

Closed GMishx closed 3 years ago

GMishx commented 3 years ago

Currently atarashi can scan only files. If a directory is provided as input, it should be able to find all files under it and run the selected agent on them. The results of each scan can be stored in a list and printed as a JSON array.

It would be preferable, however, to print results as they arrive while keeping the JSON array valid, so that someone running a scan in an interactive terminal does not get the feeling that nothing is happening. This can be done by printing an opening [, then each scan result object {...} followed by a ,. The last result will not have a trailing , and a ] is printed at the end of the scan. This approach eliminates the need for an additional list to hold temporary results.
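The streaming output described above could be sketched roughly as follows. This is an illustration only, not atarashi's code: `stream_json_array`, the `scan` callable, and the `file`/`results` keys are all hypothetical names. As a small variation, it writes the comma before every result except the first rather than after every result except the last, which produces the same array without needing to know which result is the final one.

```python
import json
import sys


def stream_json_array(paths, scan, out=sys.stdout):
    """Write one result object per path as soon as it is available.

    The separator is written before every element except the first, so the
    output stays a valid JSON array and no list of results is kept in memory.
    """
    out.write("[")
    first = True
    for path in paths:
        result = scan(path)  # hypothetical callable standing in for the selected agent
        out.write("\n" if first else ",\n")
        out.write(json.dumps({"file": path, "results": result}))
        out.flush()  # show the result in an interactive terminal right away
        first = False
    out.write("\n]\n")
```

Because `paths` can be any iterable, a generator that walks a directory can be passed in directly, and each result is printed before the next file is scanned.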

codeakki commented 3 years ago

May I work on this, please?

GMishx commented 3 years ago

Yes, @codeakki. Shall I assign the issue to you?

codeakki commented 3 years ago

Yes sir, I will try my best.

codeakki commented 3 years ago

Can you please help me understand the workflow of atarashi and which file I have to work in?

It is also generating an error:

[Running] /usr/bin/env python3 "c:\Users\pankaj akshay\Documents\GitHub\atarashi\atarashi\agents\atarashiAgent.py"
The system cannot find the path specified.

codeakki commented 3 years ago

import os

# List the entries of the current directory and keep only the .txt files
items = os.listdir(".")

newlist = []
for names in items:
    if names.endswith(".txt"):
        newlist.append(names)
print(newlist)

May I use this approach to find files in directories?

Aman-Codes commented 3 years ago

Hi @codeakki

if names.endswith(".txt"):

This may not work for files other than .txt.

For reference, you may look at the evaluator.py file, which evaluates the performance of atarashi on a zip archive containing 100 licenses by unzipping it and then running atarashi on each individual file.

hastagAB commented 3 years ago

Hi @codeakki, as @Aman-Codes said, you can refer to evaluator.py#L92. In addition, a similar feature already exists in the Nirjas project (nirjas/main.py#L113). You can use that as a reference too.

codeakki commented 3 years ago

Sir, it is generating this error for every file I try to run in VS Code:

[Running] /usr/bin/env python3 "c:\Users\pankaj akshay\Documents\GitHub\atarashi\atarashi\agents\atarashiAgent.py"
The system cannot find the path specified.

Aman-Codes commented 3 years ago

It looks like Python cannot find the file you want to run. Either Python is not set up correctly, or you are trying to run it natively on Windows (the path /usr/bin/env exists on Linux but not on Windows).

codeakki commented 3 years ago

I rebooted my device and installed Linux. Within a couple of days I will get familiar with the codebase.

codeakki commented 3 years ago

from os import walk

def list_files2(directory, extension):
    # Returns on walk()'s first iteration: matching names in the top-level directory only
    for (dirpath, dirnames, filenames) in walk(directory):
        return (f for f in filenames if f.endswith('.' + extension))

Maybe this approach would work fine to list all the files under a directory, or should I go with nirjas/main.py#L113?

hastagAB commented 3 years ago

Both are more of the same, either one of them should work fine.
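For comparison, here is a minimal sketch of a fully recursive listing built on `os.walk`. It is not taken from evaluator.py or Nirjas, and `iter_files` is a hypothetical name. Unlike a version that returns inside the loop, it keeps descending into subdirectories, and it applies no extension filter, so files other than .txt are included.

```python
import os


def iter_files(directory):
    """Yield the full path of every file under directory, including subdirectories."""
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            # Join with dirpath so the caller gets a path it can open directly.
            yield os.path.join(dirpath, name)
```

Being a generator, it hands out paths one at a time, so a scan can start on the first file before the rest of the tree has been listed.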

codeakki commented 3 years ago

Whenever I try to run any agent in evaluator.py, it shows an "index out of range" error:

(env) akshay@akshay-VirtualBox:~/atarashi/atarashi/evaluator$ atarashi -a tfidf Testfiles/APSL-style.html
Traceback (most recent call last):
  File "/home/akshay/atarashi/env/bin/atarashi", line 8, in <module>
    sys.exit(main())
  File "/home/akshay/atarashi/env/lib/python3.8/site-packages/atarashi/atarashii.py", line 123, in main
    result = atarashii_runner(inputFile, processedLicense, agent_name, similarity, ngram_json, verbose)
  File "/home/akshay/atarashi/env/lib/python3.8/site-packages/atarashi/atarashii.py", line 83, in atarashii_runner
    result = scanner.scan(inputFile)
  File "/home/akshay/atarashi/env/lib/python3.8/site-packages/atarashi/agents/tfidf.py", line 140, in scan
    return self.tfidfcosinesim(filePath)
  File "/home/akshay/atarashi/env/lib/python3.8/site-packages/atarashi/agents/tfidf.py", line 112, in tfidfcosinesim
    processedData1 = super().loadFile(inputFile)
  File "/home/akshay/atarashi/env/lib/python3.8/site-packages/atarashi/agents/atarashiAgent.py", line 44, in loadFile
    self.commentFile = CommentPreprocessor.extract(filePath)
  File "/home/akshay/atarashi/env/lib/python3.8/site-packages/atarashi/libs/commentPreprocessor.py", line 129, in extract
    data1 = licenseComment(data)
  File "/home/akshay/atarashi/env/lib/python3.8/site-packages/atarashi/libs/commentPreprocessor.py", line 42, in licenseComment
    for id, item in enumerate(data[0]["multi_line_comment"]):
IndexError: list index out of range
(env) akshay@akshay-VirtualBox:~/atarashi/atarashi/evaluator$

codeakki commented 3 years ago

Can anyone help me, please? @hastagAB @GMishx @Aman-Codes

Aman-Codes commented 3 years ago

Thanks, @codeakki, for pointing that out. I tried to reproduce the error and found that it occurs when an invalid file path is given to atarashi.

Steps to reproduce

Also, I believe the issue is not in the evaluator script but in the atarashii.py file. We can display a better error message saying that the file path does not exist. I would like to know others' views on this.
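One way such a check could look is sketched below using `argparse`'s `type` callback. This is an assumption about a possible approach, not the actual code in atarashii.py: `existing_path`, the `scan-sketch` program name, and the `inputPath` argument are hypothetical.

```python
import argparse
import os


def existing_path(value):
    """argparse type callback: accept only paths that exist on disk."""
    if not os.path.exists(value):
        # argparse turns this into a usage message and exits, with no traceback.
        raise argparse.ArgumentTypeError(f"path does not exist: {value}")
    return value


# Hypothetical parser; the real atarashi options are not reproduced here.
parser = argparse.ArgumentParser(prog="scan-sketch")
parser.add_argument("inputPath", type=existing_path)
```

With a check like this, a mistyped path is rejected while the arguments are parsed, before any agent tries to read the file.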

Aman-Codes commented 3 years ago

> Whenever I try to run any agent in evaluator.py, it shows an "index out of range" error:
> (env) akshay@akshay-VirtualBox:~/atarashi/atarashi/evaluator$ atarashi -a tfidf Testfiles/APSL-style.html

Also, the command to test the code using evaluator.py is python3 evaluator.py -a AGENTNAME, which in your case is python3 evaluator.py -a tfidf. (Run this command inside the folder atarashi/atarashi/evaluator/.) You can also run python3 evaluator.py -h to learn more about the usage.

codeakki commented 3 years ago

@Aman-Codes, if you would like to work on this issue, you may continue, as I will be inactive in the coming days due to exams.

Aman-Codes commented 3 years ago

Sure, I can work on this, but as this is a good first issue, we can let it remain open for beginners contributing to this repository. @GMishx, what do you suggest?

codeakki commented 3 years ago

Actually, I'm also a beginner. I tried a lot, but every time I messed up the whole code. I will get back to work after my midterm exams are over, @Aman-Codes.

SinghShreya05 commented 3 years ago

@hastagAB @GMishx I have created a function that extracts all files from the directories and subdirectories, prints the results as they come, stores them, and returns them as JSON. I have created a separate .py file for this. In the evaluator.py file, where commands are taken from the terminal, I am trying to figure out how to connect my own command to the scan_directory.py file so that a user can run this "scandir" command from the terminal itself. I have added a "scandir" command to the argument parser and put the scan_directory.py file in the "agents" folder. Am I missing something here?

Kaushl2208 commented 3 years ago

Hey @SinghShreya05, you need to add the command to the Atarashi argument parser and then connect the call to your function in atarashii.py itself.

Hope this helps.
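As a rough illustration of that wiring, the sketch below dispatches on whether the input is a file or a directory. It does not reproduce atarashii.py: `build_parser`, `run`, and the two callables are hypothetical, and only the `-a` option and the positional input mirror the command line quoted earlier in this thread.

```python
import argparse
import os


def build_parser():
    # Hypothetical parser modelled on "atarashi -a tfidf <input>".
    parser = argparse.ArgumentParser(prog="atarashi-sketch")
    parser.add_argument("-a", dest="agent_name", required=True)
    parser.add_argument("inputPath", help="file or directory to scan")
    return parser


def run(argv, scan_file, scan_directory):
    """Parse argv and hand the input to the matching scan function."""
    args = build_parser().parse_args(argv)
    if os.path.isdir(args.inputPath):
        # Directory input: delegate to the directory scanner.
        return scan_directory(args.inputPath, args.agent_name)
    return scan_file(args.inputPath, args.agent_name)
```

Dispatching on `os.path.isdir` keeps a single entry point, so a separate "scandir" command would not be strictly necessary. Whether that fits the project is a design choice for the maintainers.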

SinghShreya05 commented 3 years ago

Yes, thanks @Kaushl2208. I'll make the necessary changes in the atarashii.py file and keep you posted.