Open RajuKoushik opened 7 years ago
Are you working on this?
Yeah @singh1114
@RajuKoushik can you be more explicit and describe in more detail what you mean by this ticket?
@pombredanne This is no longer a separate issue. As discussed, we planned to have a single view that accepts any kind of URL (even the special git repo URLs). I have written an API that takes requests for both URL types and returns the scan results as the response. https://github.com/nexB/scancode-server/pull/61
ok, but I need to understand what this is about. What is the problem and the solution... what is the thing you want to achieve here ;)
Your initial description is kinda terse and does not mean much... e.g.
Create POST API endpoint to ScanCode given GitHub or BitBucket or Git URL.
When you create a ticket, try to be explanatory and descriptive. I am not sure what you are after, and you need to write exactly what you have in mind. Assuming I get some of what this is about, maybe something similar to this would be better:
When a user requests a scan with the URL of a GitHub, Bitbucket, GitLab or similar repository, we can handle it in a few different ways:
- Treat this as a regular URL, in which case we may end up downloading and scanning a web page about a repository and not the code itself
- Or recognize that this is a special type of URL and be smart about what to download and then scan
Option 2 seems a much better approach, as in most cases the user is likely to want the code in the repo to be scanned rather than the HTML page of a repo as rendered by GitHub or Bitbucket.
There are a couple considerations to get this right:
- we need to recognize these URLs.
- In some cases they may point to a certain file or commit or branch. This should be detected properly to determine which files or branch to fetch.
- In some other cases they may point to a direct zip or tarball download or a "raw" blob. In these cases they should likely be fetched as-is.
- once the URL has been recognized and deconstructed, there are different ways things could be fetched:
- we could use a git clone (or hg clone in some cases for Bitbucket)
- we could use a tarball or zip download
- then what to fetch needs to be determined, since it is not specified:
- we can scan the head of the default branch
- we can scan tags/releases/branches, all or some of them (though scanning them all does not make sense and we likely want to scan only one thing)
- since the URL or the way things are fetched may not match what the user entered as a URL, there may be a need to store a reference to what was effectively fetched.
When you create a ticket, this is the kind of detail that you should specify... otherwise there is not much that can be discussed.
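To make the URL recognition considerations above a bit more concrete, here is a minimal sketch of a classifier built on the standard library `urlparse()`. Everything in it is an assumption for discussion and not code from scancode-server: the names `classify_url`, `RepoURL`, `KNOWN_HOSTS` and `ARCHIVE_SUFFIXES` are hypothetical, the host list is incomplete, and only the GitHub-style `/owner/name/(tree|blob|commit)/<ref>` layout is handled (Bitbucket and GitLab lay out their paths differently, and SSH-style `git@host:owner/repo` URLs are not covered).

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Illustrative, incomplete list of code-hosting hosts (an assumption).
KNOWN_HOSTS = {"github.com", "bitbucket.org", "gitlab.com"}
# Suffixes treated as direct archive downloads (an assumption).
ARCHIVE_SUFFIXES = (".zip", ".tar.gz", ".tgz", ".tar.bz2")


@dataclass
class RepoURL:
    kind: str                   # "direct", "repo", "ref" or "plain"
    host: str = ""
    owner: str = ""
    name: str = ""
    ref: Optional[str] = None   # branch, tag or commit, when present
    path: Optional[str] = None  # file path inside the repo, when present


def classify_url(url: str) -> RepoURL:
    """Decide how a user-supplied URL should be fetched."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    parts = [p for p in parsed.path.split("/") if p]

    # Archives and "raw" blobs are fetched as-is.
    if parsed.path.lower().endswith(ARCHIVE_SUFFIXES):
        return RepoURL(kind="direct", host=host)
    if host == "raw.githubusercontent.com":
        return RepoURL(kind="direct", host=host)

    # Anything else that is not /owner/name on a known host is a regular URL.
    if host not in KNOWN_HOSTS or len(parts) < 2:
        return RepoURL(kind="plain", host=host)

    owner, name = parts[0], parts[1]
    if name.endswith(".git"):
        name = name[:-4]

    # GitHub-style layout only. A branch name containing "/" cannot be told
    # apart from a path here; resolving that needs a lookup against the repo.
    if len(parts) >= 4 and parts[2] in ("tree", "blob", "commit"):
        ref = parts[3]
        path = "/".join(parts[4:]) or None
        return RepoURL("ref", host, owner, name, ref, path)

    return RepoURL("repo", host, owner, name)
```

With this sketch, `classify_url("https://github.com/nexB/scancode-server")` yields a `RepoURL` of kind `"repo"` with owner `nexB` and name `scancode-server`, while a URL on an unknown host falls back to kind `"plain"` and would be downloaded like any other page.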
@pombredanne Present status of the URLScan view as in pull request #61:
Accepts the URL.
Creates an instance of `InsertIntoDB`:
` insert_into_db = InsertIntoDB()`
Calls the `create_scan_id` function:
` scan_id = insert_into_db.create_scan_id(scan_type)`
Checks whether it is a 'git' URL (using the `urlparse()` function).
If it is a 'git' URL:
Fetches and clones the git repository into the home directory. (This should happen in the background; it currently does not, which has to be changed.)
`apply_scan_async.delay(path, scan_id, scan_type, URL, folder_name)`
The fetched repo is scanned from its path in the background, and the response then redirects to the scan results.
The files are stored in a directory, which is then scanned in the background.
`scan_code_async.delay(URL, scan_id, path)`
Things to be worked on:
Write tests.
The fetching has to be done in the background.
We need to recognise these URLs. In some cases they may point to a certain file, commit or branch. This should be detected properly to determine which files or branch to fetch.
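For the fetching item above, a rough sketch of what the clone step might look like once it is moved out of the request path. This is only an illustration under assumptions: `build_clone_command`, `fetch_repo` and `resolved_commit` are hypothetical helper names, not functions from the pull request, and in the project the call would presumably sit inside a background task like the existing `.delay(...)` calls. It uses a shallow `git clone`, and `git rev-parse HEAD` to record which commit was effectively fetched.

```python
import subprocess
from typing import List, Optional


def build_clone_command(repo_url: str, dest_dir: str,
                        ref: Optional[str] = None) -> List[str]:
    """Build a shallow `git clone` command for the given repository."""
    cmd = ["git", "clone", "--depth", "1"]
    if ref:
        # --branch accepts a branch or tag name, not an arbitrary commit hash.
        cmd += ["--branch", ref]
    # "--" keeps a URL that starts with "-" from being read as an option.
    cmd += ["--", repo_url, dest_dir]
    return cmd


def fetch_repo(repo_url: str, dest_dir: str, ref: Optional[str] = None) -> None:
    """Run the clone; raises CalledProcessError if git fails."""
    subprocess.run(build_clone_command(repo_url, dest_dir, ref), check=True)


def resolved_commit(dest_dir: str) -> str:
    """Return the commit hash that was actually checked out in dest_dir."""
    result = subprocess.run(
        ["git", "-C", dest_dir, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Storing the value returned by `resolved_commit` alongside the scan would cover the case where the URL the user entered does not identify exactly what was fetched, for example when only the head of the default branch is cloned.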
Create POST API endpoint to ScanCode given GitHub or BitBucket or Git URL.