JianyuZhao7 closed this issue 7 years ago
Hi, the data is scraped from GitHub when you run the 'crawler' command.
The search URLs it uses are defined in crawler/algorithms/commands.py:
'bubblesort': 'https://api.github.com/search/code?q=filename:bubblesort.py%20language:python',
'mergesort': 'https://api.github.com/search/code?q=filename:mergesort.py%20language:python',
'quicksort': 'https://api.github.com/search/code?q=filename:quicksort.py%20language:python',
'linkedlist': 'https://api.github.com/search/code?q=filename:linkedlist.py%20language:python',
'bfs': 'https://api.github.com/search/code?q=filename:bfs.py%20language:python',
'knapsack': 'https://api.github.com/search/code?q=filename:knapsack.py%20language:python'
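If you want to look at the raw search results yourself rather than the pickle, something like the following rough sketch might work. It is not the project's crawler code: the names `ALGORITHMS`, `search_url`, and `fetch_results` are made up for illustration, and it assumes the GitHub search endpoint still accepts these queries (it may require a personal access token, passed here as the optional `token` argument).

```python
import json
import urllib.request

# Algorithm names taken from the URLs listed above.
ALGORITHMS = ["bubblesort", "mergesort", "quicksort", "linkedlist", "bfs", "knapsack"]


def search_url(name):
    """Build the code-search URL for one algorithm (hypothetical helper)."""
    return (
        "https://api.github.com/search/code"
        f"?q=filename:{name}.py%20language:python"
    )


def fetch_results(name, token=None):
    """Fetch and decode the JSON search response for one algorithm.

    Assumption: `token` is a GitHub personal access token; the search
    endpoint may reject unauthenticated requests.
    """
    request = urllib.request.Request(search_url(name))
    request.add_header("Accept", "application/vnd.github+json")
    if token:
        request.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URLs; call fetch_results(name, token) to hit the API.
    for name in ALGORITHMS:
        print(name, search_url(name))
```

Running the script only prints the six URLs; calling `fetch_results` makes a real network request and is subject to GitHub's rate limits.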
Hope this helps, let me know if you have more questions.
Hello, I want to see the original data used for training (not the pickle file, because I want to be able to read it). Can you tell me where it is?