a2i2 / mining-data-science-repositories

A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects
https://arxiv.org/abs/2007.08978

Mining Data Science Repositories

Research on mining Data Science repositories.

Steps to Run

Figshare: Extract the contents of results.tar.gz into the output directory, then jump to the Analyse results (in Jupyter) section.

From scratch: Clone this repository, then follow the steps below to identify, clone, and analyse the repositories.
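
For the Figshare route, the extraction step can be sketched in Python as below. This is a minimal sketch, not part of the repository: the helper name `extract_results` is hypothetical, and it assumes results.tar.gz has been downloaded to the current directory and that output is the target directory named above. The internal layout of the archive is not described here, so the function simply reports whatever files it unpacks.

```python
import tarfile
from pathlib import Path


def extract_results(archive="results.tar.gz", dest="output"):
    """Unpack a gzip-compressed tar archive into dest and list the files found there.

    Hypothetical helper: the default paths are assumptions based on the
    file and directory names mentioned in this README.
    """
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive, "r:gz") as tar:
        # Prefer the safer "data" extraction filter where this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_path, filter="data")
        else:
            tar.extractall(dest_path)

    # Return the files under dest as sorted, POSIX-style relative paths.
    return sorted(
        p.relative_to(dest_path).as_posix()
        for p in dest_path.rglob("*")
        if p.is_file()
    )
```

Calling `extract_results()` with no arguments uses the assumed default paths; pass explicit paths if the archive or output directory lives elsewhere.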

Install Docker

If Docker is not already installed, install it first:

Link to install (docker install)

Datasets

We have four directories: data, input_drive, input, and output.

GitHub Access Token

Go to https://github.com/settings/tokens/new and generate a new token with the permissions public_repo and read:packages, then set ACCESS_TOKEN in mining_nlp_repositories/github.py to that token.
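
Before starting a long mining run, it can help to confirm that the token is accepted by GitHub. The sketch below is not taken from mining_nlp_repositories/github.py. It is a standalone check that uses only the Python standard library and the public GitHub REST endpoint https://api.github.com/rate_limit. The function names `build_request` and `check_token` are hypothetical.

```python
import urllib.error
import urllib.request

# Public GitHub REST endpoint; an authenticated request returns HTTP 200.
RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def build_request(token, url=RATE_LIMIT_URL):
    """Build an authenticated request for the GitHub REST API (hypothetical helper)."""
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def check_token(token):
    """Return True if GitHub accepts the token, False if it responds with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(token), timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # Rejected tokens (for example, HTTP 401) end up here.
        return False
```

Running `check_token("<your token>")` in a Python shell should return True for a valid token. Avoid committing the real token to version control after editing github.py.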

Tasks

Analyse results (in Jupyter):

Known Bugs