reaper requires the GHTorrent database to be restored to a MySQL/MariaDB instance. Restoring the full GHTorrent database before running reaper is prohibitively time-consuming (the GHTorrent database dump from 2019-06-01 is over 100 GB in size). Removing the dependency on GHTorrent will require reaper to mine GitHub itself for the repository data and metadata that the GHTorrent project has already mined. On the other hand, users will no longer need to restore data and metadata for several million repositories when all they want to do is analyze a few.
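To illustrate what mining GitHub directly for only the repositories of interest might look like, here is a minimal sketch that queries the GitHub REST API endpoint `GET /repos/{owner}/{repo}` for a short list of repositories. It is not reaper's implementation: the function names `http_get_json` and `fetch_repository_metadata`, the `get_json` parameter, and the choice of fields to keep are all assumptions made for illustration. Unauthenticated requests are rate-limited, so a real replacement would probably need an access token.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable

GITHUB_API = "https://api.github.com"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "reaper-sketch",  # assumed value; GitHub expects a User-Agent
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def fetch_repository_metadata(
    full_names: Iterable[str],
    get_json: Callable[[str], dict] = http_get_json,
) -> Dict[str, dict]:
    """Collect a few metadata fields for each "owner/name" repository.

    Only the repositories passed in are queried, so nothing has to be
    restored for repositories the user does not care about.
    """
    metadata = {}
    for full_name in full_names:
        data = get_json(f"{GITHUB_API}/repos/{full_name}")
        # The selection of fields below is an assumption for this sketch.
        metadata[full_name] = {
            "id": data.get("id"),
            "language": data.get("language"),
            "stars": data.get("stargazers_count"),
            "forks": data.get("forks_count"),
            "created_at": data.get("created_at"),
        }
    return metadata
```

Calling `fetch_repository_metadata(["owner/name"])` issues one request per repository. Passing a different `get_json` callable makes it possible to add authentication or caching, or to test without network access.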