ansible / galaxy

Legacy Galaxy still available as read-only on https://old-galaxy.ansible.com - looking for the new galaxy -> https://github.com/ansible/galaxy_ng
Apache License 2.0

Content Scoring + SCM sync'ing #510

Open chouseknecht opened 6 years ago

chouseknecht commented 6 years ago

This is part of the 'scoring' effort, meaning how do we 'score' content in Galaxy. What makes one piece of content better than another, and which piece of content should be surfaced at the top of the search results?

SCM stats are an important part of scoring content. Things like last commit date, open issues, stargazers, etc. should be part of the 'score'. To that end, we currently have a nightly job that syncs the Galaxy DB with info from GitHub, including: stargazers, watchers, forks, last commit date, and maybe a couple of other bits.
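To make the idea concrete, here's a rough sketch of how those synced stats might fold into a single number. None of this is existing Galaxy code: `ScmStats`, `scm_score`, the weights, the 180-day half-life, and treating open issues as a mild penalty are all assumptions for illustration.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ScmStats:
    """Hypothetical container for the stats the nightly sync already pulls."""
    stargazers: int
    watchers: int
    forks: int
    open_issues: int
    last_commit: datetime  # timezone-aware


def scm_score(stats: ScmStats, now: datetime, half_life_days: float = 180.0) -> float:
    """Combine SCM stats into one score. All weights are assumed, not tuned."""
    # Popularity is log-damped so very large repos do not swamp everything else.
    popularity = (
        math.log1p(stats.stargazers)
        + 0.5 * math.log1p(stats.watchers)
        + 0.5 * math.log1p(stats.forks)
    )
    # Freshness is 1.0 for a commit made right now and halves every half-life.
    age_days = max((now - stats.last_commit).total_seconds() / 86400.0, 0.0)
    freshness = 0.5 ** (age_days / half_life_days)
    # Assumption: many open issues count slightly against the content.
    issue_penalty = 0.1 * math.log1p(stats.open_issues)
    return max(popularity * freshness - issue_penalty, 0.0)


# Two repos with identical popularity; only the last commit date differs.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
fresh = ScmStats(100, 10, 5, 3, last_commit=now)
stale = ScmStats(100, 10, 5, 3, last_commit=now - timedelta(days=360))
```

With these assumed weights, `fresh` ranks above `stale`, which is the behavior we'd want from search ordering. The real formula would need tuning against actual Galaxy content.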

I think we need to expand on this in a couple ways:

Firstly, investigate integrating some portions of @jctanner's Graph data collection tool. Check out http://dash.tannerjc.net/graph. Enter 'repo:ansible' or 'repo:galaxy' as a query and see the graph it creates.

The idea is to understand what data this tool collects, how it collects it, and what might be useful in Galaxy. From Galaxy's perspective, we want to inform our search results and surface the best content first. Some of the data this tool gathers seems like it might be useful to that end.

Secondly, we want to integrate this in a way that's 'pluggable', meaning that we will want to eventually collect data from more sources than GitHub. As we become a product, a user might want to write a custom collector to collect data from an internal SCM, or some other source.
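One possible shape for 'pluggable', sketched very loosely: a small base class plus a registry keyed by source name, so a custom collector only has to subclass and register itself. `Collector`, `register`, `get_collector`, and `StaticCollector` are hypothetical names, not anything that exists in Galaxy or Pulp today.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class Collector(ABC):
    """Hypothetical base class every stats collector plugin would implement."""

    name: str  # unique key used to look the collector up

    @abstractmethod
    def collect(self, repo_url: str) -> Dict[str, object]:
        """Return a flat dict of stats for the given repository URL."""


# Registry mapping a source name to its collector class.
_REGISTRY: Dict[str, Type[Collector]] = {}


def register(cls: Type[Collector]) -> Type[Collector]:
    """Class decorator that adds a collector to the registry."""
    _REGISTRY[cls.name] = cls
    return cls


def get_collector(name: str) -> Collector:
    """Instantiate the collector registered under `name`."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"no collector registered for {name!r}") from None


@register
class StaticCollector(Collector):
    """Stand-in for an internal SCM collector; returns fixed placeholder data."""

    name = "static"

    def collect(self, repo_url: str) -> Dict[str, object]:
        return {"source": self.name, "repo": repo_url, "stargazers": 0}


# A GitHub collector would register the same way under its own name.
stats = get_collector("static").collect("https://example.invalid/org/repo")
```

The sync job would then only need to know the source name for each repository, and anyone with an internal SCM could ship their own `Collector` subclass.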

Thirdly, as we've been exploring integration with Pulp, we'll want to construct this in a way that works well in, and takes advantage of, the Pulp tasking system.

jctanner commented 6 years ago

https://github.com/jctanner/ansible-dashboard/blob/master/crawler/ghcrawler.py