LLNL / llnl.github.io

Public home for LLNL software catalog
https://software.llnl.gov
MIT License

import tags from github, and allow fine-grained search on them #20

Open: sbromberger opened this issue 7 years ago

sbromberger commented 7 years ago

Tags are a great way to categorize software repos and they're already built into GitHub repos. It would be great to be able to filter on "tag:foo" (and, incidentally, "language:C", but that's probably another issue).
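
A minimal sketch of how `tag:` and `language:` filters could be parsed and applied, assuming each repo is a record with `name`, `topics`, and `language` fields. Those field names and the function names are hypothetical, not the site's actual code. In this sketch, several `tag:` terms must all match, while several `language:` terms match if any one does, since GitHub reports a single primary language per repo.

```python
from typing import Any, Dict, List

# Hypothetical repo record shape: {"name": str, "topics": [str], "language": str}
Repo = Dict[str, Any]


def parse_query(query: str) -> Dict[str, List[str]]:
    """Split a query like 'tag:hpc language:C solver' into filters and free text."""
    filters: Dict[str, List[str]] = {"tag": [], "language": [], "text": []}
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key.lower() in ("tag", "language") and value:
            filters[key.lower()].append(value.lower())
        else:
            # Anything that is not a recognized "key:value" pair is free text.
            filters["text"].append(token.lower())
    return filters


def matches(repo: Repo, filters: Dict[str, List[str]]) -> bool:
    """Return True if the repo satisfies every filter in the parsed query."""
    topics = {t.lower() for t in (repo.get("topics") or [])}
    language = str(repo.get("language") or "").lower()
    name = str(repo.get("name", "")).lower()
    # Every requested tag must be present (AND).
    if any(tag not in topics for tag in filters["tag"]):
        return False
    # Any requested language may match (OR); no language filter means no restriction.
    if filters["language"] and language not in filters["language"]:
        return False
    # Free-text words must all appear in the repo name.
    return all(word in name for word in filters["text"])


def search(repos: List[Repo], query: str) -> List[str]:
    """Return the names of repos matching the query, in catalog order."""
    filters = parse_query(query)
    return [str(r["name"]) for r in repos if matches(r, filters)]
```

Matching is case-insensitive, so `language:C` matches a repo whose language is "C" but not one whose language is "C++".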

IanLee1521 commented 7 years ago

Yeah, language is possibly a separate issue.

As for tags, this was difficult when I started the site, but it should be possible now that GitHub has added "Topics" (aka tags aka labels) for repositories: https://github.com/blog/2309-introducing-topics
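
A sketch of reading a repository's topics through the GitHub REST API endpoint `GET /repos/{owner}/{repo}/topics`, which returns a JSON object with a `names` list. The helper names are hypothetical, and the optional bearer-token handling is an assumption about how a catalog scraper might authenticate; parsing is kept separate from the network call so it can be checked offline.

```python
import json
import urllib.request
from typing import List

API_ROOT = "https://api.github.com"


def topics_from_response(body: str) -> List[str]:
    """Extract topic names from the JSON body of GET /repos/{owner}/{repo}/topics."""
    return list(json.loads(body).get("names", []))


def fetch_topics(owner: str, repo: str, token: str = "") -> List[str]:
    """Fetch a repository's topics from GitHub (requires network access)."""
    req = urllib.request.Request(
        f"{API_ROOT}/repos/{owner}/{repo}/topics",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        # Authenticated requests get a higher rate limit.
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return topics_from_response(resp.read().decode("utf-8"))
```

For example, `fetch_topics("LLNL", "llnl.github.io")` would return that repo's topic list, or an empty list if none are set.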

LRWeber commented 7 years ago

Repo "language" and "topics" data is now being collected. (And displayed in the Explore tab!) This information could potentially be incorporated into the search functionality.

IanLee1521 commented 5 years ago

Some of this work has begun under @hauten's lead.

See also: https://github.com/LLNL/llnl.github.io/tree/add-topics

gonsie commented 5 years ago

@angfl97 Could you do some data analysis to help us understand the categories already in use?

  1. What topics are already in use by our repos, and how many repos fall into each topic?
  2. Another way to broadly categorize the repos would be by organization (other than LLNL). How many non-LLNL organizations do we have in the catalog, and how many repos are in each?
  3. I also like @sbromberger's idea. We have data on which languages are used: what is the set of unique languages, and how many repos use each one?

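
A sketch for question 2, assuming the catalog can be reduced to a list of `owner/repo` full names (a hypothetical input format). The length of the result answers "how many non-LLNL organizations", and each pair gives the repo count; the same `Counter` pattern answers questions 1 and 3 when fed topics or languages instead of owners.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def repos_per_org(full_names: Iterable[str], exclude: str = "LLNL") -> List[Tuple[str, int]]:
    """Count repos per owning organization, skipping the excluded org.

    full_names are GitHub-style "owner/repo" strings (an assumed input format).
    Results are sorted by repo count, largest first.
    """
    owners = (name.split("/", 1)[0] for name in full_names)
    counts = Counter(owner for owner in owners if owner.lower() != exclude.lower())
    return counts.most_common()
```
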
LRWeber commented 5 years ago

It may be worth noting that logic for answering some of these questions already exists in the code that generates our "word cloud" visualizations at the bottom of the Explore page and of individual repo pages.

The cloud-generator takes a list of {name: aWord, value: wordCount} objects, which is what these functions output. They may be worth a look.
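
As a rough illustration of that `{name, value}` shape (a Python analogue, not a port of the linked JavaScript), per-topic repo counts could be produced like this, counting each topic at most once per repo. The input, one topic list per repo, is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Union

WordCount = Dict[str, Union[str, int]]


def cloud_entries(topic_lists: Iterable[Iterable[str]]) -> List[WordCount]:
    """Count how many repos use each topic and emit {name, value} entries, most used first."""
    counts: Counter = Counter()
    for topics in topic_lists:
        counts.update(set(topics))  # a repo counts once per topic, even if listed twice
    return [{"name": word, "value": n} for word, n in counts.most_common()]
```
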

https://github.com/LLNL/llnl.github.io/blob/eb89e81d36d9a06b80be1a6b55f2142e842faedc/js/explore/cloud_topics.js#L69-L95

https://github.com/LLNL/llnl.github.io/blob/eb89e81d36d9a06b80be1a6b55f2142e842faedc/js/explore/cloud_languages.js#L69-L99

angela-flores-wdc commented 5 years ago

I made an Excel workbook with the stats @gonsie asked for.

Here is the link

gonsie commented 5 years ago

For those not traversing the link, these topics are mentioned in 4 or more repositories:

I was hoping that we'd get some topics outside of the typical "hpc" stuff, but I guess not. The language tags are sort of interesting:

Language   Repos
shell      292
python     252
C          210
C++        202
Makefile   174
CMake      113
HTML        85

But I'm not sure that's immediately useful. There are 13 repos using AWK, so maybe digging into the lesser-used languages would be cool.
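
A sketch for digging into the less-used languages: given a mapping from language to repo count (in the example, the counts above plus the AWK figure), list the languages at or under a chosen threshold, least used first. The function name and the threshold of 100 are arbitrary choices for illustration.

```python
from typing import Dict, List, Tuple


def rare_languages(counts: Dict[str, int], max_repos: int) -> List[Tuple[str, int]]:
    """Return (language, repo count) pairs with at most max_repos repos, least used first."""
    rare = [(lang, n) for lang, n in counts.items() if n <= max_repos]
    # Sort by count, then by name so ties come out in a stable order.
    return sorted(rare, key=lambda item: (item[1], item[0]))


counts = {"shell": 292, "python": 252, "C": 210, "C++": 202,
          "Makefile": 174, "CMake": 113, "HTML": 85, "AWK": 13}
print(rare_languages(counts, 100))  # [('AWK', 13), ('HTML', 85)]
```
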

What I do think is actually useful are the repos we are pulling from non-LLNL organizations. The top 5 (most repos) come from:

Some of these projects would be very cool to highlight on their own as they sort of represent a whole ecosystem of interrelated repos. These are also the places where we get the most external interaction.

hauten commented 5 years ago

It would be awesome if more repos had topics. I've done a couple of inventories over the last year, and fewer than 10% of repos have them. Maybe this can encourage PIs: our portal (not to mention GitHub) will give more visibility to repos that have topics.
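
A sketch of the coverage number behind such an inventory, assuming each repo record carries a `topics` list (a hypothetical field name). A missing or empty list counts as "no topics".

```python
from typing import Any, Dict, List


def topic_coverage(repos: List[Dict[str, Any]]) -> float:
    """Percentage of repos with at least one topic (0.0 for an empty catalog)."""
    if not repos:
        return 0.0
    tagged = sum(1 for repo in repos if repo.get("topics"))
    return 100.0 * tagged / len(repos)
```
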

hauten commented 5 years ago

See https://github.com/LLNL/llnl.github.io/blob/new-home-page/radiuss/README.md for a list of tags on RADIUSS repos. I'll aim to use that list and the notes above as starting points for standardizing tags across other LLNL repos.
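
A sketch of a check that could support standardizing tags: compare each repo's topics against an agreed list and report the ones that fall outside it. The standard set and repo names in the test are placeholders, not the RADIUSS list; repos with no topics at all are not reported here and would need a separate coverage check.

```python
from typing import Dict, Iterable, List, Set


def nonstandard_topics(repo_topics: Dict[str, Iterable[str]],
                       standard: Set[str]) -> Dict[str, List[str]]:
    """Map each repo name to its sorted topics that are not on the standard list."""
    report: Dict[str, List[str]] = {}
    for repo, topics in repo_topics.items():
        extra = sorted(set(topics) - standard)
        if extra:
            report[repo] = extra
    return report
```
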

IanLee1521 commented 5 years ago

@hauten, maybe list our standard tags on https://github.com/LLNL/llnl.github.io/blob/master/about/using-github.md ?

IanLee1521 commented 5 years ago

Actually, for the docs, we can start the listing here: https://github.com/LLNL/llnl.github.io/tree/master/categories