datactive / bigbang

Scientific analysis of collaborative communities
http://datactive.github.io/bigbang/
MIT License

Add a CLI #585

Closed: micahflee closed this pull request 1 year ago

micahflee commented 1 year ago

There are several scripts in the bin folder, such as collect_mail.py, that rely on the bigbang module being installed. This PR aims to replace all of those scripts with a single CLI built directly into the bigbang module. That way, users can pip install bigbang and get not only the module but also a CLI tool for tasks like collecting mail.

With this PR, installing the bigbang package also creates a bigbang CLI program:

$ bigbang
Usage: bigbang [OPTIONS] COMMAND [ARGS]...

  BigBang CLI tools

Options:
  --help  Show this message and exit.

Commands:
  collect-mail  Collects files from public mailing list archives
$ bigbang collect-mail --help
Usage: bigbang collect-mail [OPTIONS]

  Collects files from public mailing list archives

Options:
  --url TEXT       URL of mailman archive
  --file TEXT      Path of a file with linebreak-seperated list of URLs
  --archives TEXT  Path to a specified directory for storing downloaded mail
                   archives  [default: ./archives]
  --notes TEXT     Notes to record regarding provenance
  --help           Show this message and exit.

So far, this fully replaces the functionality of bin/collect_mail.py, so that script has been deleted and its documentation updated.
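
The help layout above is what the click library generates for a command group, and the traceback later in this thread shows the installed bigbang script calling main_cli(). The following is a minimal sketch assuming that structure: the option names and help strings mirror the output above, but the function body, the url_file parameter name, and the returned list are illustrative assumptions, not the PR's actual bigbang/cli.py.

```python
import click


# Top-level group. A console-script entry point pointing at main_cli is what
# makes `bigbang` available on PATH after pip install.
@click.group(help="BigBang CLI tools")
def main_cli():
    pass


@main_cli.command("collect-mail",
                  help="Collects files from public mailing list archives")
@click.option("--url", help="URL of mailman archive")
# "url_file" is the Python parameter name bound to the --file option.
@click.option("--file", "url_file",
              help="Path of a file with linebreak-separated list of URLs")
@click.option("--archives", default="./archives", show_default=True,
              help="Path to a specified directory for storing downloaded mail archives")
@click.option("--notes", default="", help="Notes to record regarding provenance")
def collect_mail(url, url_file, archives, notes):
    # Hypothetical body: gather URLs from --url and/or --file. The real
    # command would pass each one (plus archives and notes) to BigBang's
    # mail-collection code, which is not shown here.
    urls = [url] if url else []
    if url_file:
        with open(url_file) as f:
            urls.extend(line.strip() for line in f if line.strip())
    if not urls:
        raise click.UsageError("Provide --url or --file")
    for u in urls:
        click.echo(f"would collect {u} into {archives}")
    # Returned so the command can be exercised with standalone_mode=False.
    return urls
```

Packaging then registers a console-script entry point named bigbang that targets main_cli; pip generates the small wrapper that calls sys.exit(main_cli()), as seen at the top of the traceback in this thread.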

(This branch is based off of the #584 branch.)

micahflee commented 1 year ago

What I want to understand is exactly which of the scripts in bin should be implemented as CLI commands. Searching the docs for bin/ turns up the following:

$ grep -r "bin/" docs
docs/datasets/md_git.md:    * single url `python bin/collect_git.py -u https://github.com/scipy/scipy.git`
docs/datasets/md_git.md:    * file of urls `python bin/collect_git.py -f examples/git_urls.txt`
docs/datasets/md_git.md:    * Github organization name `python bin/collect_git.py -g glass-bead-labs`
docs/datasets/md_git.md:python bin/collect_git.py -u https://github.com/scipy/scipy.git
docs/datasets/md_git.md:python bin/collect_git.py -f examples/git_urls.txt
docs/datasets/md_git.md:python bin/collect_git.py -g glass-bead-labs
docs/datasets/mailinglists.rst:``python3 bin/collect_draft_metadata.py -w httpbis``
docs/datasets/mailinglists.rst:``https://listserv.ieee.org/cgi-bin/wa?INDEX``
docs/datasets/mailinglists.rst:        url="https://listserv.ieee.org/cgi-bin/wa?A0=IEEE-TEST",
docs/datasets/mailinglists.rst:        url_login="https://listserv.ieee.org/cgi-bin/wa?LOGON",
docs/datasets/mailinglists.rst:        url_pref="https://listserv.ieee.org/cgi-bin/wa?PREF",

Based on this, only collect_git.py and collect_draft_metadata.py are documented; the other matches are listserv cgi-bin URLs, not scripts. So I can implement those two. Are there any other CLI commands that should be implemented?
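
The grep above needs a manual step to separate script references from the listserv cgi-bin URLs. A small Python sketch can do that filtering directly, assuming the docs live under docs/ as in the grep; the function names and the regular expression are hypothetical helpers, not part of BigBang.

```python
import re
from pathlib import Path

# "bin/<name>.py" not preceded by a word character or a hyphen, so
# "cgi-bin/wa?..." URLs in the listserv docs are not counted.
SCRIPT_RE = re.compile(r"(?<![\w-])bin/(\w+\.py)")


def scripts_in_text(text):
    """Return the set of bin/*.py script names referenced in text."""
    return set(SCRIPT_RE.findall(text))


def documented_scripts(docs_dir="docs"):
    """Scan every file under docs_dir and collect referenced bin scripts."""
    found = set()
    for path in Path(docs_dir).rglob("*"):
        if path.is_file():
            found |= scripts_in_text(path.read_text(errors="ignore"))
    return sorted(found)
```

Run from the repository root on the docs tree grepped above, documented_scripts("docs") should return ['collect_draft_metadata.py', 'collect_git.py'].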

micahflee commented 1 year ago

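I've added the collect-git commands as well, one for each of the three ways collect_git.py accepted input (single URL, file of URLs, GitHub organization):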

$ bigbang --help
Usage: bigbang [OPTIONS] COMMAND [ARGS]...

  BigBang CLI tools

Options:
  --help  Show this message and exit.

Commands:
  collect-git-from-file-of-urls  Load git data from repo URLs listed in a...
  collect-git-from-github-org    Load git data from repos in a GitHub...
  collect-git-from-url           Load git data from a repo URL
  collect-mail                   Collects files from public mailman...
$ bigbang collect-git-from-url --help
Usage: bigbang collect-git-from-url [OPTIONS]

  Load git data from a repo URL

Options:
  --url TEXT        URL of the git repo  [required]
  --update BOOLEAN  Update the git repo  [default: False]
  --help            Show this message and exit.
$ bigbang collect-git-from-file-of-urls --help
Usage: bigbang collect-git-from-file-of-urls [OPTIONS]

  Load git data from repo URLs listed in a file

Options:
  --path TEXT       Path of the file full of git repo URLs  [required]
  --update BOOLEAN  Update the git repo  [default: False]
  --help            Show this message and exit.
$ bigbang collect-git-from-github-org --help
Usage: bigbang collect-git-from-github-org [OPTIONS]

  Load git data from repos in a GitHub organization

Options:
  --org-name TEXT  GitHub organization name  [required]
  --help           Show this message and exit.

I've also updated the documentation to use these commands instead of running the scripts in bin.
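
The BOOLEAN metavar and the [required] markers in the help output above are what click displays for an option declared with type=bool and for one declared with required=True. Here is a hedged sketch of two of the three commands, assuming that style of declaration; the command bodies and return values are hypothetical, and the file-of-URLs variant would follow the same pattern with a required --path option.

```python
import click


@click.group(help="BigBang CLI tools")
def main_cli():
    pass


@main_cli.command("collect-git-from-url", help="Load git data from a repo URL")
@click.option("--url", required=True, help="URL of the git repo")
# type=bool makes click show the metavar BOOLEAN and parse values such as
# "true", "false", "yes", "no", "1", "0".
@click.option("--update", type=bool, default=False, show_default=True,
              help="Update the git repo")
def collect_git_from_url(url, update):
    # Hypothetical body: the real command would clone or update the repo.
    click.echo(f"collecting {url} (update={update})")
    return url, update


@main_cli.command("collect-git-from-github-org",
                  help="Load git data from repos in a GitHub organization")
# click derives the Python parameter name org_name from --org-name.
@click.option("--org-name", required=True, help="GitHub organization name")
def collect_git_from_github_org(org_name):
    # Hypothetical body: the real command would list and load the org's repos.
    click.echo(f"collecting repos from GitHub org {org_name}")
    return org_name
```

One design note: with type=bool the user has to pass a value, as in --update true. Declaring the option with is_flag=True instead would give a bare --update switch.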

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 46.47% and project coverage change: -0.53 ⚠️

Comparison is base (9d4eed6) 73.73% compared to head (f9dbbc9) 73.20%.
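
Both percentages, and the -0.53 change, follow from the Hits and Lines totals in the Coverage Diff of this report, with coverage = hits / lines. A quick Python check of that arithmetic (the helper name is ours):

```python
def coverage_pct(hits, lines):
    """Line coverage in percent, rounded to two decimals like the report."""
    return round(100 * hits / lines, 2)


base = coverage_pct(2678, 3632)  # main (9d4eed6)
head = coverage_pct(2710, 3702)  # PR head (f9dbbc9)
change = round(head - base, 2)

print(f"base {base:.2f}%, head {head:.2f}%, change {change:+.2f}")
# base 73.73%, head 73.20%, change -0.53
```

Patch coverage (46.47%) is measured over the lines the diff itself touches, so it can't be reproduced from the net +70 lines and +32 hits alone (32 / 70 would give 45.71%).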


Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #585      +/-   ##
==========================================
- Coverage   73.73%   73.20%   -0.53%
==========================================
  Files          30       31       +1
  Lines        3632     3702      +70
==========================================
+ Hits         2678     2710      +32
- Misses        954      992      +38
```

| Flag | Coverage Δ |
|---|---|
| unittests | `73.20% <46.47%> (-0.53%)` |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| bigbang/cli.py | `44.92% <44.92%> (ø)` |
| bigbang/\_\_init\_\_.py | `100.00% <100.00%> (ø)` |
| bigbang/config.py | `100.00% <100.00%> (ø)` |


micahflee commented 1 year ago

I've implemented collect-draft-metadata:

$ bigbang collect-draft-metadata --help
Usage: bigbang collect-draft-metadata [OPTIONS]

  Collects files from public mailing list archives

Options:
  --working-group TEXT  IETF working group acronym  [required]
  --help                Show this message and exit.

It doesn't quite work yet, but I confirmed that bin/collect_draft_metadata.py fails in exactly the same way, so I guess this is fine.

$ bigbang collect-draft-metadata --working-group hrpc
Traceback (most recent call last):
  File "/Users/user/code/bigbang/env/bin/bigbang", line 8, in <module>
    sys.exit(main_cli())
  File "/Users/user/code/bigbang/env/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/user/code/bigbang/env/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/user/code/bigbang/env/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/user/code/bigbang/env/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/user/code/bigbang/env/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/user/code/bigbang/env/lib/python3.10/site-packages/bigbang/cli.py", line 125, in collect_draft_metadata
    group = dt.group_from_acronym(wg)
  File "/Users/user/code/bigbang/env/lib/python3.10/site-packages/ietfdata/datatracker.py", line 2728, in group_from_acronym
    groups = list(self._retrieve_multi(url, Group))
  File "/Users/user/code/bigbang/env/lib/python3.10/site-packages/ietfdata/datatracker.py", line 2086, in _retrieve_multi
    fetch_obj = self.pavlova.from_mapping(obj_json, obj_type) # type: T
  File "/Users/user/code/bigbang/env/lib/python3.10/site-packages/pavlova/__init__.py", line 102, in from_mapping
    data[field.name] = self.parse_field(
  File "/Users/user/code/bigbang/env/lib/python3.10/site-packages/pavlova/__init__.py", line 150, in parse_field
    return self.parsers[base_type].parse_input(
KeyError: typing.Optional
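
The KeyError in this traceback is raised inside pavlova while ietfdata parses the datatracker response, so it is not something the CLI itself can fix. If a cleaner failure message is wanted in the meantime, one option is to catch the error in the command and re-raise it as click.ClickException. A minimal sketch follows; lookup_group is a hypothetical stand-in for the dt.group_from_acronym call and simply raises the same KeyError.

```python
import click


def lookup_group(acronym):
    # Hypothetical stand-in for the datatracker lookup in the traceback
    # (dt.group_from_acronym); it raises the same KeyError to simulate it.
    raise KeyError("typing.Optional")


@click.command("collect-draft-metadata")
@click.option("--working-group", required=True, help="IETF working group acronym")
def collect_draft_metadata(working_group):
    try:
        group = lookup_group(working_group)
    except KeyError as exc:
        # ClickException prints "Error: <message>" and exits with status 1,
        # instead of dumping a full Python traceback.
        raise click.ClickException(
            f"could not look up working group {working_group!r}: {exc}"
        ) from exc
    click.echo(f"found group {group}")
    return group
```

With this sketch, bigbang collect-draft-metadata --working-group hrpc would print a single "Error: could not look up working group 'hrpc': ..." line instead of the traceback.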

sbenthall commented 1 year ago

Thank you! This is an awesome improvement.