LLNL / scraper

Python library for getting metadata from source code hosting tools
MIT License
50 stars 23 forks source link

adding license info for github, where exists #21

Closed leebrian closed 6 years ago

leebrian commented 6 years ago

CDC wants to automatically capture license info where possible. Added these few lines to use github info where it's already in the response. Didn't add GitLab as that makes a separate http request to get license info.

Not sure if you wanted to parameterize this and you have two different ways (config file, and command line) so didn't add this in. But please let me know if you have a preference and I'll add parameters. Currently will ignore if no license info is available for repo.

leebrian commented 6 years ago

You got it. I added one for GitLab. If you let me know what way you'd like to pass configs, I can complete the todo for GitLab in a different pull request. So default can not to be to pound with api calls, but if needed, users can select it.

IanLee1521 commented 6 years ago

I think you might have missed the change for the GitLab? Looks like you only took out the comment there.

As for as configs... what do you mean by that? Not sure I'm following.

leebrian commented 6 years ago

There should be a new line at 286 with a todo for gitlab. And line 25 removed the todo for GitHub.

Re: parameters The core parameters come from config.json and are overriden in gen_code_gov_json.py.main. And then there are extra parameters passed in as command line arguments and parsed with argparse that override config.json. But there are also some like "skip-labor-hours" that is only passed as an argument to main and is passed down to process_config as an extra method argument.

Do you have a preference for how to pass? Or you're ok with me adding another argument for passing a new "fetch-remote-license" that the GitLab class will see and have a default value to False?

IanLee1521 commented 6 years ago

Ah ok.

I see what you mean.

I think that the compute_labor_hours (https://github.com/LLNL/scraper/blob/1d373b4b4b339b249b0c553ad8248ea45913cad1/scraper/gen_code_gov_json.py#L77) setting should be moved into the config up front instead: https://github.com/LLNL/scraper/blob/1d373b4b4b339b249b0c553ad8248ea45913cad1/scraper/gen_code_gov_json.py#L48-L58

That is the new way, and I may just not have gotten that one reference.

leebrian commented 6 years ago

That makes sense. I'll refactor out the gitlab bit and add the param to config.json, but not sure when as it's not as important for our catalog (GitLab is used internally so any OSS projects will get cloned out to Github).

IanLee1521 commented 6 years ago

Shall I go ahead and merge this in for now, and you can submit another PR for the next change (aka gitlab) ?

leebrian commented 6 years ago

Yes please, if it's ok with you. I don't want to make the changes coupled so this one lingers.