LLNL / scraper

Python library for getting metadata from source code hosting tools
MIT License

Add support for Gitlab languages #31

Closed: leebrian closed this 5 years ago

leebrian commented 5 years ago

This request adds support for reading language metadata for GitLab projects. Unfortunately, the GitLab API requires a separate call to each project to read its languages, and that can take a while: on one of my GitLab servers, a scrape went from 5 seconds to 13 minutes. So I added a fetch_languages setting to the GitLab config and defaulted it to false. This should protect current GitLab users from slowdowns.
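A minimal sketch of the opt-in behavior described above. It is not scraper's actual code: `collect_projects`, the `fetch_project_languages` callable, and the record layout are hypothetical. The callable stands in for the per-project request (in the GitLab REST API this is `GET /projects/:id/languages`, which returns a mapping of language to percentage). The point is that the extra request happens once per project, and only when the flag is set.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical signature: takes a project ID, returns {language: percentage}.
LanguageFetcher = Callable[[int], Dict[str, float]]


def collect_projects(
    projects: Iterable[dict],
    config: dict,
    fetch_project_languages: LanguageFetcher,
) -> List[dict]:
    """Build metadata records, fetching languages only when enabled.

    Assumes each project dict has "id" and "name" keys and that the
    config uses a "fetch_languages" flag that defaults to False.
    """
    fetch_languages = bool(config.get("fetch_languages", False))
    records = []
    for project in projects:
        record = {"name": project["name"], "languages": []}
        if fetch_languages:
            # One extra API request per project: this is the slow part.
            languages = fetch_project_languages(project["id"])
            # Order language names by share, largest first.
            record["languages"] = sorted(languages, key=languages.get, reverse=True)
        records.append(record)
    return records
```

With an empty config, the fetcher is never called; with `{"fetch_languages": True}`, it is called once for every project, which is why the cost grows with the number of projects on the server.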

Happy to refactor to default to true if you have a preference.

I noticed this behavior when trying to compare language use across projects and saw that only GitHub projects had the languages tab populated.

leebrian commented 5 years ago

I like fetch_languages because it implies external access; however, I used include_languages as the property name to follow your pattern of include_public_only and include_labor_hours.

I used fetch_languages as the Python variable so I can easily change the config file setting if you're OK with that. Or maybe just call it “languages”, since the config property for GitHub is “public_only”.

IanLee1521 commented 5 years ago

Sorry for the delay.

I don't think I was clear originally: you use include_languages in the demo.json file (https://github.com/LLNL/scraper/pull/31/files#diff-99653d75bdc5ec0eb2f43bef8bb4b60eR37), but fetch_languages in the code (https://github.com/LLNL/scraper/pull/31/files#diff-23a8fab4c42e4f024d0cbf4afd08b640R70). We'll want those to be the same (whichever you prefer is fine with me).
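To illustrate why the names must match, here is a small sketch that reads the flag from a JSON config. The `"GitLab"` list and `"url"` key are assumptions for illustration, not necessarily the real layout of demo.json, and `languages_enabled` is a hypothetical helper. If the file says include_languages while the code looks up fetch_languages, the setting is silently ignored and the default (false) wins.

```python
import json

# Hypothetical config layout for illustration only; the real demo.json may differ.
CONFIG_TEXT = """
{
  "GitLab": [
    {"url": "https://gitlab.example.com", "fetch_languages": true}
  ]
}
"""


def languages_enabled(server_config: dict) -> bool:
    # Look up the same key name the config file uses; any other key is ignored.
    return bool(server_config.get("fetch_languages", False))


config = json.loads(CONFIG_TEXT)
for server in config["GitLab"]:
    print(server["url"], languages_enabled(server))

# A mismatched key (e.g. "include_languages") is silently ignored,
# so the flag falls back to its default of False.
mismatched = {"url": "https://gitlab.example.com", "include_languages": True}
print(languages_enabled(mismatched))
```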

leebrian commented 5 years ago

Thanks. That makes sense, and I agree. I just updated the PR to use fetch_languages consistently; I think this is clearer because the code now matches the config and the README.

IanLee1521 commented 5 years ago

This looks great @leebrian ! Thanks!