-
My previous Issue #10 was getting highly repetitive, and I wanted a way to parameterize my test cases and make them more efficient. Plus, I still haven't bothered learning Pester yet.
Feature 1 - C…
-
Hi folks,
Thanks for sharing such a wonderful dataset! I am using it in my research.
I was wondering whether you have a list of MD5 or SHA-1 checksums for each archive so that I can check the inte…
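If the maintainers do publish such a list, checking a downloaded archive against it could look like the minimal sketch below. It assumes the list gives one hex digest (MD5 or SHA-1) per archive. The function names `file_digests` and `verify` are hypothetical, and only Python's standard `hashlib` is used.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for a file, reading it in 1 MiB chunks
    so that large archives never have to fit in memory."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def verify(path, expected_hex):
    """Compare a file against one published checksum.

    The algorithm is inferred from the digest length (32 hex characters
    for MD5, 40 for SHA-1). The published checksum list is hypothetical;
    it would have to come from the dataset maintainers.
    """
    md5_hex, sha1_hex = file_digests(path)
    expected = expected_hex.strip().lower()
    if len(expected) == 32:
        return expected == md5_hex
    if len(expected) == 40:
        return expected == sha1_hex
    raise ValueError(f"unrecognized checksum length: {len(expected)}")
```

For example, `verify("archive.tar.gz", expected_hex)` (placeholder file name) returns `True` only if the local file matches the published digest.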
-
GitHub currently has a `robots.txt` that prevents crawling of the paths associated with the Wiki area of each and every repository. The exclusion is explicit and looks very intentional. I've asked about…
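To check how a given `robots.txt` rule affects a particular URL, Python's standard `urllib.robotparser` can evaluate the file offline. This is a minimal sketch. The rules in `SAMPLE_ROBOTS_TXT` are hypothetical and are not GitHub's actual file, and the helper name `is_crawlable` is likewise illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; these are NOT the contents of
# GitHub's real robots.txt. Fetch the live file to check actual behavior.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /octocat/Hello-World/wiki
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if `robots_txt` allows `user_agent` to fetch `url`.

    The standard-library parser matches Disallow paths by prefix only;
    it does not implement `*` or `$` wildcards inside a rule, so results
    for wildcard-based rules may differ from what real crawlers do.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these sample rules, the repository page is allowed but any path under `/wiki` is disallowed.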
-
### Describe the feature you'd like to request
To collect the commit data more completely, we should also fetch events from forked repositories. For things like squash commits, it would be good t…
-
Download front pages of several million websites with curl.
Record all metadata, such as headers, redirects, TLS version, and cipher, as well as the data (HTTP body).
Create a dataset from it. The dataset…
-
If you asked me:
> How many _active_ contributors does Hoodie have today?
I could not answer it. Nor could any maintainer of any other Open Source project I have asked so far. And this i…
-
While tracking some general GitHub trends over time, I've noticed a few discrete jumps in the GH Archive data. One such case is a sustained jump (15-30%) in the number of data observations beginni…
-
We should use Terraform to make the BigQuery datasets public. Anything exposed in Hasura via our API should also be queryable directly from BigQuery.
We should also add documentation on how to get set up f…
-
I noticed that some of the SHA-1 hashes of email addresses computed for commits don't seem to be right.
For example (chosen randomly):
https://github.com/google/souper/commit/9cd0a32caba78693638692fd…
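One way to narrow down such mismatches is to hash the commit's email address under a few candidate normalizations and see which one, if any, reproduces the stored value. This sketch assumes the stored value is a SHA-1 hex digest of a UTF-8 encoded address. The normalizations tried here (raw, whitespace-stripped, lowercased) are guesses, not the dataset's documented behavior, and both function names are hypothetical.

```python
import hashlib


def email_sha1_candidates(email):
    """Return SHA-1 hex digests of `email` under a few plausible
    normalizations (hypothetical; the dataset's actual rule is unknown)."""
    variants = {
        "raw": email,
        "stripped": email.strip(),
        "lowercased": email.strip().lower(),
    }
    return {
        name: hashlib.sha1(value.encode("utf-8")).hexdigest()
        for name, value in variants.items()
    }


def matching_normalizations(email, stored_hex):
    """Names of the normalizations whose digest equals `stored_hex`.

    An empty list means none of the guessed rules explains the stored hash.
    """
    stored = stored_hex.strip().lower()
    return [
        name
        for name, digest in email_sha1_candidates(email).items()
        if digest == stored
    ]
```

Running `matching_normalizations` on the author email of the commit above, against the hash stored for it, would show whether the discrepancy comes from case or whitespace handling or from something else entirely.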
-
I'm trying to run some analysis tasks over the labels attached to issues. I started running this query on BigQuery:
```
SELECT repo.name, JSON_EXTRACT_SCALAR(payload, '$.issue.labels') labels
…