datopian / datahub-qa

:package: Bugs, issues and suggestions for datahub.io
https://datahub.io/

Issues day-to-day counter #108

Closed AcckiyGerman closed 6 years ago

AcckiyGerman commented 6 years ago

As a team, we want to see how the epic fight :crossed_swords: against the Issues horde :japanese_ogre: is going, so we can feel like real heroes! :v:

Expected behavior

I would like to see:

Tasks

Analysis

Data is sourced from the GitHub API. The API endpoint for issues returns a list of issues with the following info:

The biggest problem is that the API does NOT return a per-day history of issues. If we want issue statistics for every day, we need to count the issues ourselves, day by day.

@acckiygerman sees two ways to solve the problem:

  1. fetch all the issues from the beginning of time and calculate daily statistics using open/close dates info

    • PRO:
      • we have statistics from the beginning
      • statistics could be recalculated at any moment
      • easy to build day-to-day statistics for closed issues
    • CONTRA:
      • complex logic is needed to calculate the number of open issues day-to-day (@acckiygerman tried and failed)
      • the whole table needs to be recalculated every day
      • the number of issues to fetch (if we include closed issues) grows constantly
      • the API returns issues in pages (30 per request by default), so fetching all of them takes many requests
      • the number of API requests will soon exceed the limit for unauthenticated requests (we need to use token credentials in the script)
      • wrong statistics when severity labels were changed in the past (the API does not return the change history, only the last-modified date and the current state of the issue)
  2. Fetch open issues on the current day and issues closed in last 24h

    • PRO:
      • easy to filter issues using the GitHub API label and date filters
      • small number of API requests
    • CONTRA:
      • we need to execute the script daily (this can be automated with Travis or Docker)
      • statistics start only from the date when we begin running the script daily
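For comparison, the day-to-day calculation behind the first option can be sketched as follows. This is a minimal illustration, not the script from this thread: the function name `daily_open_counts` and the `(created, closed)` pair format are assumptions, and it ignores labels entirely, so it does not address the label-history problem listed above.

```python
from collections import Counter
from datetime import date, timedelta


def daily_open_counts(issues, start, end):
    """Return {day: number of issues still open at the end of that day}.

    `issues` is a list of (created, closed) date pairs (hypothetical format);
    `closed` is None for issues that are still open.
    """
    # +1 on the day an issue is opened, -1 on the day it is closed.
    delta = Counter()
    for created, closed in issues:
        delta[created] += 1
        if closed is not None:
            delta[closed] -= 1

    # Issues opened (and not yet closed) before `start` are carried over.
    running = sum(change for day, change in delta.items() if day < start)

    counts = {}
    day = start
    while day <= end:
        running += delta[day]  # a Counter returns 0 for days without changes
        counts[day] = running
        day += timedelta(days=1)
    return counts
```

An issue opened and closed on the same day contributes +1 and -1 on that day, so it never shows up in the open count.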

Since the datahub-qa repo has existed for only two months and we don't want to spend much time writing this script, @acckiygerman decided to follow the second, easier way.
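A rough sketch of the fetching step for the second way, assuming the GitHub REST endpoint `GET /repos/{owner}/{repo}/issues` with its `state`, `labels`, `per_page` and `page` query parameters. The helper names (`issues_url`, `http_get_json`, `fetch_issues`) are made up for illustration and are not taken from the actual dataset script.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://api.github.com/repos/{repo}/issues"


def issues_url(repo, page=1, **filters):
    """Build the issues URL, e.g. with state="open" or labels="Severity: Major"."""
    params = {"per_page": 100, "page": page, **filters}
    return API.format(repo=repo) + "?" + urlencode(sorted(params.items()))


def http_get_json(url, token=None):
    """GET a URL and decode the JSON body; a token raises the rate limit."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = "token " + token
    with urlopen(Request(url, headers=headers)) as resp:
        return json.load(resp)


def fetch_issues(repo, get_json=http_get_json, **filters):
    """Follow the pagination until an empty page comes back."""
    issues, page = [], 1
    while True:
        batch = get_json(issues_url(repo, page=page, **filters))
        if not batch:
            return issues
        # The issues endpoint also lists pull requests; skip them.
        issues += [item for item in batch if "pull_request" not in item]
        page += 1
```

For example, `fetch_issues("datopian/datahub-qa", state="open")` would return the currently open issues. `get_json` is injectable so the paging logic can be tried without network access.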

Currently, the dataset processing script counts issues on the current date. Labels:

details

If you change an issue's label, the script will count that issue in the other column the next day.

Also, if you put two labels on an issue at the same time (say Minor AND Major), the issue will be counted as two issues.

If you open and close an issue on the same day, the script will count +1 closed issue but will not count it as an open one.
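The counting behaviour described above can be illustrated with a small sketch. It assumes each issue is a dict shaped like the GitHub API response, with a `labels` list of `{"name": ...}` objects, and uses the label names that appear in the dataset header later in this thread. `TRACKED` and `count_by_label` are hypothetical names.

```python
TRACKED = [
    "Severity: Critical",
    "Severity: Major",
    "Severity: Minor",
    "Severity: Trivial",
    "NEW FEATURE",
]


def count_by_label(open_issues):
    """Count open issues per tracked label.

    An issue carrying two tracked labels is counted once in each column,
    which is the double counting mentioned above.
    """
    counts = dict.fromkeys(TRACKED, 0)
    for issue in open_issues:
        for label in issue["labels"]:
            if label["name"] in counts:
                counts[label["name"]] += 1
    return counts
```

Because only the current labels of currently open issues are looked at, a relabelled issue simply moves to another column on the next run.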

AcckiyGerman commented 6 years ago

@zelima the dataset is ready: https://github.com/datasets/datahub-qa/ It includes an update script. Could you set up a Travis task to update the data daily and push it to the datahub (or, even better, show me how to do it)? I will add the view later.

zelima commented 6 years ago

@AcckiyGerman here is the first URL from Google for doing that https://gist.github.com/willprice/e07efd73fb7f13f917ea

Also, can I see more analysis of how exactly this will work? Eg

AcckiyGerman commented 6 years ago

This dataset will only count issues with the labels (Critical, Major, Minor, Trivial, New Feature) on the current date and add one line to the result table:

date,Severity: Critical,Severity: Major,Severity: Minor,Severity: Trivial,NEW FEATURE
2018-02-10,2,13,30,15,3
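Appending one such line per run could look roughly like this. It is only a sketch: `append_row` and `COLUMNS` are illustrative names, and the real update script may write the file differently.

```python
import csv
import os

COLUMNS = [
    "date",
    "Severity: Critical",
    "Severity: Major",
    "Severity: Minor",
    "Severity: Trivial",
    "NEW FEATURE",
]


def append_row(path, day, counts):
    """Append one row for `day`; write the header first if the file is new."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        if new_file:
            writer.writeheader()
        writer.writerow({"date": day.isoformat(), **counts})
```

Running it twice on the same day would add two rows for that date, so a daily job should call it exactly once per day.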

If people replace the severity label, the next day the script will count -1 for the previous label and +1 for the current one.

The script does not track closed issues at all (a closed-issues counter would just grow constantly, and I don't see how that could be useful)... WAIT!! I just realized I can track how many issues were closed TODAY, and that is definitely useful.

The script does not track duplicate issues: before we realize an issue is a duplicate we can't count it as one, and once we realize it we close the issue, so there is nothing left to track. Update: duplicate issues will be counted among the closed issues as well, since that now makes sense.
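Counting the issues closed on a given day might be sketched like this, assuming each issue dict carries the `closed_at` timestamp in the format the GitHub API returns (UTC, e.g. `2018-02-10T12:34:56Z`, or `None` for open issues). `closed_on` is a hypothetical helper name.

```python
from datetime import date, datetime


def closed_on(issues, day):
    """Return how many issues have a `closed_at` timestamp on `day` (UTC)."""
    total = 0
    for issue in issues:
        closed_at = issue.get("closed_at")
        if not closed_at:
            continue  # still open
        closed_day = datetime.strptime(closed_at, "%Y-%m-%dT%H:%M:%SZ").date()
        if closed_day == day:
            total += 1
    return total
```

Issues closed as duplicates are included here automatically, since they are ordinary closed issues from the API's point of view.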

Updating the script to count closed issues...

zelima commented 6 years ago

@AcckiyGerman

If people replace the severity label, the next day the script will count -1 for the previous label and +1 for the current one.

Generally, I have a feeling that this lacks analysis. I would start with some more detailed user stories

Also, please update the Analysis section instead of writing this as a comment.

AcckiyGerman commented 6 years ago

@zelima the analysis is updated. Also, I have a feeling that I have spent too much time on this task. I actually like it, but our priority is to close other tasks and write tests, and @akariv asked us not to spend much effort on this one.

The task is 95% ready, so I propose finishing it the way I started and then extending or rewriting the script if we are not satisfied with the result.

zelima commented 6 years ago

@AcckiyGerman Agree

AcckiyGerman commented 6 years ago

https://datahub.io/AcckiyGerman/datahub-qa-issues-tracker

The dataset is ready and has fancy views.

Now it needs to be executed daily....

AcckiyGerman commented 6 years ago

FIXED: https://datahub.io/examples/datahub-qa-issues-tracker