drdhaval2785 / github_issue_backup

a script to backup github issues

Download only updates - not everything again #36

Closed drdhaval2785 closed 8 years ago

drdhaval2785 commented 8 years ago

#35 made a similar point about images.

The same holds true for issues and comments. If the parent directory already has issues and comments whose updated_at time is older than the last download, those issues / comments need not be downloaded again. There must be some GitHub API to find this out.

In a nutshell, only incremental changes need to be downloaded, which would make the backup faster and use less data.

funderburkjim commented 8 years ago

From this GitHub article, I discovered there is a way to filter the issues manually. This is likely implemented in the API, so it may provide a clue as to how to solve this 'need not be downloaded' issue.

Here is an example showing how to filter for issues in the CORRECTIONS repository that have been updated during this month, November 2015.

is:issue updated:>=2015-11-01 
[NOTE: 9 issues were selected]


Here is a link to the Github search api.

The way this might work in your issue backup would be for the user to supply a 'since' date (in yyyy-mm-dd format). Then (assuming the user supplied a 'since' date), for each repository in question, a preliminary search for issues updated since yyyy-mm-dd could be done, and the resulting list could then guide the backup process.

It's likely that the situation is more complicated than the pseudo-code of the last paragraph indicates, but this might provide a starting point.
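As a rough illustration of that preliminary search, here is a minimal Python sketch that only builds the request URL for the GitHub search API from a 'since' date, using the same `is:issue updated:>=` filter as the manual example. The function name `build_search_url` and the owner `some-owner` are hypothetical placeholders, and sending the request, authentication, and paging are all left out.

```python
from datetime import date
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/issues"


def build_search_url(owner, repo, since):
    """Return a search URL for issues in owner/repo updated on or after `since`.

    `since` is a yyyy-mm-dd string supplied by the user.
    """
    # Reject anything that is not a valid yyyy-mm-dd date (raises ValueError).
    since_date = date.fromisoformat(since)
    # Same qualifiers as the manual filter, plus a repo: qualifier.
    query = f"repo:{owner}/{repo} is:issue updated:>={since_date.isoformat()}"
    return f"{SEARCH_URL}?{urlencode({'q': query})}"


# "some-owner" is a placeholder; substitute the real account name.
print(build_search_url("some-owner", "CORRECTIONS", "2015-11-01"))
```

The numbers of the issues returned by such a search could then be the only ones the backup fetches in full.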

drdhaval2785 commented 8 years ago

Right now, whenever a repository finishes downloading successfully, we store a timestamp in the timelog.txt file. The file is written in append mode (not overwrite mode).

  1. In the first phase, the list of issues is downloaded in issues.txt.
  2. If the latest updated_at time of any issue is later than the latest time for the particular repository in timelog.txt, the updated issues are downloaded.
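The comparison in step 2 could be sketched roughly as follows in Python. This is only an illustration, not the script's actual code: it assumes the issue list has already been parsed into dictionaries with an `updated_at` field in the `YYYY-MM-DDTHH:MM:SSZ` form that the GitHub API returns, and that the last timestamp read from timelog.txt uses the same form (the real file layout may differ). The helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_github_time(stamp):
    """Parse a UTC timestamp such as 2015-11-20T08:30:00Z."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def issues_needing_download(issues, last_backup):
    """Return the issues whose updated_at is later than last_backup."""
    cutoff = parse_github_time(last_backup)
    return [i for i in issues if parse_github_time(i["updated_at"]) > cutoff]


# Made-up sample data standing in for the contents of issues.txt.
issues = [
    {"number": 1, "updated_at": "2015-10-02T10:00:00Z"},
    {"number": 2, "updated_at": "2015-11-20T08:30:00Z"},
]
changed = issues_needing_download(issues, "2015-11-01T00:00:00Z")
print([i["number"] for i in changed])  # prints [2]
```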

This can be improved if the filtering options of the GitHub API can be fully understood, so that we download only the issues updated after a given timestamp.
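One option that may fit here is the `since` parameter accepted by the GitHub REST endpoints that list a repository's issues and its issue comments, which limits results to items updated at or after an ISO 8601 timestamp. The Python sketch below only builds the two URLs; the function name `incremental_urls` and the owner `some-owner` are hypothetical, and the requests themselves, authentication, and paging are omitted. Note that the issues listing also includes pull requests.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"


def incremental_urls(owner, repo, since):
    """Build list URLs restricted to items updated at or after `since`.

    `since` is an ISO 8601 timestamp, e.g. the last entry in timelog.txt
    (assuming it is stored in that form).
    """
    base = f"{API_ROOT}/repos/{owner}/{repo}"
    # state=all so that closed issues updated since the cutoff are included.
    issue_params = urlencode({"since": since, "state": "all", "per_page": 100})
    comment_params = urlencode({"since": since, "per_page": 100})
    return {
        "issues": f"{base}/issues?{issue_params}",
        "comments": f"{base}/issues/comments?{comment_params}",
    }


urls = incremental_urls("some-owner", "CORRECTIONS", "2015-11-01T00:00:00Z")
print(urls["issues"])
print(urls["comments"])
```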

drdhaval2785 commented 8 years ago

https://github.com/drdhaval2785/github_issue_backup/commit/c0a0d96edb959701578ca2e07afe1d49827980af closed this