dwyl / github-backup

:octocat: :back: 🆙 Backup your GitHub Issues so you can still work when (they/you are) offline.
https://github-backup.herokuapp.com
GNU General Public License v2.0
32 stars 3 forks source link

Backup new created issue #37

Closed SimonLab closed 6 years ago

SimonLab commented 6 years ago

As a user I want to save new created issue So that I can access a backup version of the Github issues.

linked to #31 (How to save a json file on S3 with Phoenix) and #5 (What kind of issue information we want to backup?)

Steps

SimonLab commented 6 years ago

I think we still need to clarify the structure of the tables especially the revisions table(s) (we mentioned this on this comment: https://github.com/dwyl/github-backup/issues/5#issuecomment-367639710

Github is doing a distinction between the "first comment" which is the issue description and the rest of the comments: github_comments

Github give a unique id to the issue and to the comments: issue: image

comments (just one here): image

Ideally we would have to create only one revision table which save all the history of edits on the issue (first comment) and the comments. We would have use the id of the issue or comments to reference the revisions to the correct comments. The problem is that I think the ids are not unique between issues and comments.

So I still think that we might need to create two revisions tables one for the edits on the issue (first comment) and one for the edits on the comments themselves.

Here is a the flow:

We can see that the postgres database will keep track of the "timeline of the edit" and the s3 file will only contains the edit and changes of the issue and comments.

@nelsonic @Cleop does this makes sense? So to recap I think I still prefer to create 2 different tables to store the changes of the issue and comments due to the way Github split the two on the api. But if you think another way can work or it's easier to impement let me know :+1:

nelsonic commented 6 years ago

@SimonLab that is reasonable ... 🤔

iteles commented 6 years ago

That's odd, I thought each comment (including the issue description) had a totally unique ID. If this is the case then I reckon this is a good way forward for now.

nelsonic commented 6 years ago

Easy to check ... My original comment (without edit: https://github.com/dwyl/github-backup/issues/37#issuecomment-369579067 image Added Emoji: https://github.com/dwyl/github-backup/issues/37#issuecomment-369579067 image Still has the same comment id ... @SimonLab perhaps we could include a timestamp in the id we store e.g:


{
  "comments": {
    "369579067_1519907431": "@SimonLab that is reasonable",
    "369579067_1519908529": "@SimonLab that is reasonable ... 🤔"
  }
}
SimonLab commented 6 years ago

@iteles from my previous screenshots we can see that the id of the comment 369565032 > 301369661 the id of the issue and that they have the same format it's difficult to know that the ids are unique and don't overlap. Github might check that the id is not already taken by an issue when a new comment is created but I couldn't find any information about the uniqueness of the ids. That's why I think the issue id might happend to be the same as a comment id on some occasion.

@nelsonic we could try to make the issue and comment ids unique by adding a timestamp but I'm not sure how it can help us.

I think at this stage I'll try to create the two revision tables and while I implement this solution I might find a better way to structure the database

Cleop commented 6 years ago

Hang on I think I'm confused...

Maybe ignore this comment---- Or what I had envisaged in my schema (https://github.com/dwyl/github-backup/issues/5#issuecomment-367417914) was:

If we had these two values then I think everything could be done under one 'comments' table

nelsonic commented 6 years ago

@SimonLab I have asked the question about comment id uniqueness on StackOverflow: https://stackoverflow.com/questions/49052928/are-github-issue-comment-ids-unique

For now I think it's "safe" to assume that issue id and comment id are an auto-increment INT in GitHub's MySQL DB. i.e. unique. (I would "bet a million bucks" that issue.id and comment.id are unique i.e. never re-used) however this should not matter to us because we are not going to make our issues.id or comments.is an auto-increment field.

To be clear I was suggesting using a timestamp in our JSON key (above) because the comment id does not change when someone edits their comment. e.g: my comment above has an id of 369579067 regardless of how many times I edit it ... therefore we should store it as 369579067_{timestamp} to easily keep track of the revisions. This will ensure that the older revisions are have a lower (timestamp) number and can be sorted accordingly.

Cleop commented 6 years ago

So...

But wait, hold on, github doesn't give the first comment of an issue a 'comment id'? In these cases we assign the ID 'issue id_1' so we know it's the first comment of that issue. In all other cases we use the github generated comment id.

Finally we want to save the copy/text of the description into a json file that has a name and structure that makes it easy to access whenever we want it. At the moment we've discussed structuring our json files by issue so that each one contains all of the comments (and all of their respective versions) relating to one issue within one file. Our json file would look like this:

This would be our basic structure. If we then wanted to organise our versions of text by issue or by comment (to display on the front end) then we could do so using the version ID and then querying our DB based on the other fields that were particular to what it is we were trying to achieve.

I hope this makes some sense! It's only covering this specific bit of flow so we can talk more about user tables etc in more detail on the call or at a later date based on priority. We can also talk through this flow here in more detail on the stand up tomorrow as sometimes spoken word is easier to understand.

Cleop commented 6 years ago

This issue, #15 and #5 are all linked and crossing over. I wouldn't say they're 100% duplicates but I think it's a bit confusing having the schema question in 3 places. Can we consolidate it to one issue? @SimonLab and @nelsonic - what do you think?

nelsonic commented 6 years ago

@Cleop agreed. 👍