Closed SimonLab closed 6 years ago
I think we still need to clarify the structure of the tables especially the revisions
table(s) (we mentioned this on this comment: https://github.com/dwyl/github-backup/issues/5#issuecomment-367639710
Github is doing a distinction between the "first comment" which is the issue description and the rest of the comments:
Github give a unique id to the issue and to the comments: issue:
comments (just one here):
Ideally we would have to create only one revision
table which save all the history of edits on the issue (first comment) and the comments. We would have use the id of the issue or comments to reference the revisions to the correct comments. The problem is that I think the ids are not unique between issues and comments.
So I still think that we might need to create two revisions
tables one for the edits on the issue (first comment) and one for the edits on the comments themselves.
Here is a the flow:
Issue is created -> save meta data in the issue table (id, title html_url...) -> create new revision in the revision_issue
with (id, id_issue, timestamp) -> create file on S3 with {id_issue}.json which will have the folllowing structure:
{
"issue": {"revision_1": "description issue", "revision_2": "description issue edited",...},
"comments": {"revision_1": "comment", "revision_2": "comment edited",...}
}
Issue edited -> create new revision_issue
item -> update the s3 file by adding updating the object linked to the "issue" key
Comment added or edited: add a new entry into the revision_comment
table -> update the s3 file to update the object on the "comments" key
We can see that the postgres database will keep track of the "timeline of the edit" and the s3 file will only contains the edit and changes of the issue and comments.
@nelsonic @Cleop does this makes sense? So to recap I think I still prefer to create 2 different tables to store the changes of the issue and comments due to the way Github split the two on the api. But if you think another way can work or it's easier to impement let me know :+1:
@SimonLab that is reasonable ... 🤔
That's odd, I thought each comment (including the issue description) had a totally unique ID. If this is the case then I reckon this is a good way forward for now.
Easy to check ...
My original comment (without edit:
https://github.com/dwyl/github-backup/issues/37#issuecomment-369579067
Added Emoji:
https://github.com/dwyl/github-backup/issues/37#issuecomment-369579067
Still has the same comment id
... @SimonLab perhaps we could include a timestamp
in the id we store e.g:
{
"comments": {
"369579067_1519907431": "@SimonLab that is reasonable",
"369579067_1519908529": "@SimonLab that is reasonable ... 🤔"
}
}
@iteles from my previous screenshots we can see that the id of the comment 369565032 > 301369661 the id of the issue and that they have the same format it's difficult to know that the ids are unique and don't overlap. Github might check that the id is not already taken by an issue when a new comment is created but I couldn't find any information about the uniqueness of the ids. That's why I think the issue id might happend to be the same as a comment id on some occasion.
@nelsonic we could try to make the issue and comment ids unique by adding a timestamp but I'm not sure how it can help us.
I think at this stage I'll try to create the two revision
tables and while I implement this solution I might find a better way to structure the database
Hang on I think I'm confused...
Maybe ignore this comment---- Or what I had envisaged in my schema (https://github.com/dwyl/github-backup/issues/5#issuecomment-367417914) was:
If we had these two values then I think everything could be done under one 'comments' table
@SimonLab I have asked the question about comment id
uniqueness on StackOverflow:
https://stackoverflow.com/questions/49052928/are-github-issue-comment-ids-unique
For now I think it's "safe" to assume that issue id
and comment id
are an auto-increment INT in GitHub's MySQL DB. i.e. unique.
(I would "bet a million bucks" that issue.id
and comment.id
are unique i.e. never re-used)
however this should not matter to us because we are not going to make our issues.id
or comments.is
an auto-increment field.
To be clear I was suggesting using a timestamp
in our JSON key (above)
because the comment id
does not change when someone edits
their comment.
e.g: my comment above has an id
of 369579067
regardless of how many times I edit
it ...
therefore we should store it as 369579067_{timestamp}
to easily keep track of the revisions.
This will ensure that the older revisions are have a lower (timestamp) number and can be sorted accordingly.
So...
But wait, hold on, github doesn't give the first comment of an issue a 'comment id'? In these cases we assign the ID 'issue id_1' so we know it's the first comment of that issue. In all other cases we use the github generated comment id.
Finally we want to save the copy/text of the description into a json file that has a name and structure that makes it easy to access whenever we want it. At the moment we've discussed structuring our json files by issue so that each one contains all of the comments (and all of their respective versions) relating to one issue within one file. Our json file would look like this:
This would be our basic structure. If we then wanted to organise our versions of text by issue or by comment (to display on the front end) then we could do so using the version ID and then querying our DB based on the other fields that were particular to what it is we were trying to achieve.
I hope this makes some sense! It's only covering this specific bit of flow so we can talk more about user tables etc in more detail on the call or at a later date based on priority. We can also talk through this flow here in more detail on the stand up tomorrow as sometimes spoken word is easier to understand.
This issue, #15 and #5 are all linked and crossing over. I wouldn't say they're 100% duplicates but I think it's a bit confusing having the schema question in 3 places. Can we consolidate it to one issue? @SimonLab and @nelsonic - what do you think?
@Cleop agreed. 👍
As a user I want to save new created issue So that I can access a backup version of the Github issues.
linked to #31 (How to save a json file on S3 with Phoenix) and #5 (What kind of issue information we want to backup?)
Steps
[x] Catch the new issue webhook event:
:issue_created
type of event, see: https://github.com/dwyl/github-backup/blob/73cafc01a5ff915a39bfeebfe986d7e0f25f5d2a/lib/app_web/controllers/event_type.ex#L30 https://github.com/dwyl/github-backup/blob/73cafc01a5ff915a39bfeebfe986d7e0f25f5d2a/lib/app_web/controllers/event_controller.ex#L12[x] Create a simple version of the
issue
,comment
andrevision
schemas and tables: issue: id (unique on Github), number, html_url comment: id, issue_id, revisions: id, id_commentThe user schema/table and the reference to the user on the revision table can be added later on
[x] Save the {id_isssue}.json file on S3 with the following structure: {revision_id: "content of the comment"},...