Closed lgvalle closed 3 years ago
Thanks for the reply @alstr, very valid points. Lets break them down:
commit/
vs compare/
I'm not sure if I'm looking in the right place, but when I compare the responses from these two calls it seems that in both cases the content of file
is a dict. Am I missing something here?
https://api.github.com/repos/alstr/todo-to-issue-action/commits/3fd212fda154273498916d48fdad504a09c6f67d
vs
https://api.github.com/repos/alstr/todo-to-issue-action/compare/63126d70260dc962111eb9b2293b0606971b0ed5...3fd212fda154273498916d48fdad504a09c6f67d
patch
key contentSame for the patch
content: in both cases, I can see the same content for patch
"patch": "@@ -269,7 +269,7 @@ class TodoParser(object):\n LINE_NUMBERS_INNER_PATTERN = re.compile(r'@@[\\d\\s,\\-+]*\\s@@')\n ADDITION_PATTERN = re.compile(r'(?<=^\\+).*')\n DELETION_PATTERN = re.compile(r'(?<=^-).*')\n- IDENTIFIER_PATTERN = re.compile(r'.+?(?=\\))')\n+ REF_PATTERN = re.compile(r'.+?(?=\\))')\n LABELS_PATTERN = re.compile(r'(?<=labels:\\s).+')\n ASSIGNEES_PATTERN = re.compile(r'(?<=assignees:\\s).+')\n MILESTONE_PATTERN = re.compile(r'(?<=milestone:\\s).+')\n@@ -446,10 +446,10 @@ def _extract_issue_if_exists(self, comment, marker, code_block):\n for line in lines:\n line_status, committed_line = self._get_line_status(line)\n cleaned_line = self._clean_line(committed_line, marker)\n- line_title, identifier = self._get_title(cleaned_line)\n+ line_title, ref = self._get_title(cleaned_line)\n if line_title:\n- if identifier:\n- issue_title = f'[{identifier}] {line_title}'\n+ if ref:\n+ issue_title = f'[{ref}] {line_title}'\n else:\n issue_title = line_title\n issue = Issue(\n@@ -527,18 +527,23 @@ def _clean_line(comment, marker):\n return comment.strip()\n \n def _get_title(self, comment):\n- \"\"\"Check the passed comment for a new issue title (and identifier, if specified).\"\"\"\n+ \"\"\"Check the passed comment for a new issue title (and reference, if specified).\"\"\"\n title = None\n- identifier = None\n- title_pattern = re.compile(r'(?<=' + self.identifier + r'[(\\s:]).+')\n+ ref = None\n+ title_pattern = re.compile(r'(?<=' + self.identifier + r'[\\s:]).+')\n title_search = title_pattern.search(comment, re.IGNORECASE)\n if title_search:\n title = title_search.group(0).strip()\n- identifier_search = self.IDENTIFIER_PATTERN.search(title)\n- if identifier_search:\n- identifier = identifier_search.group(0)\n- title = title.replace(identifier, '', 1).lstrip(':) ')\n- return title, identifier\n+ else:\n+ title_ref_pattern = re.compile(r'(?<=' + self.identifier + r'\\().+')\n+ title_ref_search = title_ref_pattern.search(comment, re.IGNORECASE)\n+ if title_ref_search:\n+ title = title_ref_search.group(0).strip()\n+ ref_search = self.REF_PATTERN.search(title)\n+ if ref_search:\n+ ref = ref_search.group(0)\n+ title = title.replace(ref, '', 1).lstrip(':) ')\n+ return title, ref\n \n def _get_labels(self, comment):\n \"\"\"Check the passed comment for issue labels.\"\"\""
filename
key instead of parsing it from the diffSeems like the right approach. I'll see if I can add it to this PR 👍
commits/
response is paginatedThis is a tricky one and I don't have a solution for it. If there are more than 300 files it'll paginate the response. Given this is a fallback for case where the action would fail anyways, if we move forward with this I'd suggest open a separate issue to deal with pagination
if os.getenv('INPUT_BEFORE') != '0000000000000000000000000000000000000000':
Yep, should be done in this PR too
Thanks for the reply @alstr, very valid points. Lets break them down:
1. Endpoint format:
commit/
vscompare/
I'm not sure if I'm looking in the right place, but when I compare the responses from these two calls it seems that in both cases the content of
file
is a dict. Am I missing something here? https://api.github.com/repos/alstr/todo-to-issue-action/commits/3fd212fda154273498916d48fdad504a09c6f67d vs https://api.github.com/repos/alstr/todo-to-issue-action/compare/63126d70260dc962111eb9b2293b0606971b0ed5...3fd212fda154273498916d48fdad504a09c6f67d
The difference is setting the header Accept: application/vnd.github.v3.diff
when sending a request to the compare/
endpoint returns the raw diff file, which is what currently happens.
If you don't set that header it returns the dict as specified in the documentation.
The difference is setting the header Accept: application/vnd.github.v3.diff when sending a request to the compare/ endpoint returns the raw diff file, which is what currently happens.
If you don't set that header it returns the dict as specified in the documentation.
I've been thinking about this. If you set the application/vnd.github.v3.diff
header, both requests return the same diff formatted response. So you can still extract the filename with the same regex in both cases:
curl --request GET \
--url https://api.github.com/repos/alstr/todo-to-issue-action/commits/3fd212fda154273498916d48fdad504a09c6f67d \
--header 'Accept: application/vnd.github.v3.diff'
curl --request GET \
--url https://api.github.com/repos/alstr/todo-to-issue-action/compare/63126d70260dc962111eb9b2293b0606971b0ed5...3fd212fda154273498916d48fdad504a09c6f67d \
--header 'Accept: application/vnd.github.v3.diff'
It's not clear to me if what you are suggesting is to not send the header in any case from now on and deal with the json output. That would make it simpler to read the filename (it'll be a separate key from patch). It'll also make it easier to scale the action if there's interest in reading other keys in the future. But it's a big change.
If you want to go down this road I'll suggest doing that in a separate PR that only address the change of the parsing strategy without introducing any behaviour changes
I've been thinking about this. If you set the
application/vnd.github.v3.diff
header, both requests return the same diff formatted response.
Oh that's a cool find. I was not aware of that (doesn't help that it's not mentioned in the documentation) and it definitely makes things simpler. In theory, I can't see any reason why it shouldn't work. I would also assume that by retrieving the diff file directly the response would no longer be paginated. Seeing as the action only ever compares against the previous commit, we could maybe just swap out compare/
for commit/
completely?
Yes, entirely replace compare/
with commit/
makes sense, as the action only cares about the last commit which is precisely what commit/
returns.
I've updated the PR to account for this.
There's still an issue that can happen depending on how the action is used. Let's take a branch with several commits and a TODO comment only on the first commit. If the action runs on push, this TODO comment on the branch first commit won't be detected. (It'll work fine if the action runs on merge for example, as the merge commit will contain everything)
Yes, that's a bit of a problem. It happens even when not on a separate branch; if I make three commits each with a TODO and push them together, only the one in the last commit will be added.
Here is a potential way of handling it:
000...
Perform the normal compare/
request, as that will pick up all TODOs across all commits.
000..
Use the compare/
endpoint (without application/vnd.github.v3.diff
) to get the list of commits and cycle through them all, getting the diff for each by calling commits/
.
In theory a TODO could be added in the second commit and removed in the third commit, which could open then close an issue unnecessarily, but the action shouldn't process them if they appear as both an addition and deletion.
It sounds a bit cumbersome but perhaps one way around it.
If the before SHA is not
000...
Perform the normal compare/ request, as that will pick up all TODOs across all commits.
I think this won't work either. The problem, in this case, is determining the SHA to compare with. github.event.before
only gives you the previous commit SHA which makes the use of compare/
and commits/
in this action virtually the same.
(That's the reason we decided to move everything to commits/
with or without previous SHA)
To be able to deal with the case of three commits pushed together with a new TODO comment on each the action will need to, somehow, store the last analysed commit and compare with that.
Another alternative I can see is tweaking the action a bit to always run on Pull Requests so it can always compare against the base branch and get the full list of changes. But that's a big change out of this PR scope I'd say.
If the before SHA is not
000...
Perform the normal compare/ request, as that will pick up all TODOs across all commits.
I think this won't work either. The problem, in this case, is determining the SHA to compare with.
github.event.before
only gives you the previous commit SHA which makes the use ofcompare/
andcommits/
in this action virtually the same. (That's the reason we decided to move everything tocommits/
with or without previous SHA)
My understanding was that if I have 3 commits pushed at the same time:
abcd123 - 1 TODO added
efgh456 - 1 TODO added
ijkl789 - 1 TODO added
If we call commits/
:
Any TODOs added in the first or second commit wouldn't be picked up, because it only shows the diff between efgh456
and ijkl789
. Only 1 issue would be created.
If we call compare/
:
The diff shows the TODOs added in all three commits. 3 issues would be created.
That is what my testing seems to show. Therefore if the ref is not 0000000
, we can use compare/
to get all TODOs?
To be able to deal with the case of three commits pushed together with a new TODO comment on each the action will need to, somehow, store the last analysed commit and compare with that.
Another option could be having a basic text file and writing the last commit ref to it whenever the action runs? 🤔
I repeated your test in a separated forked repo and it worked, which confused me a lot 🙈 So I've been investigating the reason because it didn't make sense to me why would it work if comparing with the previous commit and using just commits/
should do the same thing.
The reason why it works is in what github.event.before
means. I though it would point to the previous commit in the branch, but it actually refers to the previous commit pushed to origin.
That means we don't need to save the last commit anywhere, Github does it for you 😃
The consequence of all of this is that I should revert the lastest changes to this PR and leave the original approach:
000...
then use commits/
compare/
In any case, remove the check for 000...
so the action always runs.
Do you see any potential issues with it?
The criteria for everything being okay is:
Scenario | Works? | Endpoint |
---|---|---|
Action retrieves TODOs when a single commit is pushed and the SHA is not 0000000 |
✅ | compare/ |
Action retrieves TODOs when a single commit is pushed and the SHA is 0000000 |
✅ | commits/ |
Action retrieves TODOs when two or more commits are pushed, the first of which does not have a SHA of 0000000 |
✅ | compare/ |
Action retrieves TODOs when two or more commits are pushed, the first of which has a SHA of 0000000 |
❓ | ❓ |
In the last scenario, If we use commits/
it will obviously just get the last commit, which we don't want, but if we use compare/
, will the diff include the contents of the first commit?
Action retrieves TODOs when two or more commits are pushed, the first of which has a SHA of 0000000
The problem in this last scenario is we don't have a commit SHA to compare with:
compare/00000...193a6e
fails because 00000
is not a valid commit SHAcommits/
only give us the last commit, as you said. So none of the options is 100% good enough for this scenario.
For this last case to work we'll need to find out the last commit of the source branch of the current branch. And compare against that.
This should be feasible because you can use compare/
between SHAs and branch names. For example:
This is comparing a branch
debug-branch-three
with a commit SHA from another branch.
What I don't know is how to obtain the base branch for an existing one 😕
I've been doing some further investigation and it looks like we can pass details of all commits in a push event by including the commits
array from the push
event payload.
If we add the following to action.yml
:
COMMITS:
description: "Some commits"
required: true
default: "${{ toJSON(github.event.commits) }}"
We can then retrieve them in the action with os.getenv('INPUT_COMMITS')
. These are only the commits in this push, which is very useful.
I just created a new repo, made 5 commits, then pushed them all. I was able to see details for all of those commits, so it would then be a matter:
0000000
? If so, iterate through each and get the diff. If not, just do compare/
.I did read something about it only including the last 20 commits and 3 tags pushed, but I think beyond that is getting too complicated anyway. 😆
@alstr using commits
array is a great idea! I've updated the PR to make use of this for the last case. We can break it down into three cases now:
before
SHA is not empty -> compare/
before
SHA is empty and there is only one commit in the branch -> commits/
before
SHA is empty and there are multiple commits in the branch -> find the oldest commit in the branch and use compare/
I've updated the PR to consider these cases:
if self.before != '0000000000000000000000000000000000000000':
# There is a valid before SHA to compare with
diff_url = f'{self.repos_url}{self.repo}/compare/{self.before}...{self.sha}'
elif len(self.commits) == 1:
# There is only one commit
diff_url = f'{self.repos_url}{self.repo}/commits/{self.sha}'
else:
# There are several commits: compare with the oldest one
oldest = sorted(self.commits, key=self.get_timestamp)[0]['id']
diff_url = f'{self.repos_url}{self.repo}/compare/{oldest}...{self.sha}'
My internal tests seem to confirm this work in all three cases but please let me know what do you think
This looks great, thanks for that. It's actually simpler than I was anticipating; in the case of multiple commits pushed to a new repo, I was anticipating having to call commits/
on each SHA, but yes, getting the first SHA and last SHA and running compare/
is definitely better if that works!
I just added a check to make sure that self.commits != 0
. The default workflow makes the action run on release as well. It doesn't do anything, but there are no commits to retrieve.
This PR identifies the situation where the action is running on a branch without a previous commit and provides an alternative URL to calculate the diff
This situation can happen, for example, on feature branches that only contain a single commit when the action is setup to be triggered on pull requests.
This might also be a fix for issue https://github.com/alstr/todo-to-issue-action/issues/63