wknapik opened 4 days ago
Thanks for opening this up, @wknapik, that is some weird behavior! 🤔 I'd like to share my theory and propose a few alternatives for reproducing this.
Essentially, I'm not sure `wc -l` or `sort -u | wc -l` are appropriate ways to manipulate and understand this data. It would help to preserve the output in files that we can compare, so we can understand why they differ.
Here are some alterations I'd like to propose to understand what's going on:

- Let's pick a concrete date to look for merges
- Let's base this on the date you opened this issue: 2024-09-14
- Let's consider preserving the output from the command into a file so we can compare how they are different
- Let's use `jq` to count the length of the arrays rather than `wc -l`
  - _We have to use `jq` rather than `--jq` if we want to preserve the output; it should only require: `... | jq '. | length'`_
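To illustrate why line counts mislead here (a minimal sketch; the sample array below is invented): `gh` emits the `--json` result as a single compact line, so `wc -l` reflects line structure rather than item count, while `jq '. | length'` counts the actual array elements.

```shell
# A compact JSON array occupies a single line, so wc -l sees at most
# one line no matter how many items the array holds.
printf '%s' '[{"number":1},{"number":2},{"number":3}]' > sample.json
wc -l < sample.json          # 0: this sample contains no newline at all
jq '. | length' sample.json  # 3: the actual number of items
```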
The result looks something like this:
```shell
# We should be able to see deviation in GitHub API responses after 10 attempts with the same arguments
for i in {1..10}; do
  echo "Attempt $i"
  gh search prs --limit 256 --merged --merged-at ">2024-09-14" --base master --repo brave/brave-core --json number > "cli-9749-$i.json"
done

# This is a more accurate way to look purely at the number of items
find . -name "cli-9749-*.json" -print -exec jq '. | length' {} \;

for i in {2..10}; do
  diff <(jq 'sort' cli-9749-1.json) <(jq 'sort' "cli-9749-$i.json")
done
```
In theory, this will check the number of results and the actual contents.
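To see what the `diff` step buys us, here is a small self-contained example (the two files below are invented): piping both files through `jq 'sort'` before diffing normalizes ordering, so only the set of items matters, not the order the API happened to return them in.

```shell
# Two files holding the same items in different order.
printf '%s' '[{"number":2},{"number":1}]' > a.json
printf '%s' '[{"number":1},{"number":2}]' > b.json

# jq 'sort' normalizes ordering, so diff reports no difference and
# exits 0, letting the confirmation message print.
diff <(jq 'sort' a.json) <(jq 'sort' b.json) && echo "same contents"
```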
10 attempts, all consistent in both number of results and actual contents:
```shell
andyfeller@Andys-MBP:~ $ gh version
gh version 2.58.0 (2024-10-01)
https://github.com/cli/cli/releases/tag/v2.58.0
andyfeller@Andys-MBP:~ $ for i in {1..10}; do \
echo "Attempt $i";
gh search prs --limit 256 --merged --merged-at ">2024-09-14" --base master --repo brave/brave-core --json number > "cli-9749-$i.json";
done
Attempt 1
Attempt 2
Attempt 3
Attempt 4
Attempt 5
Attempt 6
Attempt 7
Attempt 8
Attempt 9
Attempt 10
andyfeller@Andys-MBP:~ $ find . -name "cli-9749-*.json" -print -exec jq '. | length' {} \;
./cli-9749-2.json
256
./cli-9749-3.json
256
./cli-9749-8.json
256
./cli-9749-4.json
256
./cli-9749-5.json
256
./cli-9749-10.json
256
./cli-9749-9.json
256
./cli-9749-6.json
256
./cli-9749-7.json
256
./cli-9749-1.json
256
andyfeller@Andys-MBP:~ $ for i in {2..10}; do \
diff <(jq 'sort' cli-9749-1.json) <(jq 'sort' "cli-9749-$i.json");
done
andyfeller@Andys-MBP:~ $
```
I don't believe `gh search` caches the results of these calls; however, it should be easy enough to add `gh config clear-cache` between `gh search prs` calls to ensure any potential cache is cleared.

@wknapik: I'd like you to attempt reproducing this problem while preserving the results from the `gh search prs` commands and sharing your thoughts on where you are seeing variation.
Hi @andyfeller. Your code does not appear to check for duplicates, only for consistency. When I run it, I get the same results you do: there are 256 results, including duplicates, identical every time, and unaffected by running `gh config clear-cache`.
I think we might be misunderstanding each other.
The point is that there are duplicate results. There shouldn't be any, regardless of what value is passed to `--limit`.
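A quick way to count those duplicates directly from one of the saved files (a sketch; `cli-9749-1.json` stands for one of the files produced by the loop above): compare the array's total length with the number of unique PR numbers in it.

```shell
# Total entries minus unique PR numbers = number of duplicated entries.
jq '[.[].number] | length - (unique | length)' cli-9749-1.json
```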
The commands I shared are meant to illustrate that for a `--limit` value of 256 I got 56 duplicates, for 275 I got 50 duplicates, and for 300 there were no duplicates.

It's also interesting to note that up to a point, duplicates are added until the total number of results returned equals the value of `--limit`. Past that point, there are no duplicates anymore, regardless of the `--limit` value.
`cli-9749-1.json` ends with

```
{"number":25808},{"number":25806},{"number":25805},{"number":25804},{"number":25802},{"number":25799},{"number":25798},{"number":25797},{"number":25796},{"number":25795},{"number":25794},{"number":25793},{"number":25792},{"number":25788},{"number":25785},{"number":25784},{"number":25783},{"number":25780},{"number":25779},{"number":25777},{"number":25776},{"number":25775},{"number":25774},{"number":25771},{"number":25770},{"number":25769},{"number":25768},{"number":25767},{"number":25764},{"number":25761},{"number":25760},{"number":25759},{"number":25758},{"number":25757},{"number":25752},{"number":25750},{"number":25749},{"number":25748},{"number":25747},{"number":25744},{"number":25743},{"number":25740},{"number":25738},{"number":25737},{"number":25736},{"number":25735},{"number":25734},{"number":25731},{"number":25730},{"number":25726},{"number":25721},{"number":25720},{"number":25719},{"number":25718},{"number":25717},{"number":25716}
```

which is a repetition of an identical sequence earlier in the file; those are the 56 duplicates.
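One way to confirm that observation mechanically (a hypothetical check; the filename and the count of 56 come from the discussion above): subtract the numbers seen in the earlier part of the array from the numbers in the final 56 entries. An empty remainder means every trailing entry already appeared earlier in the file.

```shell
# If the last 56 entries all repeat earlier entries, removing the
# earlier numbers from the trailing numbers leaves an empty array,
# so this prints 0.
jq '[.[].number] as $all | ($all[-56:] - $all[:-56]) | length' cli-9749-1.json
```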