adobe / helix-run-query

service that executes queries on BigQuery datasets generated by Helix-Logging
Apache License 2.0

[Domain-list] hostname in response is not unique #1120 #1121

Closed lydiapuric closed 5 months ago

lydiapuric commented 5 months ago

Please ensure your pull request adheres to the following guidelines:

Related Issues

[Domain-list] hostname in response is not unique #1120

Thanks for reviewing!

lydiapuric commented 5 months ago

@langswei Just checked, thanks. You are right: the DISTINCT belongs earlier, in line 49. Adjusting.

github-actions[bot] commented 5 months ago

This PR will trigger no release when merged.

langswei commented 5 months ago

@lydiapuric I still see duplicates, but at least it's faster. Perhaps the answer is to have DISTINCT in both places. I will also speculate that this problem will go away over time: earlier records did not have p###-e###, newer records do. Eventually the old ones will fall out of the query, but not until sometime in 2025, so it's probably worth fixing.

lydiapuric commented 5 months ago

@langswei, I don't see duplicates. I added a count by hostname at the end, and it looks good to me. Regarding older records that lack p###-e### while newer records have it: line 54 restricts to the pattern, and in line 97 I join only on hostname. So even if earlier records do not have the pattern, they will be assigned the new cs values.
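For readers following along, the approach described above (restrict to the pattern, apply DISTINCT early, join only on hostname, then count by hostname) can be sketched roughly as follows. This is not the query from the PR: the table `requests`, its columns, and the sample rows are hypothetical, and SQLite (via Python's `sqlite3`) stands in for BigQuery, so the `GLOB` pattern is only an approximation of the p###-e### check.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE requests (hostname TEXT, cs TEXT);  -- hypothetical schema
INSERT INTO requests VALUES
  ('www.example.com',  NULL),         -- older record without a p###-e### value
  ('www.example.com',  'p123-e456'),
  ('www.example.com',  'p123-e456'),  -- repeated newer record
  ('blog.example.com', 'p789-e12');
""")

query = """
WITH known AS (
  -- DISTINCT applied early, on rows restricted to the pattern
  SELECT DISTINCT hostname, cs
  FROM requests
  WHERE cs GLOB 'p[0-9]*-e[0-9]*'
),
hosts AS (
  SELECT DISTINCT hostname FROM requests
)
-- join on hostname only, so older rows pick up the newer cs value
SELECT hosts.hostname, known.cs
FROM hosts
JOIN known ON hosts.hostname = known.hostname
ORDER BY hosts.hostname
"""
rows = conn.execute(query).fetchall()

# Count by hostname at the end: any hostname listed here is a duplicate.
duplicates = {h: n for h, n in Counter(h for h, _ in rows).items() if n > 1}
print(rows)
print(duplicates)
```

In this sketch each hostname comes back once. A hostname that maps to more than one distinct cs value would still appear more than once, which is the case the final count by hostname is meant to catch.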

trieloff commented 5 months ago

:tada: This PR is included in version 3.28.1 :tada:

The release is available on:

Your semantic-release bot :package::rocket: