dwyl / hits

:chart_with_upwards_trend: General purpose hits (page views) counter
http://hits.dwyl.com
GNU General Public License v2.0
433 stars 63 forks source link

Only display unique hits in badge #61

Open christian-draeger opened 5 years ago

christian-draeger commented 5 years ago

Hey, i'm wondering if it's possible to configure the badge to only show unique hits (over a day or specific period of time). This option would give a more meaningful hint to repo owners that are having the "hits-badge" added to their readme files.

As I saw you already storing the hit data including a hash (consisting of user-agent, IP and timestamp) which in turn means you already know if a hit is unique or not - why not adding this option to the badge? :)

for instance something like http://hits.dwyl.io/skrapeit/skrape.it.svg?uniquePerDay would be handy.

i think what bothers me the most is that every time you adjust your readme (and have the hit-badge included in the readme) IntelliJ's preview mode for md files will render the readme. This causes every keystroke to increment the hit count. Which leads to the issue that the hit-count is extremely falsified and no longer has any significance to the repo owner. here is a screen record that describes the problem: hit-count-issue-intellij

nelsonic commented 2 years ago

@SimonLab Do you think you could take a look at this? It should just be a matter of updating the query to be DISTINCT. Watch out for this gotcha: https://stackoverflow.com/questions/11250253/postgresql-countdistinct-very-slow And if we can do it using idiomatic Ecto (without returning all the data to the Phoenix App and wasting bandwidth) it would be even better!

SimonLab commented 2 years ago

@nelsonic Looking at this now At the moment the count value is returned when a hit is inserted. So we'll need to update the aggregate function: https://github.com/dwyl/hits/blob/b60b329b4096ac690af00cc3391399a7ede79916/lib/hits/hit.ex#L24-L30

I'm thinking of adding a filter query parameters to the url (as mention on the first comment): ...svg?filter=day

nelsonic commented 2 years ago

@SimonLab good plan. The only thing to note is at present the Ecto Repo.aggregate query is somehow being ultra-inefficient and thus consuming a ton of bandwidth; see: https://github.com/dwyl/learn-devops/issues/72#issuecomment-958836871 Basically we might need to write a "Raw SQL" query to avoid sending all the data from PostgreSQL to Phoenix each time ... 💭

nelsonic commented 2 years ago

We really want to streamline the query to do the COUNT DISTINCT on the PostgreSQL server to only return the int value rather than all the data. 🤞

SimonLab commented 2 years ago

I think the aggregate Ecto query is already using the count distinct and only the count is sent back to Phoenix. This is the generated query:

SELECT count(s0."id") FROM (SELECT DISTINCT ON (sh0."useragent_id") sh0."id" AS "id" FROM "hits" AS sh0 WHERE (sh0."repo_id" = $1)) AS s0 [1]
nelsonic commented 2 years ago

@SimonLab cool. thanks for confirming. 👍