clojars / clojars-web

A community repository for open-source Clojure libraries
https://clojars.org
Eclipse Public License 1.0

Download stats files include spurious ?prefix= entries #867

Closed by tobias 1 year ago

tobias commented 1 year ago

I believe this is from the log file processing not understanding the new index generation requests.

We should:

There are also stats from 2020 that have ?marker= entries. I'm not sure what causes them, but we should try to generate those files as well, if we can.

See https://gist.github.com/minikomi/9e7a54fe049dde9949766a913fa118bd for a full list (as of this report; the log parsing will continue to create ?prefix= entries until we fix this).

Thanks to @minikomi for the report!

tobias commented 1 year ago

This wasn't from the new repo listing. It comes from some clients using the Fastly CDN, which we proxy to S3, to send S3 listing requests; those requests use marker and prefix parameters. The regex was too permissive, so it treated some of those requests as downloads that should be counted.
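To illustrate the kind of tightening described here, the following is a minimal sketch in Python, not the actual Clojars fix (the real regex is not shown in this thread). It assumes log entries expose a request target such as `/group/artifact/version/artifact-version.jar`, and that S3 listing requests look like `/?prefix=...` or `/?marker=...`. The names `DOWNLOAD_PATH` and `parse_download` are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Hypothetical strict pattern for a Maven-style jar path:
#   /<group path>/<artifact>/<version>/<artifact>-<version>[-classifier].jar
# Backreferences force the file name to agree with the directory segments.
DOWNLOAD_PATH = re.compile(
    r"^/(?P<group>[^/]+(?:/[^/]+)*)"
    r"/(?P<artifact>[^/]+)"
    r"/(?P<version>[^/]+)"
    r"/(?P=artifact)-(?P=version)(?:-[^/]+)?\.jar$"
)


def parse_download(request_target):
    """Return (group, artifact, version) for a jar download, else None."""
    parts = urlsplit(request_target)
    # Listing requests carry query parameters (prefix, marker); real
    # artifact downloads in this sketch never do, so reject them outright.
    if parts.query:
        return None
    match = DOWNLOAD_PATH.match(parts.path)
    if match is None:
        return None
    # Assumption: slashes in the group path map to dots in the group name.
    group = match.group("group").replace("/", ".")
    return (group, match.group("artifact"), match.group("version"))


print(parse_download("/org/clojure/clojure/1.11.1/clojure-1.11.1.jar"))
print(parse_download("/?prefix=org/clojure/"))
```

Rejecting any request with a query string before matching the path means a permissive path pattern can no longer mistake `?prefix=` or `?marker=` for a group name.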

The fix for this has been deployed, so we shouldn't see any new spurious entries. I'll pull down the bad files and rewrite them.

tobias commented 1 year ago

The stats files have been fixed. Every entry whose group starts with ? has been removed. These changes may not be immediately available, as the old files may be cached by the CDN, but the cached files should expire over the next 24 hours or so.
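The cleanup rule above (drop every entry whose group starts with `?`) can be sketched as follows. This is an illustration only: it assumes the stats have already been loaded into a mapping keyed by `(group, artifact)` pairs with per-version download counts, and the function name `drop_spurious_groups` and the sample data are hypothetical.

```python
def drop_spurious_groups(stats):
    """Return a copy of stats without entries whose group starts with '?'."""
    return {
        (group, artifact): versions
        for (group, artifact), versions in stats.items()
        if not group.startswith("?")
    }


# Made-up sample data in the assumed in-memory shape.
sample = {
    ("?prefix=org", "clojure"): {"1.0": 3},
    ("?marker=abc", "thing"): {"0.1": 1},
    ("ring", "ring-core"): {"1.9.0": 10},
}

cleaned = drop_spurious_groups(sample)
print(cleaned)  # {('ring', 'ring-core'): {'1.9.0': 10}}
```

Reading and writing the actual stats files is left out, since the on-disk format is not described in this thread.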