MuckRock / muckrock

MuckRock's source code - Please report bugs, issues and feature requests to info@muckrock.com
https://www.muckrock.com
GNU Affero General Public License v3.0

Save DocumentCloud Cloudflare Stats #1860

Open morisy opened 3 months ago

morisy commented 3 months ago

Cloudflare keeps a variety of interesting statistics, but it only retains them for 30 days, after which the data is purged. Its interface is also not the most usable.


We'd like to capture some of this data to help us evaluate which content to promote, as well as the site's overall impact.

On the backend, I'd like to start storing the previous day's unique visitor count every day.
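A minimal sketch of the daily storage step, assuming the previous day's unique-visitor count has already been fetched from Cloudflare. It uses a standalone SQLite table with a hypothetical name (`daily_unique_visitors`) rather than MuckRock's actual Django models, and it overwrites the row if the job is rerun for the same day.

```python
import sqlite3
from datetime import date, timedelta


def previous_day(today: date) -> date:
    """Return the calendar day before `today`, i.e. the day whose stats we store."""
    return today - timedelta(days=1)


def store_daily_visitors(conn: sqlite3.Connection, day: date, unique_visitors: int) -> None:
    """Save one day's unique-visitor count.

    The table name and schema are hypothetical, for illustration only.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_unique_visitors ("
        " day TEXT PRIMARY KEY,"
        " unique_visitors INTEGER NOT NULL)"
    )
    # INSERT OR REPLACE keeps exactly one row per day, so a retried job is harmless.
    conn.execute(
        "INSERT OR REPLACE INTO daily_unique_visitors (day, unique_visitors) VALUES (?, ?)",
        (day.isoformat(), unique_visitors),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # 100 is a placeholder count, not real data.
    store_daily_visitors(conn, previous_day(date(2024, 5, 7)), 100)
    print(conn.execute("SELECT * FROM daily_unique_visitors").fetchall())
    # prints [('2024-05-06', 100)]
```

Keying the table on the date keeps the history queryable long after Cloudflare's 30-day window has passed.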

Additionally, it would be good to start generating an email that lists the 25 most popular paths over the previous 24 hours, along with their statistics. The paths should be shown as full URLs, not just the path portion as in the current display.
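A sketch of the ranking step: sum pageviews per path, keep the top 25, and turn each path into a full URL. The `(path, pageviews)` input shape and the function name are assumptions about how the Cloudflare data would arrive; the default base URL is taken from the example links in this issue.

```python
from collections import Counter
from urllib.parse import urljoin


def top_full_urls(path_views, base_url="https://www.documentcloud.org", n=25):
    """Rank paths by total pageviews and return the top `n` as (full_url, pageviews).

    `path_views` is an iterable of (path, pageviews) pairs (hypothetical shape);
    repeated paths are summed before ranking.
    """
    totals = Counter()
    for path, views in path_views:
        totals[path] += views
    # most_common(n) returns the n highest counts, largest first.
    return [(urljoin(base_url, path), views) for path, views in totals.most_common(n)]
```

`urljoin` handles the slash between host and path, so `/documents/...` paths come out as complete, clickable links.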

The email could look something like this:

Subject Line: DocumentCloud Top Docs - May 6, 2024

  1. 2023-01699-F Responsive Documents 3 (Uploader Name, Organization) -- 435,604 Pageviews https://www.documentcloud.org/documents/24649101-2023-01699-f-responsive-documents-3

  2. Another-Document-Name (Uploader Name, Organization) -- 335,604 Pageviews https://www.documentcloud.org/documents/24649101-2023-01699-f-responsive-documents-3

And so on. Depending on how much work it takes to pull the document title, uploader name, etc., even just the URL and pageviews would be very helpful.
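A minimal sketch of the proposed email text, following the subject line and entry format shown in the example. `TopDoc` and `format_top_docs_email` are hypothetical names; when the title is unknown it falls back to just the URL and pageviews, as suggested. Actually sending the message is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class TopDoc:
    """One ranked document; title, uploader and organization may be unknown."""
    url: str
    pageviews: int
    title: Optional[str] = None
    uploader: Optional[str] = None
    organization: Optional[str] = None


def format_top_docs_email(day: date, docs: List[TopDoc]) -> Tuple[str, str]:
    """Build (subject, body) in the format proposed in this issue."""
    # %B gives the month name (English under the default C locale).
    subject = f"DocumentCloud Top Docs - {day:%B} {day.day}, {day.year}"
    lines = []
    for rank, doc in enumerate(docs, start=1):
        if doc.title:
            credit = ", ".join(p for p in (doc.uploader, doc.organization) if p)
            label = f"{doc.title} ({credit})" if credit else doc.title
            lines.append(f"{rank}. {label} -- {doc.pageviews:,} Pageviews {doc.url}")
        else:
            # Fallback: just the URL and pageviews when metadata isn't available.
            lines.append(f"{rank}. {doc.url} -- {doc.pageviews:,} Pageviews")
    return subject, "\n\n".join(lines)
```

Keeping the metadata optional lets the email ship with URLs and pageviews first, with titles and uploaders filled in later if looking them up turns out to be cheap.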

Others may have their own wishlist items, so it would be good to get them to weigh in on what data to collect, the format, etc.

eyeseast commented 3 months ago

Maybe useful? https://developers.cloudflare.com/analytics/graphql-api/

duckduckgrayduck commented 1 month ago

These seem to be the GraphQL groups that need to be queried to access Web Analytics programmatically: https://community.cloudflare.com/t/downloading-web-analytics-data/473295

I think I've got a working GraphQL query for some of the information we want, but it looks like GraphQL access is paywalled: https://community.cloudflare.com/t/graphql-not-autorized-for-that-account/469748

I opened a ticket with Cloudflare to see whether they would grant us free or reduced-cost access. Otherwise, we may choose not to move forward with this.
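For reference, a sketch of what such a request might look like using only the Python standard library. The endpoint and Bearer-token header follow Cloudflare's GraphQL Analytics API conventions; the dataset (`rumPageloadEventsAdaptiveGroups`), the filter input type, and the field names are assumptions and should be confirmed against Cloudflare's schema before use. Given the paywall, an account without access would likely get an `errors` list back instead of data.

```python
import json
import urllib.request

CLOUDFLARE_GRAPHQL_URL = "https://api.cloudflare.com/client/v4/graphql"

# ASSUMED query: dataset, filter type and field names are unverified and
# should be checked in Cloudflare's GraphQL schema/explorer.
TOP_PATHS_QUERY = """
query TopPaths($accountTag: string, $filter: AccountRumPageloadEventsAdaptiveGroupsFilter_InputObject) {
  viewer {
    accounts(filter: {accountTag: $accountTag}) {
      rumPageloadEventsAdaptiveGroups(filter: $filter, limit: 25, orderBy: [count_DESC]) {
        count
        dimensions { requestHost requestPath }
      }
    }
  }
}
"""


def build_top_paths_request(api_token, account_tag, site_tag, day):
    """Build (but do not send) a POST request for one day's top paths.

    `day` is a "YYYY-MM-DD" string; all arguments are placeholders here.
    """
    payload = {
        "query": TOP_PATHS_QUERY,
        "variables": {
            "accountTag": account_tag,
            "filter": {"siteTag": site_tag, "date": day},
        },
    }
    return urllib.request.Request(
        CLOUDFLARE_GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_top_paths(request):
    """Send the request and return the `data` part of the GraphQL response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    # GraphQL reports problems (e.g. an account not authorized for a dataset)
    # in an "errors" list, often alongside an HTTP 200 status.
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["data"]
```

Separating request construction from sending makes it easy to inspect the exact payload while the access question with Cloudflare is still open.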

duckduckgrayduck commented 4 weeks ago

I followed up with Cloudflare again on Friday and haven't heard back since Christopher's initial response.

duckduckgrayduck commented 2 weeks ago

My ticket was moved from Zendesk to Salesforce because they changed ticketing systems. Still no reply.