IBM / page-lab

PageLab enables web performance, accessibility, SEO, etc testing at scale.
Apache License 2.0
19 stars 10 forks source link

View analytics reports for subset of all pages (ability to filter). #28

Closed rcalfredson closed 5 years ago

rcalfredson commented 5 years ago

Currently, analytics reports aggregate result from all pages. In addition to this existing functionality, users may want a way to specify a subset they would like to analyze.

There could be a variety of ways to specify the desired subset of pages. For example, perhaps users would like to filter down to URLs containing a given subset, or perhaps they would like to input a comma-separated list of URLs they know in advance.

It would be possible for the user to view the dashboard, the report-card tiles and the detailed page summaries using that filtered list.

daviddahl commented 5 years ago

First off , we have wanted to build this kind of functionality for some time - so we have had some concepts in mind for a while...

I have looked at the PRs and I think we should discuss your proposal. We have had many ideas on this and I think we can - with very little effort add a UrlFilter model where we can describe and name each filter that you would like to create. This would require some changes to the Url model as well: where we can break up the url into its constituent parts and do much more efficient queries. We have 7600 urls in the database now, and are expecting the number of urls in our db to exceed 500k, in which case (and perhaps even now), an icontains= filter is very inefficient and slow.

I imagine there would also be a UrlFilterRule model that binds the part of the URL you want to filter by to the UrlFilter. For instance: [hostname equals foo.ibm.com] and [pathname0 equals marketplace]

I'm happy to create the models I have in mind here if you like.

Feel free to follow up with a meeting to discuss with myslef and @ecumike

rcalfredson commented 5 years ago

Thank you for the feedback! Have just seen the message, and think this makes sense. For a chance to meet, there is one meeting that Ann Marie Fred, Tim Figgins, and I are planning to have today at 10:00 AM to talk about filter functionality at large. Do you think that time would be good for us to discuss also? I apologize for the short notice on that, and of course we could meet at a different time if it would work better

Before that time, I will begin to experiment with model-based approach, and hopefully would have something toward that goal to share by then. Thanks again.

amfred commented 5 years ago

Some of this is internal implementation details, but stepping back from a usability perspective, a simple substring match is the easiest for people to understand. However, we have a business requirement coming up next (as in this week) to also be able to filter the results by business unit and geography, so keep that in mind, please.

For example:

The trick is in figuring out how the framework can allow for that type of filtering, without it getting too specific to our company internals.

rcalfredson commented 5 years ago

Would like please to share an initial "gist" or sketch of new models based on comment above. Are these similar to the approach you are thinking about?

class UrlFilter(models.Model):
    """
    A Url filter to apply.
    """
    key = models.CharField(max_length=255, choices=[('host_name', 'host_name'), ('pathname0', 'pathname0')])

class UrlFilterRule(models.Model):
    """
    Specific instance of URL filter.
    """
    match = models.CharField(max_length=255, null=True)
    filter = models.ForeignKey(UrlFilter, on_delete=models.CASCADE)
    def getMatches(self):
        return [obj for obj in Url.objects.all() if getattr(obj, self.filter.key) == self.match]

Have added an initial one as a test, using some new properties of Url. My inference is that the changes to Url would take place on the DB level, rather than on the runtime level via derived properties; do you think this is accurate? Thank you.

rcalfredson commented 5 years ago

For example, added a temporary property that is pathname0 inside Url class, defined thus:

@property
def pathname0(self):
    return self.url.split('://')[1].split('/')[1]

Please note that this hard-coded way of setting these pathname properties is only for proof-of-concept (i.e. that it would/could be replaced by some type of iterator).

daviddahl commented 5 years ago

Let me create a PR later today / tomorrow to illustrate. I think the filtering data should be persisted

rcalfredson commented 5 years ago

That sounds good, thank you! Would like please to confirm, it is correct that this would be replacement for, not a companion to, the current proposal? In other words, are there parts of the existing P.R.'s we would want to retain/use, or should they be closed instead?

daviddahl commented 5 years ago

I'll make the models and we can then meet to discuss

@rcalfredson I have something working now in #36 if you want to schedule a technical discussion with me, @ecumike and yourself.

daviddahl commented 5 years ago

These screens should give you an idea on the concept I am working on selection_163 selection_162

There is also a model for the key val pairs in the location.search, but it is not working yet. PR will be up tomorrow

rcalfredson commented 5 years ago

Yes, thank you, that sounds good! Have already been working with the code, and the approach is clear to me. Would you want to meet for it today? I have complete availability.

daviddahl commented 5 years ago

I wanted to mention that the service and UI of this new functionality should look like this:

a view is created to list all of the UrlFilter objects at /url-filters/

Each filter corresponds to a url and view that produces the dashboard at the UrlFilter's name: /url-filters/my-url-filter-for-such-and-such

we want to use slugs for the names and make them unique

In the UrlFilter view function we have a very simple operation:

myUrls = UrlFilter.objects.get(name='my-url-filter-name').run_query()

myUrls are then consumed by the dashboard generating code.

The dashboard generation code may need to be pulled into a helper function

ecumike commented 5 years ago

Closing. In master as of PR #46

amfred commented 5 years ago

Agreed, the filters are working. Looking forward to the "AND" and "OR" functionality next!