Filters block - Githubissues

jacomyal commented 2 years ago

One important choice must be made here: either reproduce Gephi's filters panel (which is very expert-oriented) for future interoperability, or use only a more user-friendly subset.

1. Gephi's model

Filters are represented in a logical tree. Each filter has a type. Each filter can have parameters (depending on its type), and can have one or multiple children (again, depending on its type).

There are logical operator filter types ("and" and "or" with multiple children, "not" with only one child), and more "functional" filter types:

Attributes-based filter types (range, "is in list", non null...)
Topological filter types (ego networks, degrees...)
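
To make the tree model concrete, here is a minimal sketch in Python. The `Filter` class, the `matches` function, and the filter type names ("range", "in_list", "non_null") are hypothetical illustrations, not Gephi's actual API. Topological filter types are left out because they need the graph structure, not just node attributes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

Node = Dict[str, Any]  # assumption: a node is a plain attribute dictionary


@dataclass
class Filter:
    """One node of the filter tree: a type, its parameters, and its children."""
    type: str
    params: Dict[str, Any] = field(default_factory=dict)
    children: List["Filter"] = field(default_factory=list)


def matches(f: Filter, node: Node) -> bool:
    """Evaluate the filter tree rooted at `f` against a single node."""
    if f.type == "and":
        return all(matches(child, node) for child in f.children)
    if f.type == "or":
        return any(matches(child, node) for child in f.children)
    if f.type == "not":
        (child,) = f.children  # "not" accepts exactly one child
        return not matches(child, node)
    if f.type == "range":
        value = node.get(f.params["attribute"])
        return value is not None and f.params["min"] <= value <= f.params["max"]
    if f.type == "in_list":
        return node.get(f.params["attribute"]) in f.params["values"]
    if f.type == "non_null":
        return node.get(f.params["attribute"]) is not None
    raise ValueError(f"unknown filter type: {f.type}")


# "aged 18 to 29 AND NOT living in Amiens"
tree = Filter("and", children=[
    Filter("range", {"attribute": "age", "min": 18, "max": 29}),
    Filter("not", children=[
        Filter("in_list", {"attribute": "city", "values": ["Amiens"]}),
    ]),
])

nodes = [
    {"id": "a", "age": 25, "city": "Paris"},
    {"id": "b", "age": 25, "city": "Amiens"},
    {"id": "c", "age": 40, "city": "Paris"},
]
kept = [n["id"] for n in nodes if matches(tree, n)]
print(kept)
```

Only node "a" passes both branches of the tree here.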

The UI to set up the filter tree would basically look like Gephi's. Maybe we can just display the tree, and display the list of available filter types in a modal only when the user clicks on "Edit" or "Add", to reduce the amount of displayed information.

2. Flat model

Filters are represented in a flat list, and only nodes that match every filter of the list are kept. Then, we cherry-pick the "most user-friendly" filter types (I'd go for attribute-based filter types only, and a few specific filters such as "Main connected component" or "ego-network").

The UI would be simpler: basically an "Add a filter" button and the list of already configured filters with "Delete" buttons. The "Add a filter" (or "Edit filter") button would open a modal with the filter-type-specific form.

jacomyal commented 2 years ago

@jacomyma Do you have some opinion on that specific topic?

Yomguithereal commented 2 years ago

I would vote to avoid something like a UI to compose complex or chained filters, at least at the beginning. One thing I also wonder is how this interacts with UI features such as filterable legends, click/focus on some nodes/edges, etc.

jacomyma commented 2 years ago

Gephi's filter system is, for sure, too complex for Gephi Lite. I see two distinct directions.

Direction 1: filter stack

This is what you propose. You can compose filters but just in a stack, not a tree. Each filter filters down whatever remains from the previous filter.

Direction 2: independent filters

A curated set of filters. Each filter can be activated or not. They could all be activated at the same time. They apply with a general intersection rule: only the nodes allowed by all filters are displayed. So there is no order to the filters.
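
As a rough sketch of this intersection rule (in Python, with a hypothetical `IndependentFilter` class and `visible_nodes` helper that are not part of any existing codebase), each filter carries an on/off flag and a node is displayed only if every active filter accepts it:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Node = Dict[str, Any]  # assumption: a node is a plain attribute dictionary


@dataclass
class IndependentFilter:
    name: str
    predicate: Callable[[Node], bool]
    active: bool = True


def visible_nodes(nodes: List[Node], filters: List[IndependentFilter]) -> List[Node]:
    """Keep the nodes accepted by every active filter; filter order is irrelevant."""
    active = [f for f in filters if f.active]
    return [n for n in nodes if all(f.predicate(n) for f in active)]


nodes = [
    {"id": "a", "age": 25, "city": "Paris"},
    {"id": "b", "age": 35, "city": "Amiens"},
    {"id": "c", "age": 28, "city": "Lille"},
]
filters = [
    IndependentFilter("under 30", lambda n: n["age"] < 30),
    IndependentFilter("not in Lille", lambda n: n["city"] != "Lille"),
]

both = [n["id"] for n in visible_nodes(nodes, filters)]
reordered = [n["id"] for n in visible_nodes(nodes, list(reversed(filters)))]

filters[1].active = False  # deactivate one filter without deleting it
only_age = [n["id"] for n in visible_nodes(nodes, filters)]
```

Reversing the list gives the same result, which is the "no order" property described above.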

I hope my description is clear enough, if not I can provide a picture or something, but examples follow.

My opinion on both directions: I think I prefer the second direction because it is even simpler. But let me make the stakes visible: it is about how the UI of a filter reacts to the other filters. This is where I see potential friction. The key question is: should a filter offer the options of the full, unfiltered data, or should it offer only the options of the network as already filtered by other filters?

Example: I have a network of people with city and age. I filter out the people aged 30+ and there are no more people living in Amiens. Should I keep the option "Amiens" displayed in the "city" filter, or not? On the one hand, it is useful information to see what remains. On the other hand, it makes interactions complicated: should the UI remember the state of options that become invisible? Etc.

To sum up: direction 1 works better by cascading filter information, and direction 2 works better by not doing it. For the user, the simplest is to not cascade the filter information. So direction 2 seems better to me.

jacomyal commented 2 years ago

But let me make the stakes visible: it is about how the UI of a filter reacts to the other filters.

We try to address this issue in Retina by showing, within the filters, both the global and filtered distributions of the values.

In this capture, the dark graphic elements represent the amount of filtered nodes, while the grey graphic elements (including the dark parts) represent the total amounts. What do you think about something like this @jacomyma?
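
For illustration, the two distributions can be computed side by side. This is only a sketch in Python under the assumption that nodes are plain dictionaries; `value_counts` is a hypothetical helper, not Retina's code. It reuses the city/age example from earlier in the thread.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]  # assumption: a node is a plain attribute dictionary


def value_counts(nodes: List[Node], visible: List[Node], attribute: str) -> Dict[Any, Tuple[int, int]]:
    """Map each attribute value to (count among visible nodes, count among all nodes)."""
    total = Counter(n[attribute] for n in nodes)
    remaining = Counter(n[attribute] for n in visible)
    # A Counter returns 0 for missing keys, so fully filtered-out values stay listed.
    return {value: (remaining[value], count) for value, count in total.items()}


nodes = [
    {"city": "Amiens", "age": 34},
    {"city": "Amiens", "age": 41},
    {"city": "Paris", "age": 22},
    {"city": "Paris", "age": 52},
]
visible = [n for n in nodes if n["age"] < 30]  # filter out the people aged 30+

counts = value_counts(nodes, visible, "city")
for city, (filtered, total) in counts.items():
    print(f"{city}: {filtered} of {total}")
```

With this approach "Amiens" remains a visible option with a filtered count of zero, so the UI does not have to hide or remember it.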

Another question is about the available filters themselves. As a Gephi user, I mostly want to filter on:

The node attributes that come from my dataset
Some additional node data that is computed within Gephi (degree, communities, PageRank...)

I can imagine two workflows:

1. The available filters are only the node data. If a user wants to filter on topological data, they first need to create this data in the statistics / metrics panel (which is exactly the place where we can "create data" in Gephi Lite in that case), and then it becomes available as a filter.
2. The available filters are more similar to what's in Gephi today, i.e. the list of node attributes as well as other more specific filters (degree, main connected component, ego-networks...)

I like solution 1, because it looks more simple to me. Also, it matches well design guideline 2 where Gephi does not. For instance, as a user, I actually often wonder in Gephi: Should I compute degrees and then filter on them, or should I use the dedicated degree filter?

Also, for things that are less obvious to manage with node attributes (for instance ego-network filtering), there still are some solutions. For instance, in the metrics panel, we could add a "Distance to node" metric that creates, for each node, an attribute that represents its distance to the target node. Then, the user could filter on all nodes with a distance less than 3.
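
A minimal sketch of such a metric, assuming an unweighted, undirected graph stored as an adjacency list: a breadth-first search from the target node yields the distance attribute, and the ego-network becomes an ordinary range filter on it. The `distance_to_node` name is hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]  # assumption: undirected adjacency list


def distance_to_node(graph: Graph, target: str) -> Dict[str, Optional[int]]:
    """Breadth-first search; nodes unreachable from `target` get None."""
    distances: Dict[str, Optional[int]] = {node: None for node in graph}
    distances[target] = 0
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if distances[neighbor] is None:  # not visited yet
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances


# A path a-b-c-d-e plus an isolated node f.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
    "f": [],
}
distances = distance_to_node(graph, "a")

# The "ego-network" is then a plain attribute filter: distance less than 3.
ego = sorted(n for n, d in distances.items() if d is not None and d < 3)
print(ego)
```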

Any idea on that @paulgirard @jacomyma @Yomguithereal?

paulgirard commented 2 years ago

So we have three issues in this thread:

filters combination: stacked vs list
boolean operator between filters: only AND (proposed by Mathieu)
filter generic/specific definition: should all filters work on node/edge attributes?

Here are my takes on those:

filters combination: list

A stack is complex. Following design guideline 1, I also think a list is better and sufficient.

boolean operator: switch between global AND/OR

AND by default works. But users quickly stumble upon cases where an OR is neat to have. I would not vote for complex boolean queries, but a switch between AND/OR for the whole list might be easy to add and understand.
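
A possible shape for that switch, sketched in Python with a hypothetical `apply_filters` helper. One assumption worth flagging: with an empty filter list, all nodes are kept in both modes, since a literal OR over zero filters would otherwise hide everything.

```python
from typing import Any, Callable, Dict, List

Node = Dict[str, Any]  # assumption: a node is a plain attribute dictionary
Predicate = Callable[[Node], bool]


def apply_filters(nodes: List[Node], predicates: List[Predicate], mode: str = "and") -> List[Node]:
    """Apply the whole filter list with a single global operator."""
    if mode not in ("and", "or"):
        raise ValueError(f"unsupported mode: {mode}")
    if not predicates:
        return list(nodes)  # assumption: no filters means nothing is hidden
    combine = all if mode == "and" else any
    return [n for n in nodes if combine(p(n) for p in predicates)]


nodes = [
    {"id": "a", "age": 25, "city": "Paris"},
    {"id": "b", "age": 35, "city": "Amiens"},
    {"id": "c", "age": 22, "city": "Amiens"},
    {"id": "d", "age": 45, "city": "Paris"},
]
predicates = [
    lambda n: n["age"] < 30,
    lambda n: n["city"] == "Amiens",
]

with_and = [n["id"] for n in apply_filters(nodes, predicates, "and")]
with_or = [n["id"] for n in apply_filters(nodes, predicates, "or")]
```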

attributes as filter params: yes but

I like the systematic solution of having all filters work on attributes. But it imposes a strong constraint that needs some thought. The first issue is the dependency between a metric filter and the metric calculation. The naive solution is that filters on nonexistent attributes simply don't exist. The problem is that users might think a feature like filtering on PageRank does not exist until one day they run the metric, which reveals the filter. One way to handle this would be to consider metrics/stats (#8) as data attribute creation tools. These tools could be available (as a modal?) under a "+ calculate attributes" entry in the data attribute selection box of the filter interface, but also of the appearance one (#5). Users are thus informed that there are more ways to filter or color, by adding new attributes through calculation.

jacomyma commented 2 years ago

About filters combined as stacked vs. flat

In this capture, the dark graphic elements represent the amount of filtered nodes, while the grey graphic elements (including the dark parts) represent the total amounts. What do you think about something like this @jacomyma?

Yes. Flat filters mean that there are only two states: the non-filtered and the filtered data. Each filter can know both of them for its visual representation. By contrast, the more complicated alternative (the stack) has one output state per filter (plus the non-filtered state) and only one state is final. It is also more powerful. In Gephi we actually render those states in the Graphstore engine.

The two paradigms differ on a crucial point: when multiple filters are engaged, and a filter takes into account the network structure (e.g. the degree), what does it look at, the unfiltered network or the network as filtered by other filters? In the case of the stack (in Gephi), the filter takes into account the network as filtered by the previous filters in the stack. Each filter outputs a filtered network as an input to the next filter, and so on. In the case of a flat structure, by contrast, only the unfiltered network can be taken into account.

This only matters insofar as filters work on network structure, not just attributes. They have to, but I will address that next.

Example of a problematic use case with a flat structure: As a user, I filter down to a single community. My network now looks like a cluster plus a bunch of disconnected nodes (orphans). I now want to get rid of those orphans. I add a degree-range filter to filter out the 0-degree nodes. But as in the unfiltered network, those "orphans" are in fact connected to other nodes (of different communities), their degree is not really 0 and that filter does not filter them out.
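
The orphan example can be reproduced on a toy graph. The sketch below is in Python with hypothetical names; it computes the degree either on the full network (flat structure) or on the network left by the community filter (stack).

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def degrees(nodes: Set[str], edges: List[Edge]) -> Dict[str, int]:
    """Degree of each node in the subgraph induced by `nodes`."""
    result = {n: 0 for n in nodes}
    for source, target in edges:
        if source in nodes and target in nodes:
            result[source] += 1
            result[target] += 1
    return result


# a, b, c form a cluster in community 1; x is also in community 1
# but is only connected to y, which belongs to community 2.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("x", "y"), ("y", "z")]
community = {"a": 1, "b": 1, "c": 1, "x": 1, "y": 2, "z": 2}

all_nodes = set(community)
in_community = {n for n in all_nodes if community[n] == 1}

# Flat structure: the degree filter only sees the unfiltered network.
full_degrees = degrees(all_nodes, edges)
flat = {n for n in in_community if full_degrees[n] >= 1}

# Stack: the degree filter sees the output of the community filter.
sub_degrees = degrees(in_community, edges)
stacked = {n for n in in_community if sub_degrees[n] >= 1}

print(sorted(flat), sorted(stacked))
```

In the flat case the orphan "x" survives because its degree in the unfiltered network is 1; in the stacked case its degree is 0 and it is removed.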

I hope this example shows that the limitations of the flat structure are reached pretty quickly. There are two workarounds, depending on the situation: saving the filtered state as a new attribute (including computing stats like the degree), and saving the filtered network as a new network and reopening it in another browser tab.

I see two important cons to the flat structure:

It is pretty limited (it misses quite simple use cases)
It is different from Gephi (it breaks consistency)

The key pro is still simplicity, and the fact that it makes no difference when you use a single filter, which is the most common scenario. But I grew a bit less convinced that flat is the right call.

boolean operator: switch between global AND/OR

Agreed, not much more complicated and more powerful.

About filtering only on attributes (I disagree)

I like solution 1 [The available filters are only the node data], because it looks more simple to me. Also, it matches well https://github.com/gephi/gephi-lite/issues/13 where Gephi does not.

I think filtering only on attributes is not an option. If only for one reason, because the most useful filters are those not based on attributes, the filters in the "topology" section in Gephi, like degree range, k-core, ego network, giant component... Those are used all the time. It also includes the "inter" and "intra" filters in the "attributes" section. Not that we have to implement all of them in Gephi Lite, but those are staples of the Gephi workflow.

There is another reason worth discussing because it touches on something else: the degree.

as a user, I actually often wonder in Gephi: Should I compute degrees and then filter on them, or should I use the dedicated degree filter?

Yes, it is somewhat confusing in Gephi that you can compute the degree even though it is always available anyway. But the alternative is in my eyes much worse: relying on an obsolete computation of the degree. My argument stems from the fact that whether the user sees it or not, nodes always have a degree. Degree is derivative by definition. The situation we should aim to avoid at all costs is a user filtering on the "degree" in good faith when that "degree" is not the actual degree. This is, in essence, why the degree is never a column in the nodes table.

Filtering by degree is very common. If we force users to compute the degree, they will create that dangerous column. Deleting a node suffices to make it obsolete. The column will keep being there as the user keeps working on the network. If they want to filter again later by degree, they will have to create other "degree" columns. I think that this would be even more confusing.
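
A small Python sketch of how such a column goes stale (all names here are hypothetical): the stored column is a snapshot, while the live degree is derived from the current structure.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # assumption: undirected graph as neighbor sets


def live_degree(graph: Graph, node: str) -> int:
    """Degree derived from the current structure, never stored."""
    return len(graph[node])


def remove_node(graph: Graph, node: str) -> None:
    """Delete a node and detach it from its neighbors."""
    for neighbor in graph.pop(node):
        graph[neighbor].discard(node)


graph: Graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}

# The user computes a "degree" column once: this is a snapshot.
degree_column = {n: live_degree(graph, n) for n in graph}

remove_node(graph, "c")

stale = degree_column["a"]         # still reports the old value
current = live_degree(graph, "a")  # reflects the deletion
print(stale, current)
```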

In short, I think the degree should stay dynamic because in essence, that is what it is and what we expect it to be. I acknowledge that it does not make our life easier, though.

Working on columns as a filtering practice

I still retain something of solution 1. Working with node attributes is a solid way of working around the limits of the filters. I think we might explore this practice further, and possibly incentivize it.

Here, I mean that we could work on columns in the nodes table exactly like with filters. A node filter outcome is just like a Boolean column (i.e. node attribute). We could combine Boolean columns with AND, OR, XOR into a new results column. Filtering on the result column would be like combining the filters of the source columns. We could think of many filters as tools that read and write columns (although the filters that read the structure itself are different).

I am not proposing that we actually do that; this is just to highlight the correspondence. But we could make it easy to save the result of a filter as a Boolean column, and offer a way to combine columns to achieve sophisticated filter combinations that we do not want to offer in filters directly. Reflecting on the nodes table might be a good way to keep the scope of filtering in check.
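
To illustrate the correspondence only (this is a sketch, not a proposed implementation), a saved filter result can be modelled in Python as a Boolean column keyed by node id, and two columns can be combined with AND, OR, or XOR. The `combine` helper and the column names are hypothetical.

```python
import operator
from typing import Callable, Dict

Column = Dict[str, bool]  # assumption: node id -> Boolean attribute value


def combine(left: Column, right: Column, op: Callable[[bool, bool], bool]) -> Column:
    """Build a new Boolean column from two existing ones."""
    return {node: op(left[node], right[node]) for node in left}


# Two filter results saved as columns.
under_30: Column = {"a": True, "b": False, "c": True}
in_paris: Column = {"a": True, "b": True, "c": False}

both = combine(under_30, in_paris, operator.and_)
either = combine(under_30, in_paris, operator.or_)
exactly_one = combine(under_30, in_paris, operator.xor)

# Filtering on the result column is then a simple attribute filter.
visible = [node for node, keep in exactly_one.items() if keep]
print(visible)
```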

jacomyal commented 2 years ago

After multiple discussions, we ended up agreeing on a stack of filters. To make information more readable, the ticket describing the needs on this topic is now #15.

gephi / gephi-lite

Filters block #9
