biolab / orange3-network

🍊 🕸 Network analysis add-on for Orange data mining suite.
Other
39 stars 33 forks source link

Single mode bipartiteness check #129

Closed thocevar closed 3 years ago

thocevar commented 5 years ago

The Single mode widget crashes on non-bipartite networks.

It should check and offer only attributes that result in bipartite networks.

matejklemen commented 5 years ago

Hey, I have a question regarding the second part of this issue (only offering attributes that result in bipartite networks).

Wouldn't performing these checks require running projections for each attribute and each combination of values of selected attribute (maybe not all pairs, but just that many until one bipartite result is obtained for an attribute)? If so, wouldn't this get very slow for larger networks?

ajdapretnar commented 5 years ago

It seems like it is possible to check for bipartiteness in linear time:

It is possible to test whether a graph is bipartite, and to return either a two-coloring (if it is bipartite) or an odd cycle (if it is not) in linear time, using depth-first search.

https://en.wikipedia.org/wiki/Bipartite_graph

matejklemen commented 5 years ago

But still, the way I see it, the logic would look something like:

for each attribute:
   for each value in <'Connect:' field>:
       for each value in <'By:' field>:
           if linear_bipartiteness_check(...):
               # attribute is valid, can stop checking this one
               # ...

Maybe I'm missing something, but this seems very slow :sweat_smile:

Also, just double checking: the check should be whether the source network (with respect to selected values of Connect: and By:) is bipartite, not the target one, right? Since this widget converts networks to a single mode, which is kinda the "opposite" of bipartite

ajdapretnar commented 5 years ago

the check should be whether the source network is bipartite, not the target one, right?

Yes, of course. :)

janezd commented 3 years ago

I think it's simpler.

  1. We can limit the feature to be binary (at least: have just two distinct values on this particular data), because this is conversion from bimodal network, not from multimodal. Then all that remains to be checked is that all edges connect two instances with different values of that feature.

  2. Even if we allow features with multiple values, we can go through all features. For each features, go through all edges and observe the value of the feature. Construct a set of this values. Valid values are those that never connect to themselves. Valid features are features with at least one valid values.

  3. If 2 is too slow, we can show all features with at least two values. When the user selects a feature, we show only valid values (or "").

  4. In the worst case, we assume the user know what (s)he's doing. Offer all values and just add the check in the code. If value is invalid, show an error.

janezd commented 3 years ago

@thocevar, can you construct a network on which it crashes? I tried and failed.

thocevar commented 3 years ago

I think this widget is widely misunderstood (including myself). At first I was under the impression that it simply extracts a bipartite network by defining one set of nodes ("Connect:") and the other ("by:"). Documentation needs updating once this widget is fixed. Currently it say "We connect persons (nodes) with the events they attended (edges)" ... a better choice would be "by" instead of "with".

In short, the widget takes a bipartite network and outputs a new network consisting of only the first set of nodes, which are connected if they share a common node from the second set.

I agree with the post above. The feature and its value define the set of nodes of the resulting network. As long as there's no connection between them, it's ok. And the edges between them are then defined by the existence of common nodes of some specified type (these can even have connections between themselves).

I would limit the case to binary features and point 1. in the previous post to keep it simple. Otherwise, I don't see a reason to not be even more general: the first set of nodes can be defined by some feature-value pair and the second set by an entirely different feature (and obviously value) as long as they induce a bipartite network. To keep it simple, a user can produce such binary feature via Feature Constructor if necessary.

janezd commented 3 years ago

I can imagine a use for point 2. We may have vertices representing people, and vertices representing two different types of events. I wouldn't expect the more general case with two variables, though; there'll usully be one variables that will indicate the role of the vertex.

Having two vertices (like persons) connected initially, is strange, but if we assume one would have such a graph: how does it crash the widget? I just tried with airtraffic, like connecting all hubs that serve the same non-hub and similar, and it works.

thocevar commented 3 years ago

It doesn't crash anymore due to #143. I guess what is left to decide is how we want to filter the attributes (if we want to filter them further at all).

janezd commented 3 years ago

Improve documentation.