Indexing of data attached to C-sets

olynch commented 4 years ago

In Petri.jl, there is a very convenient interface for creating a Petri net using a vector of symbols for the species and an LVector of LVectors for the transitions: https://github.com/mehalter/Petri.jl/blob/master/docs/src/usage.md. Currently, if I were to make a CSet-based whole-grained Petri net, it would be incredibly annoying to, for instance, input a SIR model just by manually using add_parts! and set_subparts!.

There a couple potential improvements that we could make for this. The first would be to change the add_parts! interface so that each part added could have different associated subparts; currently if you add multiple parts through add_parts!, it seems to me that they all have the same associated subparts.

Another improvement would be to have something that added multiple different types of parts, using an LVector.

A third improvement would be to have something that automatically associated symbols with elements of the CSet. That is, we could add an :S to the set of species for a Petri net, and then refer to it by :S in morphisms.

Finally, is there a mathematical theory of sparse vs. dense representations? For instance, you can represent the edges in a graph via a multiset of pairs, or by an adjacency matrix. For Petri net, you can represent the sources and targets of transitions by tuples of real numbers (classically), or have a list of attachments between transitions and species (in the whole-grained presentation). Sometimes it might be easier to input CSets through their dense representation, sometimes it might be easier to input in the sparse representation.

epatters commented 4 years ago

Hi Owen, great that you're working on the whole grain Petri nets.

Currently, if I were to make a CSet-based whole-grained Petri net, it would be incredibly annoying to, for instance, input a SIR model just by manually using add_parts! and set_subparts!.

While I'm certainly open to ergonomic improvements, I think the general paradigm is that for specific CSet-based data structures we'll want to wrap the generic CSet interface with new Julia functions having meaningful names. The new Graphs (multidigraphs) module is an example, and I'm taking the same approach for undirected wiring diagrams, which I'm working on at this branch. So I would expect that in the case of whole grain Petri nets, you would provide an interface similar to the one that @mehalter is already using. Micah, please let us know if you have any thoughts.

The alternative to this paradigm is to generate, for every theory C, a whole API worth of Julia functions using metaprogramming, but that's a can of worms I'd prefer not to open right now, in part because I'm not convinced there's a generic way to generate the "right" API. For example, the alignment of symmetric graphs onto an API that is more "undirected" than "symmetric" requires a little thinking; it is not completely mechanical.

There a couple potential improvements that we could make for this. The first would be to change the add_parts! interface so that each part added could have different associated subparts; currently if you add multiple parts through add_parts!, it seems to me that they all have the same associated subparts.

Another improvement would be to have something that added multiple different types of parts, using an LVector.

I agree, these two could be good improvements.

A third improvement would be to have something that automatically associated symbols with elements of the CSet. That is, we could add an :S to the set of species for a Petri net, and then refer to it by :S in morphisms.

This one is trickier, as it is basically the distinction between FinOrd and FinSet. Currently, we are following LightGraphs in taking the FinOrd approach: having all parts of given type be numbered consecutively from 1 to n. The two approaches have various pros and cons.

IMO, we should retain the FinOrd approach but allow optional indexing by name. Currently, it is possible to associate names to elements using data morphisms of type Symbol. However, as noted in the original PR, indexing (reverse lookup) is not yet supported for data morphisms. So it would be good to remedy this by allowing indexing, both unique and non-unique, for data morphisms. Analogous features are provided by relational databases.

Finally, is there a mathematical theory of sparse vs. dense representations? For instance, you can represent the edges in a graph via a multiset of pairs, or by an adjacency matrix. For Petri net, you can represent the sources and targets of transitions by tuples of real numbers (classically), or have a list of attachments between transitions and species (in the whole-grained presentation). Sometimes it might be easier to input CSets through their dense representation, sometimes it might be easier to input in the sparse representation.

There is not a general theory, AFAIK, but it's fun to think about and shouldn't be too hard to work out. The representation I implemented for C-sets is sparse: it is based on adjacency lists, not adjacency matrices. You're welcome to think about a possible matrix representation. I created the AbstractCSet type because the design space is large and there are certainly other useful representations, e.g., relational databases for data too large to fit into memory.

jpfairbanks commented 4 years ago

I agree that working in FinOrd makes everything easier to program, but that it does hamper usability. We should probably deal with named objects by storing the explicit list of names as a map in the indexing category, for example Graphs with vertex names could be modeled as

@present TheoryGraph(FreeCategory) begin
  V::Ob
  E::Ob
  Name::Ob
  src::Hom(E,V)
  tgt::Hom(E,V)
  vaname::Hom(V, Name)
end

which of course should use the theory inheritance,

@present NamedGraph <: TheoryGraph begin
  Name::Ob
  vname::Hom(V,Name)
end

this supports the option of having edge names too,

@present ENamedGraph <: NamedGraph begin
  Name::Ob
  ename::Hom(E,Name)
end

it harmonizes vertex names with edge weights as just functions from the vertex/edge set into the name/weight set as opposed to managing the indexing by name everywhere.

epatters commented 4 years ago

Agreed, this is the right approach. And we're halfway there: we just need to implement indexing for the names so that lookups will be fast. I'll put that on my list.

epatters commented 4 years ago

There a couple potential improvements that we could make for this. The first would be to change the add_parts! interface so that each part added could have different associated subparts; currently if you add multiple parts through add_parts!, it seems to me that they all have the same associated subparts.

@olynch, I should have said above that this is already supported because the interface is vectorized. For an example of how this works, see the add_edges! function for symmetric graphs.

epatters commented 4 years ago

I've renamed the issue to reflect the primary feature request that has emerged from the discussion.

To summarize, we should:

Support indexing (reverse lookup) of data attached to C-sets. Using a dictionary would be the obvious way to accomplish that.
Both unique and non-unique indexing should be supported. In the former case, uniqueness should be enforced.

An important use case is assigning unique names to specific elements, so that elements don't have to be referenced by their number.

olynch commented 4 years ago

I think you all have done a great job of figuring out what I meant to say, while I was off doing less important things :P. Should I start thinking about how this indexing might work, or does @epatters already have an idea in mind, and I'll do other things related to whole-grained petri nets?

epatters commented 4 years ago

Thanks Owen, I think it makes sense for me to do it since I have a pretty good idea what needs to be done. I can probably do it this week. If you want to keep working on other aspects of whole grain Petri nets, that would be great.

olynch commented 4 years ago

This would also be great for coarse-grained petri nets, because if you want to look up the inputs and outputs of a transition, then you need to do a reverse indexing.

epatters commented 4 years ago

Good news, that should be possible already; here is how it's done for graphs. The only thing that's missing right now is indexing for data attributes, like symbolic names or numerical weights.

olynch commented 4 years ago

Ohh, that's what the index is for. OK, I'm still learning how to use CSets.

epatters commented 4 years ago

Sure, no worries.

AlgebraicJulia / Catlab.jl

Indexing of data attached to C-sets #190