willow-ahrens closed this issue 1 year ago
Could you make up a tiny example notebook with a basic CSR (and maybe one additional format) tree that shows these functions in action? It's hard to concretize some of them in my head.
Specifically, `env` is still not very clear to me.
There is kind of a weird dynamic going on with multi-index levels. The interface between a level and its child is clear, but it doesn't (and shouldn't) apply to intermediate results of unfurling a multi-index level (i.e. popping one index off a multi-index level). I think it makes sense to at least document which properties are expected to be defined in environments, but we might want to move to a trait-based interface where each level defines its own `Fiber`.
I want to make a notebook, but I'm not sure I'll have time until after the camera-ready deadline. This is the sort of interface that is difficult to make better until there are many examples of what it needs to do.
What are the existing multi-index levels? Can you give examples of where they appear?
Wait I don't understand, how is COO multi-index? Isn't it singleton - singleton - ...?
If we encode singletons by themselves, we need to come up with a much more complicated level interface that handles non-uniqueness/non-sortedness. Encoding singletons by themselves also doesn't map well to Array-Of-Structs COO. I found it easier to manage the indices all in the same level.
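To make the contrast concrete, here is a hedged sketch (the type and field names are illustrative, not the actual Finch types) of what "all indices in the same level" looks like: the coordinates live in parallel vectors, struct-of-arrays style, so they can be sorted and deduplicated together, and an array-of-structs COO layout maps onto it directly.

```julia
# Hypothetical sketch, not the real Finch COO level.
# All N coordinate vectors live in one multi-index level.
struct SparseCOOSketch{N, Tv}
    I::NTuple{N, Vector{Int}}  # one coordinate vector per index (SoA layout)
    V::Vector{Tv}              # stored values, parallel to the coordinates
end

# A 3×3 matrix with nonzeros at (1, 2) and (3, 1):
coo = SparseCOOSketch((Int[1, 3], Int[2, 1]), Float64[1.0, 2.0])
```

Keeping the coordinates together in one level means a single sort/unique pass over rows of the coordinate table, rather than coordinating uniqueness across a chain of singleton levels.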
One possible approach here is to introduce a struct that allows us to just emit looplets for more than one index at a time (i.e. just unfurl the whole tensor at once, why not?). This would eliminate the need for complicated environments that reflect internal state that doesn't actually need a formal interface.
We would just need to leave a little wrapper struct to mark which places in the looplet nest expect to accept a new index.
This would make it relatively easy to write looplets for existing sparse matrices too, as we could just emit a double-index looplet for the array without introducing new types.
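As a rough sketch of the "little wrapper struct" idea (entirely hypothetical names, just to pin down the shape of the proposal): the wrapper records how many indices a spot in the looplet nest still expects before it bottoms out.

```julia
# Hypothetical marker sketch: `body` is a looplet nest (or tensor) that
# still expects `n` more indices before it is fully unfurled.
struct ExpectIndex{Body}
    n::Int      # how many indices remain to be accepted here
    body::Body  # the nest awaiting those indices
end
```

The compiler would then unfurl the whole tensor at once and use these markers to know where each incoming index should be spliced in, instead of threading internal state through the environment.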
A few remaining tasks:

- `getname`
- `Environment`
- add new `Fiber` types
- `get_dirty` and `set_clean`
After several fixes, these changes have finally been implemented, and initial documentation was added with #117. I think more docs would be nice, but at this point this RFC is closed and we can move on.

To summarize, in the end `env` was simplified/restricted to just `pos`.
Let's clean up the level formats a bit so that they can interoperate with other systems! A level is essentially an allocator for vectors, which allows us to represent e.g. matrices efficiently as vectors of vectors.
If all goes well, the level formats will do mostly the same thing they already do, but the interface will be clear. This is about reorganizing and documenting what already works.
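To make "a level is an allocator for vectors" concrete, here is a hedged sketch (hypothetical names, not the real Finch level types) of a CSR matrix viewed as a vector of sparse row vectors: an outer dense level hands out positions `1:m`, and the inner level stores one compressed vector per position.

```julia
# Hypothetical sketch: the inner level of a CSR matrix.
# ptr[p]:ptr[p+1]-1 delimits the entries of the fiber at position p.
struct SparseListSketch{Tv}
    ptr::Vector{Int}  # fiber boundaries (length m + 1)
    idx::Vector{Int}  # column indices of stored entries
    val::Vector{Tv}   # stored values
end

# Fiber p of the level, as a list of (column => value) pairs.
row(lvl::SparseListSketch, p) =
    [lvl.idx[q] => lvl.val[q] for q in lvl.ptr[p]:lvl.ptr[p+1]-1]

# The 2×3 matrix [1 0 2; 0 3 0] in CSR form:
A = SparseListSketch([1, 3, 4], [1, 3, 2], [1.0, 2.0, 3.0])
row(A, 1)  # => [1 => 1.0, 3 => 2.0]
row(A, 2)  # => [2 => 3.0]
```

The point of the proposal is that every level format exposes this same "vector of fibers" view, so the outer level never needs to know how the inner one stores its entries.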
1. A set of functions that defines a level at runtime
Currently, the `Fiber` type wraps a level together with an environment. If a level `lvl` is an allocator for vectors, `fiber(lvl, env)` is one such vector, where `env` holds enough information to differentiate the two. We can then define AbstractArray methods on the fiber as, e.g.

```julia
Base.ndims(fbr::Fiber{<:SparseBytemapLevel}) = 1 + ndims(Fiber(fbr.lvl.lvl, Environment(fbr.env)))
```
We can access an element of a fiber as `fbr(idx)`.

The problem is that it's unclear what information needs to be put in the environment to make a fiber out of a level. The environment is currently just a dict that we put information into. We can put whatever info we want into the environment, but most of the time the most important variable is the position. If the level were a vector of fibers, the position is an integer specifying which fiber to use.

I think we should formalize a level as a vector of fibers. Each level should return its own fiber object. The index with which we access a level becomes its position, and we will see that several other functions have good analogies here.
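A minimal sketch of that formalization, assuming hypothetical names (the real Finch `Fiber`/`Environment` types differ): the environment carries just a position, so "indexing" a level at a position yields a fiber of that level.

```julia
# Hypothetical sketch: the environment reduced to a position.
struct PosEnvSketch
    pos::Int  # which fiber of the level this is
end

# A fiber pairs a level with the environment that picks one of its fibers.
struct FiberSketch{Lvl}
    lvl::Lvl
    env::PosEnvSketch
end

# "Indexing" a level at position p yields its p-th fiber.
fiber_at(lvl, p::Integer) = FiberSketch(lvl, PosEnvSketch(p))
```

Under this view, most of the level interface below reads as ordinary vector operations applied at a position.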
2. A set of functions that defines a level at finch compile time
- `getname(tns)` gets the name of the tensor.
- `getsites(tns)` gets the names of the modes of the tensor. This is useful for dimensionalization.
- `getsize(tns, ctx, mode)` gets the sizes of the modes of the tensor.
- `setsize(tns, ctx, mode)` sets the sizes of the modes of the tensor. Notably, `setsize` can currently get called on a tensor that is being read, if a tensor with the same name has previously been written to and the new sizes need to be propagated. Not sure if we like that. `getsize` and `setsize` might be generalized a bit to include other uses of `declare_dimensions_access` and `infer_dimensions_access`, so that index modifiers like permissive indices and protocols can avoid directly overloading those.
- `default(tns)` is the value a tensor is initialized to, and is often the "zero" value.
- `initialize_level!` currently emits code to set a tensor to its default and get it ready for iterating. This is like `zero!(lvl)` or perhaps `empty!(lvl)`.
- `finalize_level!` currently emits code to clean up a tensor after writing to it. This is when one would call `qsort` on a list of coordinates. `initialize` sometimes needs to be called on subranges of the level, so we should add specializations.
- `assemble!` currently emits code to allocate a new fiber in a level. This is like a `resize!` on the level vector.
- `unfurl(fbr, ctx, mode, idx, idxs...)` emits looplets to iterate over a fiber. Often, `idx` is inspected to determine which protocol to use. At the end, `exfurl` is called recursively to pass the looplets through index modifiers. I want to tweak this interface to pass along an extent during unfurling. I also want to remove the responsibility of the tensor's `unfurl` to handle the tail (irrelevant) indices or call itself recursively.
- `refurl(tns, ctx, mode, idxs...)` gets called on a fiber just after it is created. It can be helpful to emit code here if you want to do something eagerly.
- `stylize_access(node, ctx, tns)` gets called on tensors to ask whether they want to participate in Looplet stuff. I think this should be a trait instead.

TL;DR: Levels are vectors of fibers; make levels define their own `Fiber` types; use traits to avoid boilerplate.
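To ground a few of these functions, here is a hedged sketch of the runtime side of the interface for a dense level (the names are illustrative stand-ins, not Finch's actual definitions): `assemble!` behaves like a `resize!` on the level's vector of fibers, and `initialize_level!` resets everything to the default.

```julia
# Hypothetical sketch of a dense level implementing part of the interface.
mutable struct DenseLevelSketch{Tv}
    val::Vector{Tv}  # one slot per fiber position
    default::Tv      # the "zero" value fibers start at
end

default(lvl::DenseLevelSketch) = lvl.default

# Allocate fibers up through position `pos`, filling new slots with the
# default — i.e. a resize! on the level's vector of fibers.
function assemble!(lvl::DenseLevelSketch, pos)
    n = length(lvl.val)
    resize!(lvl.val, pos)
    lvl.val[n+1:pos] .= lvl.default
    return lvl
end

# Reset the level to its default, ready for writing — i.e. zero!(lvl).
initialize_level!(lvl::DenseLevelSketch) = (fill!(lvl.val, lvl.default); lvl)
```

A sparse level would implement the same verbs differently (e.g. `finalize_level!` sorting its coordinate list), which is exactly the argument for documenting these as the level interface.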