Closed gskorokhod closed 2 months ago
Before we can properly release uniplate as a separate crate, we should probably implement it for structs and unions
This definitely would be good to have eventually, but I think enums would be the most common use-case by far (this is how they are used in haskell, and AST style things are generally represented in enums), so I don't think we need to prioritise this.
I think for a 1.0 biplates and zippers would be more important?
I don't think I understand how they would work with structs - my understanding is that generic programming constructs like these solve problems to do with sum types (enums), but structs don't have variants?
What if you had a struct nested in one of the variants? So for completeness they need to be supported, I think.
What if you had a struct nested in one of the variants? So for completeness they need to be supported, I think.
So its for cases where we have an enum T containing a struct that itself contains an enum T?
I think in the paper / Haskell implementation "transient nesting" isn't supported by the base Uniplate derivation? I think later on he discusses making a Uniplate<T>
be defined as Biplate<T,T>
, which would detect transient children, but Biplates come with other limitations iirc, and we will need biplates first to do it this way...
Don't know how the derivation will work, but you can see how this can be manually implemented, right?
I thought Biplate<T,T>
would be identical to Uniplate<T>
but I may be missing something subtle. I definitely never knowingly using Biplate<T,T>
in the past.
Don't know how the derivation will work, but you can see how this can be manually implemented, right?
For a Biplate you derive O(n^2) instances, where n is the number of types in the structure. Each Biplate<T,U>
tells you how to find a type T from type U. You then walk towards the right type recursively with different biplates.
For example, the path for a type T that contains a V that contains a U that contains a T you would:
Call universeBi on Biplate<T,T> Call universeBi on the child V using Biplate<T,V> Call universeBi on the child U using Biplate<T,U> This is a base case and would return the inner T.
Although the paper also seems to suggest that Biplate<T,T> just does nothing and returns itself as its children so I am not 100% sure about the above.
But if everything was a T that should boil down to the same thing?
The situation I am thinking of is: if a variant of an enum T contains a struct S that contains a T. In that case should the T be gives as a child of the enum or not. Not sure what Haskell does in this case though I can check!
The situation I am thinking of is: if a variant of an enum T contains a struct S that contains a T. In that case should the T be gives as a child of the enum or not. Not sure what Haskell does in this case though I can check!
I might come and see you in a minute, this is something that I probably need to dig into to understand biplates as well! Also because we have slightly changed the semantics for list convenience...
Something like
data T = A Int | B T T | S D
data D = D T T
test1 = B (A 1) (A 2) -- should return 2 children of type T
test2 = S (D (A 1) (A 2) -- 0 or 2 children of type T
The results are in!
t.hs
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data
import Data.Generics.Uniplate.Data
data T = A Int | B T T | S D
deriving (Show, Eq, Data, Typeable)
data D = D T T
deriving (Show, Eq, Data, Typeable)
test1 = B (A 1) (A 2) -- should return 2 children of type T
test2 = S (D (A 1) (A 2)) -- 0 or 2 children of type T
$ ghci t.hs
[1 of 1] Compiling Main ( t.hs, interpreted )
Ok, one module loaded.
ghci> children test1
[A 1,A 2]
ghci> children test2
[A 1,A 2]
ghci>
The plot thickens (biplate now)
ghci> children test2 :: [T]
[A 1,A 2]
Ok, as expected
ghci> childrenBi test2 :: [T]
[S (D (A 1) (A 2))]
And turns out Biplate<T,T> and Uniplate
ghci> childrenBi (D (A 1) (A 2)) :: [T]
[A 1,A 2]
This is also as expected.
We (@gskorokhod, @ozgurakgun and I) should probably discuss this more on Friday- it seems that we need some sort of Biplates to implement Uniplates properly.
An important mistake I made is that children is maximal substructures of the same type, not direct children - this explains the examples above.
Oz and I have just met to discuss this - the basic idea we came up with is to do something like this (in psuedo-Haskell):
data T = A Int | B T T | S D
deriving (Show, Eq, Data, Typeable)
data D = D T T
deriving (Show, Eq, Data, Typeable)
instance Uniplate of T where
--- ignoring context part of uniplate for brevity
children A a = [] -- a is not an algebraic data-type, so no need to check it
children B a b = (childrenBi a ::[T]) ++ (childrenBi b ::[T])
children S d = (childrenBi d ::[T])
-- ...
childrenBi a ::[T]
means find all T
typed children of a using a Biplate.
This makes the weird case with Biplate<T,T>
make sense - if a is not a T, it will find all nested T's using the biplate. If a is a T, it will just return [a]
and not descend deeper.
This is just intuition - there is lots we can do to push this more onto build-time and optimise, etc etc.
Section 5 of the paper describes the derivation of Uniplate and Biplate in terms of a common helper class. This uses a set of combinators that mark whether a field is T, may contain T, or will never contain T. This is analogous to the intuition shown above, and may be helpful to think about.
(Oz had a good idea for code-gen but I don't think I understand it deeply enough to explain it!)
@gskorokhod this also gives us collections and deeper nesting and so on as you've discussed before which you would like :)
This is the combinators version of Uniplate I discussed above : https://hackage.haskell.org/package/uniplate-1.6.13/docs/Data-Generics-Uniplate-Direct.html
The implementation of Uniplate/Biplate based on these is here: https://hackage.haskell.org/package/uniplate-1.6.13/docs/src/Data.Generics.Uniplate.Direct.html#%7C%2B
I think thinking in terms of children or childrenBi can be misleading. Think in terms or uniplate and biplate (as we had on the whiteboard)
You could call biplate on primitive types if you wanted to. I trust the rust compiler to optimise those away :)
We want:
uniplate :: (a -> Tree<a>, Tree<a> -> a)
biplate :: (a -> Tree<b>, Tree<b> -> a)
For a suitable Tree type. Tree a = Empty | Leaf a | Branch(Vec<a>)
will probably do. You don't need Empty, strictly speaking, but probably nice to have.
BTW I am loving this pseudo Haskell/Rust hybrid. Sorry none of the pseudocode we write in this thread is valid Haskell nor Rust!
A picture of the board for future reference:
This seems important to how Trees (called Str in the Haskell) work!
(Data/Generics/Str.hs)
Tree functions we will want will be that and maybe some sort of map.
Someone at Stack Overflow did some of uniplate in C# too: https://github.com/benjamin-hodgson/Sawmill
I thought we had decided that we wouldn't need Str exactly as it is implemented in Haskell, instead we would have a Tree that can have vectors as children. Is there a problem with that idea?
I thought we had decided that we wouldn't need Str exactly as it is implemented in Haskell, instead we would have a Tree that can have vectors as children. Is there a problem with that idea?
I agree, I was just wondering how we turn our trees into lists for the user facing API functions.
I still think the strStructure function is still useful and the implementation of it is almost the same with vectors instead of Two.
I am currently doing a writeup on how we use Trees, what functions they need and how they lead to biplate and uniplate implementations. I will post this here when done.
RFC @gskorokhod @ozgurakgun (and anyone else!)
I have wrote a page on the wiki detailing how uniplates and biplates can be implemented using a tree data structure.
Currently, the derive macro only supports enums (see #259). Before we can properly release uniplate as a separate crate, we should probably implement it for structs and unions