Children should be maximal substructures of the same type, and deriving Uniplates with Biplates

gskorokhod commented 2 months ago

Currently, the derive macro only supports enums (see #259). Before we can properly release uniplate as a separate crate, we should probably implement it for structs and unions

niklasdewally commented 2 months ago

Before we can properly release uniplate as a separate crate, we should probably implement it for structs and unions

This definitely would be good to have eventually, but I think enums would be the most common use-case by far (this is how they are used in haskell, and AST style things are generally represented in enums), so I don't think we need to prioritise this.

I think for a 1.0 biplates and zippers would be more important?

niklasdewally commented 2 months ago

I don't think I understand how they would work with structs - my understanding is that generic programming constructs like these solve problems to do with sum types (enums), but structs don't have variants?

ozgurakgun commented 2 months ago

What if you had a struct nested in one of the variants? So for completeness they need to be supported, I think.

niklasdewally commented 2 months ago

What if you had a struct nested in one of the variants? So for completeness they need to be supported, I think.

So its for cases where we have an enum T containing a struct that itself contains an enum T?

I think in the paper / Haskell implementation "transient nesting" isn't supported by the base Uniplate derivation? I think later on he discusses making a Uniplate<T> be defined as Biplate<T,T>, which would detect transient children, but Biplates come with other limitations iirc, and we will need biplates first to do it this way...

ozgurakgun commented 2 months ago

Don't know how the derivation will work, but you can see how this can be manually implemented, right?

I thought Biplate<T,T> would be identical to Uniplate<T> but I may be missing something subtle. I definitely never knowingly using Biplate<T,T> in the past.

niklasdewally commented 2 months ago

Don't know how the derivation will work, but you can see how this can be manually implemented, right?

For a Biplate you derive O(n^2) instances, where n is the number of types in the structure. Each Biplate<T,U> tells you how to find a type T from type U. You then walk towards the right type recursively with different biplates.

For example, the path for a type T that contains a V that contains a U that contains a T you would:

Call universeBi on Biplate<T,T> Call universeBi on the child V using Biplate<T,V> Call universeBi on the child U using Biplate<T,U> This is a base case and would return the inner T.

niklasdewally commented 2 months ago

Although the paper also seems to suggest that Biplate<T,T> just does nothing and returns itself as its children so I am not 100% sure about the above.

ozgurakgun commented 2 months ago

But if everything was a T that should boil down to the same thing?

The situation I am thinking of is: if a variant of an enum T contains a struct S that contains a T. In that case should the T be gives as a child of the enum or not. Not sure what Haskell does in this case though I can check!

niklasdewally commented 2 months ago

The situation I am thinking of is: if a variant of an enum T contains a struct S that contains a T. In that case should the T be gives as a child of the enum or not. Not sure what Haskell does in this case though I can check!

I might come and see you in a minute, this is something that I probably need to dig into to understand biplates as well! Also because we have slightly changed the semantics for list convenience...

ozgurakgun commented 2 months ago

Something like

data T = A Int | B T T | S D

data D = D T T

test1 = B (A 1) (A 2) -- should return 2 children of type T
test2 = S (D (A 1) (A 2) -- 0 or 2 children of type T

ozgurakgun commented 2 months ago

The results are in!

t.hs

{-# LANGUAGE DeriveDataTypeable #-}

import Data.Data
import Data.Generics.Uniplate.Data

data T = A Int | B T T | S D
          deriving (Show, Eq, Data, Typeable)

data D = D T T
          deriving (Show, Eq, Data, Typeable)

test1 = B (A 1) (A 2) -- should return 2 children of type T
test2 = S (D (A 1) (A 2)) -- 0 or 2 children of type T

$ ghci t.hs
[1 of 1] Compiling Main             ( t.hs, interpreted )
Ok, one module loaded.
ghci> children test1
[A 1,A 2]
ghci> children test2
[A 1,A 2]
ghci>

ozgurakgun commented 2 months ago

The plot thickens (biplate now)

ghci> children test2 :: [T]
[A 1,A 2]

Ok, as expected

ghci> childrenBi test2 :: [T]
[S (D (A 1) (A 2))]

And turns out Biplate<T,T> and Uniplate are not identical.

ghci> childrenBi (D (A 1) (A 2)) :: [T]
[A 1,A 2]

This is also as expected.

niklasdewally commented 2 months ago

We (@gskorokhod, @ozgurakgun and I) should probably discuss this more on Friday- it seems that we need some sort of Biplates to implement Uniplates properly.

An important mistake I made is that children is maximal substructures of the same type, not direct children - this explains the examples above.

Oz and I have just met to discuss this - the basic idea we came up with is to do something like this (in psuedo-Haskell):

data T = A Int | B T T | S D
          deriving (Show, Eq, Data, Typeable)

data D = D T T
          deriving (Show, Eq, Data, Typeable)

instance Uniplate of T where
  --- ignoring context part of uniplate for brevity
  children A a = [] -- a is not an algebraic data-type, so no need to check it
  children B a b =  (childrenBi a ::[T]) ++ (childrenBi b ::[T])
  children S d =  (childrenBi d ::[T])
-- ...

childrenBi a ::[T] means find all T typed children of a using a Biplate.

This makes the weird case with Biplate<T,T> make sense - if a is not a T, it will find all nested T's using the biplate. If a is a T, it will just return [a] and not descend deeper.

This is just intuition - there is lots we can do to push this more onto build-time and optimise, etc etc.

Section 5 of the paper describes the derivation of Uniplate and Biplate in terms of a common helper class. This uses a set of combinators that mark whether a field is T, may contain T, or will never contain T. This is analogous to the intuition shown above, and may be helpful to think about.

(Oz had a good idea for code-gen but I don't think I understand it deeply enough to explain it!)

niklasdewally commented 2 months ago

@gskorokhod this also gives us collections and deeper nesting and so on as you've discussed before which you would like :)

This is the combinators version of Uniplate I discussed above : https://hackage.haskell.org/package/uniplate-1.6.13/docs/Data-Generics-Uniplate-Direct.html

The implementation of Uniplate/Biplate based on these is here: https://hackage.haskell.org/package/uniplate-1.6.13/docs/src/Data.Generics.Uniplate.Direct.html#%7C%2B

ozgurakgun commented 2 months ago

I think thinking in terms of children or childrenBi can be misleading. Think in terms or uniplate and biplate (as we had on the whiteboard)

You could call biplate on primitive types if you wanted to. I trust the rust compiler to optimise those away :)

We want:

uniplate :: (a -> Tree<a>, Tree<a> -> a)

biplate :: (a -> Tree<b>, Tree<b> -> a)

For a suitable Tree type. Tree a = Empty | Leaf a | Branch(Vec<a>) will probably do. You don't need Empty, strictly speaking, but probably nice to have.

BTW I am loving this pseudo Haskell/Rust hybrid. Sorry none of the pseudocode we write in this thread is valid Haskell nor Rust!

niklasdewally commented 2 months ago

A picture of the board for future reference:

niklasdewally commented 2 months ago

This seems important to how Trees (called Str in the Haskell) work!

(Data/Generics/Str.hs) Screenshot 2024-03-20 at 21 04 45

niklasdewally commented 2 months ago

Tree functions we will want will be that and maybe some sort of map.

niklasdewally commented 2 months ago

Someone at Stack Overflow did some of uniplate in C# too: https://github.com/benjamin-hodgson/Sawmill

ozgurakgun commented 2 months ago

I thought we had decided that we wouldn't need Str exactly as it is implemented in Haskell, instead we would have a Tree that can have vectors as children. Is there a problem with that idea?

niklasdewally commented 2 months ago

I thought we had decided that we wouldn't need Str exactly as it is implemented in Haskell, instead we would have a Tree that can have vectors as children. Is there a problem with that idea?

I agree, I was just wondering how we turn our trees into lists for the user facing API functions.

I still think the strStructure function is still useful and the implementation of it is almost the same with vectors instead of Two.

niklasdewally commented 2 months ago

I am currently doing a writeup on how we use Trees, what functions they need and how they lead to biplate and uniplate implementations. I will post this here when done.

niklasdewally commented 2 months ago

RFC @gskorokhod @ozgurakgun (and anyone else!)

I have wrote a page on the wiki detailing how uniplates and biplates can be implemented using a tree data structure.

https://github.com/conjure-cp/conjure-oxide/wiki/2024%E2%80%9003-Implementing-Uniplates-and-Biplates-with-Structure-Preserving-Trees

conjure-cp / conjure-oxide

Children should be maximal substructures of the same type, and deriving Uniplates with Biplates #261