expipiplus1 / vulkan

Haskell bindings for Vulkan
https://hackage.haskell.org/package/vulkan
BSD 3-Clause "New" or "Revised" License
141 stars 32 forks source link

Reduce possible overhead in Extends solver #538

Open s-and-witch opened 3 months ago

s-and-witch commented 3 months ago

The definition of Extendss that involves pairs:

  type Extendss p xs where
    Extendss p '[] = ()
    Extendss p (x : xs) = (Extends p x, Extendss p xs)

Would produce a constraint that looks like this: ((), ((), ((), ...))) That may result in a runtime overhead if GHC would be incapable to perform cross-module inlining here (it would be)

The new ReportUnsolved type family ensures that Extends p x reduces to the constraint unit and drops it after that entirely. Hence, produced constraint would be just (). Much better!

To ensure that having Extends p x and Extendss p xs implies Extendss p (x : xs), all extension associations were moved to the ExtendsWith type family, and Extends now becomes just a constraint wrapper:

  type Extends p x = ExtendsWith p x ~ '()

That should save backward compatibility.

Here is a minimal example to show that it works as expected: https://play.haskell.org/saved/3UGbL6ef

The change should not affect any user, this is just an internal optimization.

s-and-witch commented 3 months ago

Ok, I've checked CI error. It happens that previously cons of Extends a e to Extendss a es worked well because we didn't care about exact shape of the constraint. However, it is no longer true because SeqConstraintUnits want to see () :: Constraint here. Hence, this code:

extendSomeStruct
  :: forall a e. (Extensible a, Extends a e)
  => e
  -> SomeStruct a
  -> SomeStruct a
extendSomeStruct e (SomeStruct @_ @es a) = SomeStruct (setNext a (e, getNext a))

Fails with such error:

• Could not deduce (SeqConstraintUnit
                          (Extends a e) (Extendss a es))

To fix this error, we have to add equality constraint directly:

extendSomeStruct
  :: forall a e. (Extensible a, Extends a e ~ (() :: Constraint))
  => e
  ...

I see several ways to deal with it: 1) Just rest it in the current state 2) Add equality constraints to all high-order functions (That may break users, so this is an unwanted option) 3) Move Extends equations to another type family, e.g. ExtendsWith:

type family ExtendsWith (a :: [Type] -> Type) (b :: Type) :: () where
  ExtendsWith AccelerationStructureCreateInfoKHR OpaqueCaptureDescriptorDataCreateInfoEXT = '()
  ExtendsWith AccelerationStructureCreateInfoKHR AccelerationStructureMotionInfoNV = '()
  ExtendsWith AccelerationStructureCreateInfoNV OpaqueCaptureDescriptorDataCreateInfoEXT = '()

And declare Extends as an equality constraint:

type Extends a b = ExtendsWith a b ~ '()

That should be enough to hide the change from users, however, I want to know your opinion before implementing this.

alt-romes commented 3 months ago

@s-and-witch could you measure the compile time performance impact of this change?

alt-romes commented 3 months ago

@s-and-witch could you measure the compile time performance impact of this change?

I think that if the change is relevant, it justifies the added complexity of your second proposed solution.

s-and-witch commented 3 months ago

@s-and-witch could you measure the compile time performance impact of this change?

Sure. I'm using environment from $ nix-shell default.nix, use cabal clean before building and use time cabal clean for measurement. Here are my results:

With my patch it takes

real    6m41,371s
user    6m4,188s
sys     0m34,240s

That's a lot!

In current main, it takes just

real    6m39,330s
user    6m3,066s
sys     0m33,844s
alt-romes commented 3 months ago

Interesting. The change is negligible. Where does that leave us wrt "reducing possible overhead"? Do you think the possible performance benefits of this change would show up in another workload?

s-and-witch commented 3 months ago

Let's talk about statically observable changes using this showcase: https://play.haskell.org/saved/E3ELcj3Z

The code of our interest is this, because it is an actual code taken from the Vulkan.CStruct.Extends module:

data SomeStruct (a :: [Type] -> Type) where
  SomeStruct
    :: forall a es
     . (Extendss a es, PokeChain es, Show (Chain es))
    => a es
    -> SomeStruct a

extendSomeStruct
  :: (Extensible a, Extends a e, ToCStruct e, Show e)
  => e
  -> SomeStruct a
  -> SomeStruct a
extendSomeStruct e (SomeStruct a) = SomeStruct (setNext a (e, getNext a))

With previous logic, extendSomeStruct is compiled into this core:

extendSomeStruct
  = \ @a_aSr @e_aSs $dExtensible_aSt irred_aSu e1_aFZ ds_d112 ->
      case ds_d112 of { SomeStruct @es_aSx irred1_aSy a1_aG0 ->
      SomeStruct
        ((irred_aSu, irred1_aSy) `cast` <Co:6> :: ...) 
        (setNext
           $dExtensible_aSt
           a1_aG0
           ((e1_aFZ, getNext $dExtensible_aSt a1_aG0) `cast` <Co:5> :: ...))
      }

The part that we want to avoid is this tuple allocation: (irred_aSu, irred1_aSy)

With new type families, this function results in this code:

extendSomeStruct
  = \ @a_aSU @e_aSV $dExtensible_aSW $d~_aSX eta_B0 eta1_B1 ->
      case eq_sel $d~_aSX of co_a10a { __DEFAULT ->
      case eta1_B1 of { SomeStruct @es_aT0 irred_aT1 a1_aG7 ->
      SomeStruct
        (irred_aT1 `cast` <Co:20> :: ...)
        (setNext
           $dExtensible_aSW
           a1_aG7
           ((eta_B0, getNext $dExtensible_aSW a1_aG7) `cast` <Co:5> :: ...))
      }
      }

(irred_aT1 cast :: ...)

There is no more tuple allocation. However, matching on equality constraint was added: case eq_sel $d~_aSX of co_a10a { __DEFAULT ->

I can't confidently predict which result is better, I'm just feeling that evaluation of a single unit is better that lazy pair allocation.

s-and-witch commented 3 months ago

Where does that leave us wrt "reducing possible overhead"?

I mean runtime overhead for tuple allocation

alt-romes commented 3 months ago

I see. It looks like a benign change user-facing modulo error messages, but also not trivial to judge (I'll try pointing a project of mine at this branch later). The generated code allocates less, that's good...

Do you mind if I ask what prompted you investigating this/patch this on vulkan in particular?

s-and-witch commented 3 months ago

Another example require simulating cross-module inlining failure, e.g. when there is a function like createInstance that expects Extends p xs. I'm using O0 and NOINLINE to demonstrate this case: https://play.haskell.org/saved/gy5AghRM

With new API check call looks like this:

main = print $fShow() (check_rhm ((%%) `cast` <Co:42> :: ...))

And here is how it looks with old one:

$d(%,%)_rRT = ((%%), (%%))

$d(%,%)1_rRU = ((%%), $d(%,%)_rRT)

main
  = print $fShow() (check_rhk ($d(%,%)1_rRU `cast` <Co:33> :: ...))
s-and-witch commented 3 months ago

Do you mind if I ask what prompted you investigating this/patch this on vulkan in particular?

There is no any major reason, I wanted to learn about vulkan by porting code a C++ guide into Haskell and found how Extendss is implemented. There was no any investigation, I just decided to reduce possible (rather small) overhead of this constraint.

s-and-witch commented 3 months ago

FWIW, I'm totally ok with rejecting this patch, because I expected a little patch that does a little speed improvement, however the patch grew a lot.