Clash error call: `zipEqual: left list is longer` when using custom type proof

martijnbastiaan commented 1 year ago

The following:

module Test where

import Clash.Prelude
import Data.Constraint
import Data.Proxy
import Unsafe.Coerce

data T depth = T (BitVector depth) deriving (Generic)

instance (1 <= CLog 2 depth, KnownNat depth) => NFDataX (T depth)

-- | if (2 <= n) holds, then (1 <= CLog 2 n) also holds.
oneLeCLog2n :: forall n . (2 <= n) => Proxy n -> Dict (1 <= CLog 2 n)
oneLeCLog2n Proxy = unsafeCoerce (Dict :: Dict ())

f ::
  forall dom depth.
  ( HiddenClockResetEnable dom
  , KnownNat depth
  , 2 <= depth ) =>
  Proxy depth ->
  Signal dom Bool ->
  Signal dom Bool
f Proxy =
  case oneLeCLog2n (Proxy @depth) of
    Dict -> mealy go (T 0)

 where
  go :: T depth -> Bool -> (T depth, Bool)
  go (T n) True = (T (n + 1), False)
  go (T n) False = (T (n - 1), True)
{-# NOINLINE f #-}

topEntity clk rst ena =
  withClockResetEnable clk rst ena $
    f @System @2 Proxy
{-# NOINLINE topEntity #-}

causes:

<no location info>: error:
    Clash error call:
    zipEqual: left list is longer
    CallStack (from HasCallStack):
      error, called at src/Data/List/Extra.hs:124:19 in clash-lib-1.7.0-inplace:Data.List.Extra
      zipEqual, called at src/Clash/Normalize/Transformations/Case.hs:203:19 in clash-lib-1.7.0-inplace:Clash.Normalize.Transformations.Case
      caseCon', called at src/Clash/Normalize/Transformations/Case.hs:149:32 in clash-lib-1.7.0-inplace:Clash.Normalize.Transformations.Case
      caseCon, called at src/Clash/Normalize/Strategy.hs:91:36 in clash-lib-1.7.0-inplace:Clash.Normalize.Strategy

In other examples Clash introduces free variables in caseCon (but I've yet to make a small reproducer for that).

Printing the arguments given to zipEqual gives:

xs1: [ Id
    { varName =
        Name
          { nameSort = Internal
          , nameOcc = "c$sel"
          , nameUniq = 2
          , nameLoc = UnhelpfulSpan UnhelpfulNoLocationInfo
          }
    , varUniq = 2
    , varType =
        AppTy
          (AppTy
             (AppTy
                (AppTy
                   (ConstTy
                      (TyCon
                         Name
                           { nameSort = User
                           , nameOcc = "GHC.Prim.~#"
                           , nameUniq = 3674937295934324842
                           , nameLoc = UnhelpfulSpan UnhelpfulNoLocationInfo
                           }))
                   (ConstTy
                      (TyCon
                         Name
                           { nameSort = User
                           , nameOcc = "GHC.Types.Bool"
                           , nameUniq = 3674937295934324744
                           , nameLoc =
                               RealSrcSpan SrcSpanMultiLine "Test.hs" 16 1 23 18 Nothing
                           })))
                (ConstTy
                   (TyCon
                      Name
                        { nameSort = User
                        , nameOcc = "GHC.Types.Bool"
                        , nameUniq = 3674937295934324744
                        , nameLoc =
                            RealSrcSpan SrcSpanMultiLine "Test.hs" 16 1 23 18 Nothing
                        })))
             (AppTy
                (AppTy
                   (ConstTy
                      (TyCon
                         Name
                           { nameSort = User
                           , nameOcc = "GHC.TypeNats.<=?"
                           , nameUniq = 3674937295934325074
                           , nameLoc =
                               RealSrcSpan SrcSpanMultiLine "Test.hs" 16 1 23 18 Nothing
                           }))
                   (LitTy (NumTy 1)))
                (AppTy
                   (AppTy
                      (ConstTy
                         (TyCon
                            Name
                              { nameSort = User
                              , nameOcc = "GHC.TypeLits.Extra.CLog"
                              , nameUniq = 8214565720323785211
                              , nameLoc =
                                  RealSrcSpan SrcSpanMultiLine "Test.hs" 16 1 23 18 Nothing
                              }))
                      (LitTy (NumTy 2)))
                   (LitTy (NumTy 2)))))
          (ConstTy
             (TyCon
                Name
                  { nameSort = User
                  , nameOcc = "GHC.Types.True"
                  , nameUniq = 3891110078048108586
                  , nameLoc =
                      RealSrcSpan SrcSpanMultiLine "Test.hs" 16 1 23 18 Nothing
                  }))
    , idScope = LocalId
    }
]
left args: []

Workaround: add NOINLINE to oneLeCLog2n

christiaanb commented 1 year ago

Looking at the log, the culprit is quite clearly the unsafeCoerce (Dict :: Dict ()):

case GHC.Classes.(%%)[7854277750134145024] of
  GHC.Types.Eq#[3891110078048108571]
    (c$sel[2] :: GHC.Prim.~#[3674937295934324842]
                   GHC.Types.Bool[3674937295934324744]
                   GHC.Types.Bool[3674937295934324744]
                   (Data.Type.Ord.OrdCond[8214565720323786066]
                      GHC.Types.Bool[3674937295934324744]
                      (Data.Type.Ord.Compare[8214565720323786076]
                         GHC.Num.Natural.Natural[3674937295934324786]
                         1
                         (GHC.TypeLits.Extra.CLog[8214565720323786119]
                            2
                            2))
                      GHC.Types.True[3891110078048108586]
                      GHC.Types.True[3891110078048108586]
                      GHC.Types.False[3891110078048108556])
                   GHC.Types.True[3891110078048108586]) ->
    c$sel[2][LocalId]

which comes from:

  case oneLeCLog2n (Proxy @depth) of
    Dict -> mealy go (T 0)

i.e. the type of the case subject should be something of an equality constraint, but the unsafeCoerce gave us a GHC.Classes.(%%).

martijnbastiaan commented 1 year ago

To conclude this issue: this is really a PBKAC. The first type argument of Dict is pretty important and needs to conform to your proof. E.g., the proof introduced in the first message should read:

oneLeCLog2n :: forall n . (2 <= n) => Proxy n -> Dict (1 <= CLog 2 n)
oneLeCLog2n Proxy = unsafeCoerce (Dict :: Dict (1 <= 1))

leonschoorl commented 1 year ago

Or we could make caseCon a little more careful

leonschoorl commented 1 year ago

It seemed easy it to look at dcUniqs instead of dcTags to fix #2376.

But on second thought this "fix" (silently) breaks other uses of unsafeCoerce, which used to work before.

data AB a b = A a    | B b
data CD     = C Bool | D Int

topEntity :: Int
topEntity = case unsafeCoerce ab of
  C _ -> 123
  D i -> i
 where
  ab :: AB Bool Int
  ab = B 4

caseCon now turns this into undefined because none of the alternatives match.

martijnbastiaan commented 1 year ago

We shouldn't try and detect this; unsafeCoerce is really when people want to do incredibly unsafe stuff. If they do it incorrectly (like showcased in this issue) they can expect anything from weird errors to segmentation faults. Based on that I think we should close this issue.

We should handle coercions between AB and CD gracefully though. But from the sounds of it we're already doing that. (?) If we're not that's probably grounds for opening an issue. (But TBF, I don't think it's particularly high priority. It's unsafe for a reason.)

alex-mckenna commented 1 year ago

Or we could make caseCon a little more careful

I don't think it makes sense to change how caseCon works, since the problem here is that the term level scrutinee did not line up with what the type said it should be. This only happens when

using things which are unsafeCoerce and not being careful
a transformation before caseCon is incorrect and has not preserved type-safety

I think a better idea would be to improve the error here, since the main problem other than programmer error was Clash not actually telling us what was wrong directly. We can inferCoreTypeOf the scrutinee of the expression, and get the types of patterns by examining

the DataCon and [TyVar] for DataPat
the Literal for LitPat

We can then say that

all patterns should have the same codomain
this type should be the same as the inferred type of the scrutinee

Note that this is equality is possibly up to normalizeType, i.e. type families may appear and ruin your day.

alex-mckenna commented 1 year ago

Better error messages aside, we may not want to do this when we don't have -fclash-debug-invariants (or in old money, DebugSilent), since the check might be somewhat expensive to apply on every call to caseCon

martijnbastiaan commented 1 year ago

Linting this seems fine to me (we already do that with other invariants), but I don't see the point in tying it to caseCon. Any transformation can introduce type errors (and this error for that matter), so why wouldn't we introduce a general core lint rule that checks whether types conform to their inferred ones? Expensiveness shouldn't matter that much given that it is a debug option and it would only run after applying a transformation.

alex-mckenna commented 1 year ago

it would only run after applying a transformation

We can't simply run this after a transformation: if caseCon is the first transformation applied it would miss it completely and we're back to an unhelpful error message. This check would have to be before a transformation (even more pedantically, before the first transformation and before any transformation where the last transformation changed the term). All our other invariants are simple postconditions on changed terms.

but I don't see the point in tying it to caseCon

The coupling to caseCon makes sense (at least to me), because this is the only place where we attempt to match a subject to it's patterns. I would think about this analogously to the check in applyTypeToArgs where we identify something is wrong on demand and just blow up.

Expensiveness shouldn't matter that much given that it is a debug option

Having thought about this more, I don't think it should be a debug only check. If this check would fail, the core is just straight up wrong. I don't think there's much value in trying to generate hardware from a program which is known to be incorrect. If a transformation is wrong, better to know sooner. If the programmer is using unsafeCoerce, sometimes blowing up in their face at compile-time when they use it wrong is a similar experience to GHC.

Although to go back to the comment on cost not mattering since it's debug: even for debugging I would want such expensive checks to be optionally enabled because it would really slow down the debugging cycle for the programmer. This applies more for things like this where they likely happen infrequently since unsafeCoerce is probably the most common way for these problems to appear.

martijnbastiaan commented 1 year ago

We can't simply run this after a transformation:

Sure, let's run it at the very start too. That shouldn't matter for performance given that it's a one time thing. Come to think of it, we should definitely do this: ~~it would catch unsafeCoerce abuse without introducing any penalty~~. Scrap that; it probably wouldn't.

because this is the only place where we attempt to match a subject to it's patterns

I'm not really sure how this is relevant. Other transformations can introduce wrongly typed cases - and you'd like the error to be raised as close to the source introducing it as possible.

I would think about this analogously to the check in applyTypeToArgs where we identify something is wrong on demand and just blow up.

I think this really comes down to: do you want to pay the cost of checking it? applyTypeToArgs blows up because there it needs to do the check anyway. In the past we've made the decision to not check invariants in applyDebug because Clash becomes unbearably slow for production (turning 20 minute runs into multiple hour ones), even though any of the broken invariants point to very serious Clash defects. I can't imagine this check -even if applied to just caseCon- is going to be particularly performance friendly either. But I'm happy to be proven wrong.

kleinreact commented 3 days ago

I just ran into this issue again, but this time via using some of the dictionaries of Data.Constraint.Nat. It took me some time to figure out that the problem is not with me using unsafeCoerce incorrectly, but is introduced by the library, as they use unsafeAxiom under the hood, which suffers from the same already mentioned inconsistency.

I agree that this is a PBKAC, but users easily get trapped into this issue by using libraries like constraints. And if the issue gets introduced by an external library, it turns out to be quite hard to determine its origin. The introduction of free variables or zipEqual: left list is longer error messages (if clash-lib has been built with the debug flag) are not really helpful at this point either.

clash-lang / clash-compiler

Clash error call: `zipEqual: left list is longer` when using custom type proof #2376