Frege / frege

Frege is a Haskell for the JVM. It brings purely functional programing to the Java platform.
https://github.com/Frege/frege/wiki/_pages
Other
3.64k stars 145 forks source link

Member functions should be resolved at call sites #387

Open matil019 opened 4 years ago

matil019 commented 4 years ago

The compiler translates methods of an instance so that they can be called as if they were members of the data type. For example, this is legal:

data Foo = Foo Int
  where
    dataMember :: Foo -> Int
    dataMember (Foo x) = x

class Bar a where
    instMember :: a -> String

instance Bar Foo where
    instMember (Foo x) = show x

useFoo foo = do
    -- OK
    println foo.dataMember
    -- also OK
    println foo.instMember

I believe this is done in enter1InsDcl.

This translation (or "re-sugaring"?) comes in handy and well mixes with the syntaxes of records or Java's. However, it becomes a problem if there are name conflicts. If instMember, in the above example, had been named dataMember, then it is a compilation error at the definition of the instance.

I don't know any workaround for this issue other than renaming either of the data member or the class member. qualified imports doesn't work because the error occurs at instance definitions, not at use sites.

The compiler should defer this transformation until such calls are encountered so that the following program would be legal:

import Foo (Foo)
import Bar ()

useFoo' foo = do
    println $ Foo.dataMember foo
    println $ Bar.dataMember foo
    -- ambiguous error (Foo or Bar?)
    -- println foo.dataMember
    -- ambiguous error if both 'dataMember' are imported unqualified
    -- println $ dataMember foo
Ingo60 commented 4 years ago

This works as designed. It is so that you can write

data X = Foo ...  where
   show (Foo ...) = "foo"

instance Show Foo

Sure, it is a restriction, but it precisely avoids the ambiguities that would arise if it was allowed.

matil019 commented 4 years ago

Nice to hear that it is working as intended, but still I think it should be changed.

Consider the following scenario:

  1. Someone writes a package hello that has a class Hello a with its member sayHello :: a -> IO ().
  2. Meanwhile, another author writes a package foo and adds a data type Foo which has a member sayHello :: Foo -> String.
  3. The author of hello notices the foo package and wants to add instance Hello Foo but gets stuck, because changing the type of Hello.sayHello breaks other instances.

In fact, the same story applies even when both sayHello have the same type but different semantics.

This "restriction" is a defect that destroys namespacing provided by the module system. Two functions with the same name in different modules are different functions. Users can differentiate them by qualified imports. It is natural to expect the same for methods, however, this freedom is disallowed by the (mis-)feature.

As I tried to describe in my first post, they are ambiguous only when called in the value.method form, and should be treated as such.

As for your instance Show Foo example, the data type method should be treated like default implementation in Haskell's and should be "overridable" (or "shadowable" is a better term?), because Foo.show and (Show Foo).show (should be able to) refer different functions.

data Foo = Foo ... where
    show (Foo ...) = "foo"

class Show a where
    show :: a -> String

    -- how the default implementation added by the Frege compiler
    -- would look like in Haskell's syntax
    default show :: HasShow a => a -> String
    show x = x.show

instance Show Foo where
    show (Foo ...) = "Foo ..." -- Foo.show is overridden (shadowed)

Well, writing a function named show other than a method of class Show is a bad idea. However for more exotic and/or ad-hoc classes, this is important. For example, methods all named render that renders to HTML, XML, markdown, etc. may be provided by data members or various class methods.

Ingo60 commented 4 years ago

Ok, you can give it a try. Here's an outline.

data T = C1 … | C2 ... where
   show (C1 ...) = "data C1"
   show (C2 ...) = "data C2"
instance Show T where show x = "instance"

Principles:

How to achieve:

In Classes.fr, there is a function funForCIT that moves or links instance methods to the symbol table of the type. In the case of a name clash between instance and type, it should be possiblle to copy the instance function under a fresh name to the type's symbol table and then link the instance method to that new name (or the other way around when it is not "our" type). This should be it mostly. (Maybe a warning would be in order, though?)

Question: how should it be handled when there is a default definition in the class but none in the instance? I propose to leave it as is, namely to take the existing type method as implementation (which will break, when the type doesn't fit).

Perhaps a correction is needed for statically known instances, where the type checker replaces C.hello with T.hello or I.hello when the context is C T => and T is an instance I of the class C defining hello. (This is done in substInst in Typecheck.fr)

Here care must be taken to use the correct QName (to point whereever the actual method is located), if not done correctly already.

Another point is class hierarchies. For example

instance Eq Foo where f1 == f2 = ...

will silently make a Foo.!= and when we later see

instance Ord Foo where
     f1 <=> f2 = .....
     f1 != f2 = ....

we would have to overwrite (not overload) the !=
(But I'm not even sure this is actually allowed.)

If this works, we should also be able to make any type an instance of two or more different classes that happen to define the same method name!

I would do it myself, but I'd first want to make your Symbol, Tau and Rho fixes work.

Ingo60 commented 4 years ago

There will be the problem, however, that it will not be the case that d.show will always mean T.show d for types with overloaded instance members. For example (using definitions from above):

 frobnicate d = case d.show of
      "data C1"   -> 42
      other -> case d of
              C2{} -> 40
              _      ->  41

(Contrieved, as always) Clearly, frobnicate (C1 ...) should give 42, but it would suddenly be 41.

The reason is how type checking for x.m expressions works:

This deferring and the second pass is necessary to avoid left-to-right bias in typechecking. In the example, the type of d is only revealed in the inner case expression. So, if we indeed check left to right, we see d.show when we don't know the type of d yet. Note that we could as well do the case alternatives first and the scrutinized expression last, but then we have just replaced the left-to-right bias with a perhaps even more confusing right-to-left bias. Hence the need for this delaying trick.

However, it turns out that when the proposal is implemented. the C.m x rule will fire, thus giving a different result than intented. What is worse, once we add a type signature to frobnicate the behaviour will change back, because then we do know the type of d right away!

In other cases, it would force us to add a type signature to avoid type erors, namely when the type of T.m x is different from C.m x. But this would be the lucky case as there would be no confusion at least.

Giving this a fourth thought: we could make it a sure thing by just typecheckig a function a second time once the type signature has been established in the first run. (Thus perhaps doubling the time for typechecking). But if we could find a way to tell whether such ambiguous constructs appeared we could re-check only where it is really needed (or maybe just stop and tell the user to supply he type signature or disambiguate)

matil019 commented 4 years ago

Thank you for your explanation. And wow... it's more complicated than I expected! I'll try it.

matil019 commented 4 years ago

Another point is class hierarchies. For example

instance Eq Foo where f1 == f2 = ...

will silently make a Foo.!= and when we later see

instance Ord Foo where
     f1 <=> f2 = .....
     f1 != f2 = ....

we would have to overwrite (not overload) the != (But I'm not even sure this is actually allowed.)

@Ingo60 It turned out the current state is worse than expected.

If an instance of a subclass is defined in the same module as the data, the compiler crashes. If defined in the same module as the class, the compiler silently ignores the overriding method. I don't know what happens if an instance is an orphan.

I'll open a new issue for concrete examples and dedicated discussion.

Ingo60 commented 4 years ago

Well, it looks like this issue gives us the opportunity to clean the things up a bit. :)

Matters are not that bad, as I explained in #389

Point is: we must not overload or override implementations of class members that originate from a super class of the class we're making an instance of.

However, we shall find a way to overload data functions or functions that are implementations of class members of an unrelated class.

I'm only afraid it will finally turn out that the cleanest solution would be to disallow the x.m notation for class members. This would break a lot of code, though.