dart-lang / language

Design of the Dart language

Problem with hierarchical generic types #3888

Open MichaelFenwick opened 5 months ago

MichaelFenwick commented 5 months ago

Intro

I'm honestly not entirely sure if this is a bug, a language limitation, or just me not understanding or doing something right. I'm guessing that I'm probably doing something wrong, in which case I'd appreciate advice on what the "correct" way to go about it in Dart is. But much of what I'm encountering feels like it should work, so maybe there is space for the language to add something to help with this type of thing.

I ask that you please bear with me. I'll try to be as succinct as possible, but I think some context around what the code is trying to do will help in describing the issue. Note that I've removed as much application logic as I think is reasonable without hurting understanding, leaving only the structural parts, but I can add more if needed. Alternatively, the full codebase can be found at https://github.com/MichaelFenwick/WomensScriptTransliterator

I have a Dart application which parses some text, recursively breaking it into smaller linguistic StringUnits (like Paragraph, Sentence, Word), and then passes these StringUnits into methods on unit-specific transliterator classes (ParagraphTransliterator, SentenceTransliterator, etc.) for processing. The StringUnit subclasses are hierarchical, with each subclass having a Superunit and a Subunit. The idea of the hierarchy is to be able to define superclass methods where the parameters and/or return values are typed as the Superunit or Subunit of the inheriting class. As a final bit of context, the text I'm parsing can have each StringUnit broken into multiple pieces (for example, if sourced from HTML where a sentence may be split across multiple elements). To handle this, there is also the concept of an Atom, which represents one of these pieces and also holds a reference to that piece's original source (a collection of Atom objects would represent a full sentence, for example).

Code

The way I've implemented this hierarchy is as follows:

StringUnit

abstract class StringUnit {
  final String content;

  StringUnit(this.content);

  factory StringUnit.build(Type U, String content) {
    switch (U) {
      case TextBlock:
        return TextBlock(content);
      case Paragraph:
        return Paragraph(content);
      case Sentence:
        return Sentence(content);
      case Word:
        return Word(content);
      default:
        throw TypeError();
    }
  }

  U cast<U extends StringUnit>() => StringUnit.build(U, content) as U;
}

Superunit

mixin Superunit<S extends StringUnit> implements StringUnit {}

Subunit

mixin Subunit<S extends StringUnit> implements StringUnit {}

TextBlock

class TextBlock extends StringUnit with Superunit<Paragraph> {
  TextBlock(String content) : super(content);
}

Paragraph

class Paragraph extends StringUnit with Superunit<Sentence>, Subunit<TextBlock> {
  Paragraph(String content) : super(content);
}

Sentence

class Sentence extends StringUnit with Superunit<Word>, Subunit<Paragraph> {  
  Sentence(String content) : super(content);
}

Word

class Word extends StringUnit with Subunit<Sentence> {
  Word(String content) : super(content);
}

These hierarchical classes are used for the content of Atoms, and then used with matching Transliterator classes (note that the Transliterator classes have members which have been removed here as they're not relevant). The Transliterator class methods package their return values in a Result class.

Atom

class Atom<E, X> {
  final E content;
  final X context;

  const Atom(this.content, this.context);
}

Result

class Result<E, S extends Script, T extends Script> {
  final E source;
  final E target;

  const Result(this.source, this.target);

  Result<F, S, T> cast<F>(F Function(E e) caster) {
    return Result<F, S, T>(caster(source), caster(target));
  }
}

Transliterator

abstract class Transliterator<E, S extends Script, T extends Script> {
  FutureOr<Result<E, S, T>> transliterate(E input);

  Iterable<FutureOr<Result<E, S, T>>> transliterateAll(Iterable<E> inputs) =>
      inputs.map((E input) => transliterate(input));
}

There's a StringTransliterator class that extends Transliterator, and the following classes extend it in turn. I'll give the code for StringTransliterator after its subclasses (as it's that class where my issues primarily lie, and I don't want to bury it in the middle of all this).

TextBlockTransliterator

class TextBlockTransliterator<S extends Script, T extends Script> extends StringTransliterator<TextBlock, S, T>
    with SuperUnitStringTransliterator<TextBlock, S, T> {
  TextBlockTransliterator() : super();

  static TextBlockTransliterator<S, T> fromTransliterator<S extends Script, T extends Script>(Transliterator<dynamic, S, T> transliterator) => TextBlockTransliterator<S, T>();

  ParagraphTransliterator<S, T> getSubtransliterator() => ParagraphTransliterator.fromTransliterator<S, T>(this);

  TextBlock buildUnit(String string) => TextBlock(string);
}

ParagraphTransliterator

class ParagraphTransliterator<S extends Script, T extends Script> extends StringTransliterator<Paragraph, S, T>
    with SuperUnitStringTransliterator<Paragraph, S, T> {
  ParagraphTransliterator() : super();

  static ParagraphTransliterator<S, T> fromTransliterator<S extends Script, T extends Script>(Transliterator<dynamic, S, T> transliterator) => ParagraphTransliterator<S, T>();

  Iterable<Result<Atom<Paragraph, X>, S, T>?> transliterateAtoms<X>(Iterable<Atom<StringUnit, X>?> unitAtoms) {
    // Removed Application Logic
    return super.transliterateAtoms(unitAtoms);
  }

  SentenceTransliterator<S, T> getSubtransliterator() => SentenceTransliterator.fromTransliterator<S, T>(this);

  Paragraph buildUnit(String string) => Paragraph(string);
}

SentenceTransliterator

class SentenceTransliterator<S extends Script, T extends Script> extends StringTransliterator<Sentence, S, T>
    with SuperUnitStringTransliterator<Sentence, S, T> {
  SentenceTransliterator() : super();

  static SentenceTransliterator<S, T> fromTransliterator<S extends Script, T extends Script>(Transliterator<dynamic, S, T> transliterator) =>
      SentenceTransliterator<S, T>();

  Iterable<Result<Atom<Sentence, X>, S, T>?> transliterateAtoms<X>(Iterable<Atom<StringUnit, X>?> unitAtoms) {
    // Removed application logic

    return super.transliterateAtoms(unitAtomsList.map((Atom<StringUnit, X>? atom) => Atom(Sentence(atom!.content.content), atom.context)));
  }

  WordTransliterator<S, T> getSubtransliterator() => WordTransliterator.fromTransliterator(this);

  Sentence buildUnit(String string) => Sentence(string);
}

WordTransliterator

class WordTransliterator<S extends Script, T extends Script> extends StringTransliterator<Word, S, T> {
  WordTransliterator() : super();

  static WordTransliterator<S, T> fromTransliterator<S extends Script, T extends Script>(Transliterator<dynamic, S, T> transliterator) => WordTransliterator<S, T>();

  Result<Word, S, T> transliterate(Word input) {
    // Removed application logic
    return Result<Word, S, T>(input, output);
  }

  Word buildUnit(String string) => Word(string);
}

Finally, we get to StringTransliterator. This is where the bulk of the application logic lives, and where the type system stops working the way I'd want/expect.

StringTransliterator and SuperUnitStringTransliterator

typedef AtomResult<U extends StringUnit, X, S extends Script, T extends Script> = Result<Atom<U, X>, S, T>;
typedef SubTrans<U extends StringUnit, S extends Script, T extends Script> = StringTransliterator<Subunit<U>, S, T>;
typedef SubResult<U extends StringUnit, S extends Script, T extends Script> = Result<Subunit<U>, S, T>;
typedef SubAtom<U extends StringUnit, X> = Atom<Subunit<U>, X>;
typedef SubAtomResult<U extends StringUnit, X, S extends Script, T extends Script> = Result<Atom<Subunit<U>, X>, S, T>;
typedef Matrix<E> = List<List<E>>;

abstract class StringTransliterator<U extends StringUnit, S extends Script, T extends Script> extends Transliterator<U, S, T> {
  StringTransliterator() : super();

  U buildUnit(String string);

  U sourceReducer(U a, U b) => buildUnit('$a$b');

  U targetReducer(U a, U b) => buildUnit('$a$b');

  Result<U, S, T> transliterate(U input, {bool useOutputWriter = false});

  Iterable<Result<U, S, T>> transliterateAll(Iterable<U> inputs, {bool useOutputWriter = false}) =>
      inputs.map((U input) => transliterate(input, useOutputWriter: useOutputWriter));
}

mixin SuperUnitStringTransliterator<U extends StringUnit, S extends Script, T extends Script> on StringTransliterator<U, S, T> {
  SubTrans<U, S, T> getSubtransliterator();

  Subunit<U> buildSubunit(String string) => getSubtransliterator().buildUnit(string);

  Result<U, S, T> transliterate(U input, {bool useOutputWriter = false}) => splitMapJoin(input);

  Result<U, S, T> splitMapJoin(
    U input, {
    SubResult<U, S, T> Function(String nonMatch)? onNonMatch,
    SubResult<U, S, T> Function(Match match)? onMatch,
  }) {
    final SubTrans<U, S, T> subtransliterator = getSubtransliterator();

    Iterable<SubResult<U, S, T>> results() sync* { /* Removed application logic */ }

    return Result.join<Subunit<U>, S, T>(
      results(),
      sourceReducer: subtransliterator.sourceReducer,
      targetReducer: subtransliterator.targetReducer,
    ).cast<U>((Subunit<U> subunit) => subunit.cast<U>());
  }

  Iterable<AtomResult<U, X, S, T>?> transliterateAtoms<X>(Iterable<Atom<StringUnit, X>?> unitAtoms) {
    final SubTrans<U, S, T> subtransliterator = getSubtransliterator();

    // If the subtransliterator of this transliterator is a superunit, then we will recurse down into its transliterateAtoms method.
    if (subtransliterator is SuperUnitStringTransliterator<Subunit<U>, S, T>) {
      final Matrix<SubAtom<U, X>?> subunitUnitMatrix = _breakUnitAtomsIntoSubunitUnitMatrix<X>(unitAtoms);
      final Matrix<SubAtomResult<U, X, S, T>?> transliteratedUnitSubunitAtomMatrix = _transliterateSubunitUnitMatrix<X>(subunitUnitMatrix, subtransliterator);
      final Iterable<AtomResult<U, X, S, T>?> unitAtomResults =
          _joinTransliteratedSubunitResultAtomsIntoUnitResultAtoms<X>(transliteratedUnitSubunitAtomMatrix, subtransliterator);

      return unitAtomResults;
    }

    // But otherwise we're at the lowest level of unit, and if so, just call the basic transliterate() method.
    else {
      return unitAtoms.map<AtomResult<U, X, S, T>?>(
        (Atom<StringUnit, X>? unitAtom) =>
            unitAtom != null ? splitMapJoin(unitAtom.content as U).cast<Atom<U, X>>((StringUnit unit) => Atom<U, X>(unit as U, unitAtom.context)) : null,
      );
    }
  }

  Matrix<SubAtom<U, X>?> _breakUnitAtomsIntoSubunitUnitMatrix<X>(Iterable<Atom<StringUnit, X>?> unitAtoms) { /* Removed application logic */ }

  Matrix<SubAtomResult<U, X, S, T>?> _transliterateSubunitUnitMatrix<X>(
    Matrix<SubAtom<U, X>?> subunitUnitMatrix,
    SuperUnitStringTransliterator<Subunit<U>, S, T> subtransliterator,
  ) { /* Removed application logic */ }

  List<AtomResult<U, X, S, T>?> _joinTransliteratedSubunitResultAtomsIntoUnitResultAtoms<X>(
    Matrix<SubAtomResult<U, X, S, T>?> transliteratedUnitSubunitAtomMatrix,
    SuperUnitStringTransliterator<Subunit<U>, S, T> subtransliterator,
  ) { /* Removed application logic */ }
}

Problems

With all that out of the way, here are the problems I'm running into:

Problem 1

I'm unable to make StringTransliterator.transliterateAtoms() have as strict a type constraint as I'd like. Ideally this method would take a parameter of type Iterable<Atom<U, X>?> rather than Iterable<Atom<StringUnit, X>?>. If I change the code to have this preferred signature, everything looks fine in the static analysis, but actually running the code gives the following error message:

Unhandled exception:
type 'List<Atom<Subunit<TextBlock>, XmlText>?>' is not a subtype of type 'Iterable<Atom<Paragraph, XmlText>?>' of 'unitAtoms'
#0      ParagraphTransliterator.transliterateAtoms 
#1      SuperUnitStringTransliterator._transliterateSubunitUnitMatrix 
#2      SuperUnitStringTransliterator.transliterateAtoms 
#3      <Pointer to code like
            final Iterable<Atom<TextBlock, XmlText>> atoms = [...];
            textBlockTransliterator.transliterateAtoms<XmlText>(atoms);

This seems wrong to me as Paragraph is defined as class Paragraph extends StringUnit with Superunit<Sentence>, Subunit<TextBlock> { [...] }, and to my eyes Subunit<TextBlock> should be a valid type to use for something expecting Paragraph.

Problem 2

If placed within StringTransliterator.transliterateAtoms, the line final Subunit<Superunit<U>> unit = buildUnit(''); gives the error error: A value of type 'U' can't be assigned to a variable of type 'Subunit<Superunit<U>>'. (invalid_assignment) which again seems wrong to me, as the subunit of a superunit should just be the unit again.

Problem 3

ParagraphTransliterator.transliterateAtoms() has to have the signature Iterable<Result<Atom<Paragraph, X>, S, T>?> transliterateAtoms<X>(Iterable<Atom<StringUnit, X>?> unitAtoms) to match the StringTransliterator.transliterateAtoms() signature. Ideally, this method would instead have the signature Iterable<Result<Atom<Paragraph, X>, S, T>?> transliterateAtoms<X>(Iterable<Atom<Paragraph, X>?> unitAtoms) (as the ParagraphTransliterator should never be trying to process Atoms of anything other than Paragraphs). If I could change the StringUnit to U in the StringTransliterator.transliterateAtoms() definition, then I would be able to (and indeed be forced to) have this stronger constraint in the ParagraphTransliterator override of it. As it is now, though, I seem to be stuck with an overly permissive type constraint. Note that this issue exists for the other StringTransliterator subclasses as well.

Problem 4

The following code,

    final Subunit<U> foo = StringUnit.build(Subunit<U>, '') as Subunit<U>;
    final StringUnit bar = foo.cast<Subunit<U>>();

when placed in a class which has generics including U extends StringUnit, results in this error when run:

Unhandled exception:
Instance of 'TypeError'
#0      new StringUnit.build
#1      <Pointer to the above code fragment that called StringUnit.cast()>

Problem 5

Finally, I had to add a lot of ugly boilerplate to facilitate instantiation of the StringUnit subclasses. In an ideal world, I'd be able to do something like U unit = U('content'). As that's not something the language supports, I instead have to rely on each StringTransliterator subclass providing a buildUnit() method to construct the appropriate StringUnit subclass instance, or the StringUnit.build() factory constructor in cases where I don't have the context of a StringTransliterator subclass. It'd be nice if Dart provided an easier way to create these types of objects.

Conclusion

First, thank you for getting this far; I truly appreciate your time. Like I said at the start, I don't know how much of this is me doing something wrong, me trying to do something the language simply doesn't support, or places where the language could be fixed/improved to facilitate things. Hopefully you'll be able to help me figure out which is which, and advise me on a proper alternative where applicable.

Beyond the primary issues, I also have a concern over the fact that when I attempt to tighten the type constraints, I encounter errors when trying to run the code, but get no errors from the static analysis. One of the things I enjoy most about Dart is the robustness of its static analysis, so encountering these errors is particularly frustrating.

mmcdon20 commented 5 months ago

I'm not going to address everything in this post, but:

> to my eyes Subunit<TextBlock> should be a valid type to use for something expecting Paragraph.

This is like saying "num should be a valid type to use for something expecting int". Subunit<TextBlock> is the supertype, and Paragraph is the subtype. You can use a subtype for something expecting the supertype but not the other way around.

Ing-Brayan-Martinez commented 5 months ago

I think you should review the hierarchy of your classes. Have you tried using the new class access modifiers? I recommend using the sealed modifier. Check the documentation to understand how to implement it.

Levi-Lesches commented 5 months ago

At first glance, it seems your issues would be solved with some use-site (#753) or declaration-site (#524) variance. I'd recommend giving both of those issues a thumbs-up if so.

Secondly, in big and complex issues like this, it's usually best to try to reduce your problem to a minimal reproducible example that doesn't require the reader to understand your exact use case and context. As @mmcdon20 mentioned, your first problem could be something as simple as:

void testInt(int x) { }
void main() {
  num x = 0;
  testInt(x);  // error
}

Because in your example, Paragraph is a subtype of Subunit, just like int is a subtype of num. Finding more universal examples like this would help people understand your issues because they're simpler and more familiar to us.


For problem 2, let's look at your definitions again:

mixin Superunit<S extends StringUnit> implements StringUnit {}
mixin Subunit<S extends StringUnit> implements StringUnit {}

Sure, in English, these mixins represent the idea of sub- and super-units, which is why semantically, you might say Subunit<Superunit<T>> == T. But I don't see that being expressed in your code. That's kind of like saying:

final one = "1";
final two = "2";
final three = "3";

and then being surprised that one + two != three. It's not clear to me what exact relationships you expect to exist between your sub- and super-units, or whether that is something you can express at compile-time or need to check at runtime.
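
To make this concrete, here is a minimal sketch using simplified copies of the classes from the original post (Paragraph drops its Superunit<Sentence> mixin, and the helper isSubunitOfOwnSuperunit is a hypothetical name introduced only for illustration). It suggests why the analyzer cannot accept U where Subunit<Superunit<U>> is expected: the relationship happens to hold for some concrete units because of how they are declared, but nothing in the declarations makes it hold for every U extends StringUnit.

```dart
abstract class StringUnit {
  final String content;
  StringUnit(this.content);
}

mixin Superunit<S extends StringUnit> implements StringUnit {}
mixin Subunit<S extends StringUnit> implements StringUnit {}

// Simplified: only the top two levels of the hierarchy.
class TextBlock extends StringUnit with Superunit<Paragraph> {
  TextBlock(String content) : super(content);
}

class Paragraph extends StringUnit with Subunit<TextBlock> {
  Paragraph(String content) : super(content);
}

// Hypothetical helper: checks at runtime whether a unit of type U is also
// a Subunit<Superunit<U>>.
bool isSubunitOfOwnSuperunit<U extends StringUnit>(U unit) =>
    unit is Subunit<Superunit<U>>;

void main() {
  // Paragraph is a Subunit<TextBlock>, and TextBlock is a
  // Superunit<Paragraph>, so the check succeeds for this declaration.
  print(isSubunitOfOwnSuperunit<Paragraph>(Paragraph('p'))); // true

  // TextBlock is not a Subunit of anything, so the check fails.
  print(isSubunitOfOwnSuperunit<TextBlock>(TextBlock('t'))); // false
}
```

Because the answer depends on the concrete class, the static assignment in Problem 2 is rejected for a generic U.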


For problems 3 and 4 it would be nice if you could come up with simpler examples that illustrate the Dart-specific issues you're hitting, without us needing to read and understand the rest of the context.
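
As a possible starting point for problem 4, the following sketch (with hypothetical names Unit, Sub, and describe) suggests where the TypeError may come from. StringUnit.build compares Type objects for equality in its switch, and a type literal such as Subunit<U> is never equal to TextBlock, Paragraph, Sentence, or Word, even when one of those classes is a subtype of it, so the default branch throws. Generic type literals used as expressions assume Dart 2.15 or later.

```dart
abstract class Unit {}

mixin Sub<S extends Unit> implements Unit {}

class Sentence extends Unit {}

class Word extends Unit with Sub<Sentence> {}

// Mirrors the shape of the switch in StringUnit.build: it matches on Type
// equality, not on subtyping.
String describe(Type type) {
  if (type == Word) return 'Word';
  if (type == Sentence) return 'Sentence';
  return 'unknown';
}

void main() {
  print(describe(Word)); // Word
  // Sub<Sentence> is a different Type object from Word...
  print(describe(Sub<Sentence>)); // unknown
  // ...even though every Word is a Sub<Sentence>.
  print(Word() is Sub<Sentence>); // true
}
```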

For problem 5, I'm not sure what your exact problem is, but maybe check out #356?
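
In the meantime, one pattern that may reduce the boilerplate is to pass the constructor itself as a function value (a constructor tear-off such as Word.new, available since Dart 2.15). The sketch below uses simplified stand-ins for the classes in the original post; UnitBuilder and joinUnits are hypothetical names, not part of the original code.

```dart
abstract class StringUnit {
  final String content;
  StringUnit(this.content);
}

class Word extends StringUnit {
  Word(String content) : super(content);
}

// Hypothetical alias for "something that can build a U from a String".
typedef UnitBuilder<U extends StringUnit> = U Function(String content);

// Generic code cannot call U('...'), but it can call a builder it is given.
U joinUnits<U extends StringUnit>(U a, U b, UnitBuilder<U> build) =>
    build('${a.content}${b.content}');

void main() {
  // Word.new is a tear-off of Word's unnamed constructor.
  final Word joined = joinUnits(Word('foo'), Word('bar'), Word.new);
  print(joined.content); // foobar
}
```

This does not remove the need to supply a builder somewhere, but it avoids a central switch over Type like the one in StringUnit.build.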

> Beyond the primary issues, I also have concern over the fact that when I attempt to tighten the type constraints, I encounter errors when trying to run the code, but get no errors from the static analysis. One of the things I enjoy most about Dart is the robustness of its static analysis, so encountering these errors is particularly frustrating.

This is a good example of a concrete, actionable request, and again I'd suggest you check out #753 and #524.
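
For context on why the analyzer stays silent here: Dart's class generics are covariant, so a Box<int> may be used where a Box<num> is expected, and the resulting soundness gap is closed by a runtime check on parameters whose type mentions the class's type parameter. The sketch below uses a hypothetical Box class and putThrows helper to show, in miniature, the same pattern as the transliterateAtoms failure in problem 1.

```dart
class Box<T> {
  T? value;

  // The parameter type mentions T, so the argument is checked at runtime
  // when the receiver's static type is less precise than its runtime type.
  void put(T newValue) {
    value = newValue;
  }
}

// Hypothetical helper: reports whether put() fails its runtime check.
bool putThrows(Box<num> box, num newValue) {
  try {
    box.put(newValue);
    return false;
  } on TypeError {
    return true;
  }
}

void main() {
  // Accepted statically because Box<int> is a subtype of Box<num>.
  final Box<num> box = Box<int>();

  print(putThrows(box, 1)); // false
  // Accepted statically (1.5 is a num), rejected at runtime (not an int).
  print(putThrows(box, 1.5)); // true
}
```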