agentm / project-m36

Project: M36 Relational Algebra Engine
The Unlicense
876 stars 47 forks source link

Feature/arbitrary tupleset #200

Closed agentm closed 6 years ago

agentm commented 6 years ago

I created this pull request from your branch so that I could review the code in context. Feel free to create pull requests, too.

agentm commented 6 years ago

I fixed #203, so the nested, arbitrary relation construction should be able to be coded at line 610 in Base.hs. @YuMingLiao, would you like to finish this work?

I merged the master into my version of your branch in this repository.

YuMingLiao commented 6 years ago

Sure! I will finish it. I am working on arbitrary DateTime now. I may need some time to understand git mechanism, ConstructedAtomType and TypeVariable.

Thanks for your support!

agentm commented 6 years ago

It occurred to me that the ConstructedAtomType Arbitrary instance will require access to the DatabaseContext since the AtomType alone does not contain the list of available value constructors. If you want to pass that instance on to me to finish, that's fine.

YuMingLiao commented 6 years ago

Good news! I have managed to create arbitrary ConstructedAtomType!

TutorialD (master/main): createarbitraryrelation b {a Hair} 5-10
TutorialD (master/main): :showexpr b
┌─────────────┐
│a::Hair      │
├─────────────┤
│White        │
│White        │
│Other "Sunny"│
│Other "Mary" │
│Black        │
│Other "Mary" │
│White        │
│Black        │
│Other "Ted"  │
└─────────────┘

But I still don't understand the usage of TypeVariable. Could you give me an example of that in TutorialD expression?

agentm commented 6 years ago

By the time arbitrary' sees the AtomType, all types should be fully resolved, so you shouldn't see any TypeVariables. I've been thinking about splitting AtomTypes into resolved and unresolved types, but I would need to do some refactoring first.

Thanks for your hard work! I'll test out your branch today.

YuMingLiao commented 6 years ago

You are welcome! Take your time, and thanks for the great project!

agentm commented 6 years ago

I fixed the attribute resolution via #205, so there are new possibilities parsing. In testing, I found a case that results in an error from within QuickCheck:

TutorialD (master/main): createarbitraryrelation y {spam Int, spam2 Interval Int} 3-100
ERR: UnhandledExceptionError "QuickCheck.elements used with empty list\nCallStack (from HasCallStack):\n  error, called at ./Test/QuickCheck/Gen.hs:186:15 in QuickCheck-2.10.0.1-ALuGysu7txTB2VnkPyDuTq:Test.QuickCheck.Gen"

Could you take a look?

I also added some basic tests for this feature: cabal test test-tutoriald.

YuMingLiao commented 6 years ago

It happens in arbitray' ConstructedAtomType, I assume Interval should be parsed as IntervalAtomType atomTy, but it is parsed as a ConsturctedAtomType.

TutorialD (master/main): createarbitraryrelation a {spam Interval Int} 3-100

result parsed by makeAttributeExprsP

[AttributeAndTypeNameExpr "spam" (ADTypeConstructor "Interval" [ADTypeConstructor "Int" []]) ()]

result while retrieving a TypeConstructor in arbitray' ConstructedAtomType

Just (ADTypeConstructorDef "Interval" ["a"],[])

I expected the TypeConstructor found at least have one DataConstructorDef (Is my assumption right? data TypeConstructorName = DataConstructorDef1 ...), but Interval has none. Hence while the program randomly choose a DataConstructorDef by elements, it cause this empty list error.

Why is it not parsed as IntervalAtomType? Is it deprecated? What is the difference between ADTypeConstructor and PrimitiveTypeConstructor?

I'll look into it more tomorrow.

YuMingLiao commented 6 years ago

I see now, in test, you expect Interval to be (ConstructedAtomType "Interval" (M.singleton "a" IntegerAtomType)

agentm commented 6 years ago

I see- this may be another primitive AtomType bug. I'll take a closer look.

YuMingLiao commented 6 years ago

Okay, I see that now.

basicTypeConstructorMapping :: TypeConstructorMapping
basicTypeConstructorMapping = primitiveTypeConstructorMapping ++
                              maybeTypeConstructorMapping ++
                              eitherTypeConstructorMapping ++
                              listTypeConstructorMapping ++
                              intervalTypeConstructorMapping

intervalTypeConstructorMapping is different from primitiveTypeConstructorMapping.

atomTypeForTypeConstructor tCons tConss tvMap = case findTypeConstructor (TC.name tCons) tConss of
  Nothing -> Left (NoSuchTypeConstructorError (TC.name tCons))
  Just (PrimitiveTypeConstructorDef _ aType, _) -> Right aType
  Just (tConsDef, _) -> do
          tConsArgTypes <- mapM (\tConsArg -> atomTypeForTypeConstructor tConsArg tConss tvMap) (TC.arguments tCons)
          let pVarNames = TCD.typeVars tConsDef
              tConsArgs = M.fromList (zip pVarNames tConsArgTypes)
          Right (ConstructedAtomType (TC.name tCons) tConsArgs)

So atomTypeForTypeConstructor thinks it is a ConstructedAtomType.

I noticed that primitiveTypeConstructorP is taken off. Now all the primitive types become a ADTypeConstructor first, then turned back when atomTypeForTypeConstructor. I guess that's what you said eariler about more possibility parsing.

So, should Interval be a primitiveType? It can't be in primitiveTypeMapping because it has an argument.

So it should be resolved form ConstructedAtomType TypeConstructorName TypeVarMap to IntervalAtomType AtomType in atomTypeForTypeConstructor

similar problem:

TutorialD (master/main): :showexpr relation{a Interval Int}{tuple{a interval(3,5,f,f)}}
ERR: AtomTypeMismatchError (ConstructedAtomType "Interval" (fromList [("a",IntAtomType)])) (IntervalAtomType IntegerAtomType)
agentm commented 6 years ago

Interval should ultimately become a ConstructedAtomType since it is clearly composed of primitives. I chose not to implement it that way because of some special syntax around it. For example, Interval does not have any visible data constructors and it prints and gets parsed (in CSV) specially. Since this current design confused me, I am working on making it a ConstructedAtomType.

YuMingLiao commented 6 years ago

After I make changes in atomTypeForTypeConstructor. Interval can be generated now.

TutorialD (master/main): createarbitraryrelation e {spam Interval Day} 3-5
[AttributeAndTypeNameExpr "spam" (ADTypeConstructor "Interval" [ADTypeConstructor "Day" []]) ()]
TutorialD (master/main): :showexpr e
┌───────────────────────┐
│spam::Interval Day     │
├───────────────────────┤
│[1858-12-06,1858-12-06)│
│(1858-10-20,1858-10-20)│
│(1858-12-16,1858-12-16]│
│(1858-11-11,1858-11-11]│
└───────────────────────┘

But as you can see, there is some logic error in generating an Interval...haha

YuMingLiao commented 6 years ago
TutorialD (master/main): createarbitraryrelation e {spam Interval Day} 3-5
TutorialD (master/main): :showexpr e
┌───────────────────────┐
│spam::Interval Day     │
├───────────────────────┤
│[1858-12-10,1858-10-20]│
│[1858-12-11,1858-11-18)│
│[1858-11-12,1858-12-05]│
└───────────────────────┘

it has two different values now.

Interval may need some restriction. The type in it can only be an instance of Comparable typeclass or something like that.

agentm commented 6 years ago

Yea, we'll need some Interval-specific code to make the arbitrary instance generate valid intervals. In the future, users may have to specify their own arbitrary instances.

Indeed, I'm feeling more aware that we need to support typeclasses for custom types. Such classes would be useful for import/export, validation, etc.

I should have a replacement Interval type ready tomorrow.

agentm commented 6 years ago

Sorry, @YuMingLiao, I think our work criss-crossed. I finalized the changes necessary to make IntervalAtomType no longer a primitive type. The work is on my version of your branch.

YuMingLiao commented 6 years ago

I see. Never mind. I'm just trying to understand more code.

Good job! Thanks.

Where should I look to fetch your version? Maybe I can try if Interval works as expected.

About typeclass, I feel the same way. If Atomtype for Interval can be gt or lt, Then it is not far from having a sorted DataFrame.

agentm commented 6 years ago

The Interval changes are on my arbitrary_tupleset branch. I can still reproduce the Gen error using relation{a Interval Int}. Which line in Arbitrary causes the failure?

YuMingLiao commented 6 years ago

It's elements in arbitrary (ConstructedType..., about line 80. it's because there is no DataConstructorDef for Interval. So elements dcDef choose from empty list and cause error . If Interval is ADTypeConstructor, shouldn't it be defined as data Interval a = Interval a a BoolAtomType BoolAtomType so it has at least one DataConstructor ?

I tried to add it.

intervalTypeConstructorMapping :: TypeConstructorMapping
--intervalTypeConstructorMapping = [(ADTypeConstructorDef "Interval" ["a"], [])]
intervalTypeConstructorMapping = [(ADTypeConstructorDef "Interval" ["a"],
                                   [DataConstructorDef "Interval" [DataConstructorDefTypeVarNameArg "a",
                                                                   DataConstructorDefTypeVarNameArg "a",
                                                                   ]])
                                 ]

But I can't add the BoolAtomType part because DataConstructorDefArg doesn't have primitiveAtomType

data DataConstructorDefArg = DataConstructorDefTypeConstructorArg TypeConstructor | 
                             DataConstructorDefTypeVarNameArg TypeVarName

(time to bed...good day/night!)

agentm commented 6 years ago

Ah, I understand. In this case, I chose not to include a public data constructor because that would allow construction of invalid intervals (such as Interval 1 0). Users must use the AtomFunction interval to construct Intervals which validates the values. This is similar to Haskell module exports/hiding.

Until we have typeclasses which would allow user-defined types to implement their own Arbitrary instances, I think I will add a special case for Intervals. I suspected a special case would be required anyway because dropping random values into an Interval is likely to create invalid values anyway.

What do you think? Is there a better approach?

agentm commented 6 years ago

I added a special case to generate valid Intervals in Arbitrary.hs. Let me know what you think.

YuMingLiao commented 6 years ago

Sorry I couldn't give instant feedback. I don't have strong opinion about this special Interval thing. It can wait. But you implemented it so quick. And It works as expected! That's brilliant! Thanks!

TutorialD (master/main): createarbitraryrelation b {spam Interval Int} 3-5
TutorialD (master/main): :showexpr b
┌──────────────────────────┐
│spam::Interval (a::Int)   │
├──────────────────────────┤
│Interval 11 30 False True │
│Interval -27 16 False True│
│Interval -22 -9 True True │
│Interval -8 1 False False │
└──────────────────────────┘

And recently I found fake It can generate readable and plausible things like names, id, address. Do you think it's worth implementing?

agentm commented 6 years ago

Fake could be interesting for end-users, but the default types are generic. We could certainly offer additional types as useful add-ons. I think the additional types would be more useful with typeclasses, though.

Are there any more tests you can think of that would be useful?

YuMingLiao commented 6 years ago

what do you mean by additional types? I thought ConstructedAtomType already serves additional type.

I was thinking something like ... createarbitraryrelation a {name Text, id Int, address Text, phoneNumber Text} 3-5 Then it will create a more readable, plausible relation than what the arbitrary' function can give now.

agentm commented 6 years ago

I thought we could offer a PhoneNumber type which would have an Arbitrary instance to create phone numbers and so on.

YuMingLiao commented 6 years ago

Oh, I see what you mean now. That makes sense. The additional type can have some more property related to reality like PhoneNumber format validation. and it can create arbitrary relation with additional types and let user make their own attribute names. createarbitraryrelation a {employeeName Name, companyID ID, phoneNumber PhoneNumber} 3-5

So it is worth a try, right? easily creating more user case relation to play with? Let me dive into this...

agentm commented 6 years ago

Add-on types would certainly be useful, but they don't need to be part of this feature branch.

I was thinking that there could be a central repository for add-on types and schemas so that they could be easily loaded dynamically at runtime, but that's a future feature.

YuMingLiao commented 6 years ago

Okay then, you are right. And it seems odd to add AddOn types to ConstructedAtomType. I guess I just want to create more relations to play with. :p ... So I guess this is it. I've committed the final work and push to my feature/arbitrarau_tupleset. I'll pull a request.

YuMingLiao commented 6 years ago

Okay, I guess I don't need to pull new request because this request is already following my branch.

agentm commented 6 years ago

I pulled from your branch but it no longer compiles. It looks like arbitrary' in Arbitrary.hs disappeared or was moved. Do you see the same?

YuMingLiao commented 6 years ago

Oops. I have git pushed again. This time should be fine. Sorry for the inconvenience.

agentm commented 6 years ago

This is now merged. Thanks for your hard work on this feature- this is an unusual feature that other DBMSes don't have.

YuMingLiao commented 6 years ago

Glad to finish my first open source contribution here. Thanks for the support, mentoring and encouragement.