schema_colimit observation_equations syntax

o1lo01ol1o commented 4 years ago

I thought I understood that the syntax of observation_equations in schema_colimit foo = quotient SchemaA SchemaB should use underscores to prefix each symbol:

entity_equations
  SchemaA.entityA = SchemaB.entityC
observation_equations
  forall x. x.SchemaA_entityA_attributeFoo = x.SchemaB_entityC_attributeBar

However, in a less trivial colimit I'm currently working on, I seem to have to write symbols at arbitrary levels of qualification; otherwise I get "well-sorted" errors:

schema_colimit Colimit_2014_2019 = quotient SColimit20142017 + SColimit20182019 : tsql {
[...]
observation_equations
  forall x. x.toHolderName = x.topHolderName
  forall x. x.LAR_activity_year = x.LAR_activity_year
  forall x. x.SColimit20142017_LAR_state_code = x.SColimit20182019_LAR_state_code

[...]
Cannot infer a well-sorted term for SColimit20142017_LAR_state_code(x).
Undefined (or not java-parseable) symbol: SColimit20142017_LAR_state_code.

Available symbols:
    LAR_state_code

In the above case, if I omit the schema qualification, CQL seems to be happy. Should I be doing something differently?

wisnesky commented 4 years ago

Schema colimit syntax is indeed a bit weird; it came late to CQL. At some point, dots turn into underscores, and in some places ambiguity forces (over-eager) qualification whereas in others it does not. My advice would be to just use whatever CQL seems happy with, as it will 'do the right thing'; the syntax itself is more ad hoc than the rest of CQL.

There are options, some of which may be enabled by default, that simplify the names in colimits automatically by doing things like finding common prefixes. In practice, we find that much of the semantics of a schema merge problem is actually contained in naming the groups of attributes, and so people have even contributed 'naming strategies' for the schema_colimit primitive that can be accessed as various options. Anyway, you may find setting simplify_names = false helpful to start. left_bias is another option to try.
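
For illustration, here is a rough sketch of where such an option might go, reusing the SchemaA/SchemaB example from the question. The type side name Ty is a placeholder, and both the placement of the options clause at the end of the block and the // comment are assumptions based on how other CQL declarations are written, so check them against the CQL documentation. Whether the fully qualified attribute names are then the ones CQL accepts is also something to confirm from the "Available symbols" list in the error output.

```
schema_colimit foo = quotient SchemaA + SchemaB : Ty {
  entity_equations
    SchemaA.entityA = SchemaB.entityC
  observation_equations
    forall x. x.SchemaA_entityA_attributeFoo = x.SchemaB_entityC_attributeBar
  options
    // assumed placement: options as the last clause of the block
    simplify_names = false
}
```

If CQL still reports an undefined symbol, the names under "Available symbols" are the ones to use in the equations.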


o1lo01ol1o commented 4 years ago

I eventually got the hang of it, though in the process I suffered a fair amount of headache with Postgres's implicit case insensitivity (I used snake case everywhere but couldn't be sure the extra underscores weren't affecting something, so I refactored to camel case).

CategoricalData / CQL

schema_colimit observation_equations syntax #49