The most recent revision aims to make relationships between variables clear and obvious from the syntax. A useful consequence of this revision is that the conceptual differences between Tisane and existing software tools become more apparent.
Variables
An end-user declares each variable according to its data type. If the end-user will later provide data, each variable's name should match the corresponding column name. For nominal or ordinal variables, end-users must also specify the cardinality (the number of categories) if they do not intend to provide data. If end-users do provide data, cardinality is not required: Tisane will calculate and populate it internally.
Variables are observed values of a measure. Variables can be measures of interest, such as dependent and independent variables. Variables can also be IDs that act as keys to a dataframe (e.g., participant ID).
import tisane as ts
# Example 1:
hw = ts.Numeric('Homework') # 'Homework' is the column name
race = ts.Nominal('Race', cardinality=5) # there are 5 groups/options for the variable race
math = ts.Numeric('MathAchievement')
mean_ses = ts.Numeric('Mean_SES')
student = ts.Nominal('student id', cardinality=100) # IDs for the 100 students in this study
school = ts.Nominal('school', cardinality=10) # IDs for the 10 schools, with 10 students per school
# Example 2:
leaf_length = ts.Numeric('length')
fertilizer = ts.Nominal('fertilizer condition', cardinality=2)
season = ts.Nominal('season', cardinality=4)
plant = ts.Nominal('plant id') # no cardinality given, so data must be provided
bed = ts.Nominal('plant bed') # no cardinality given, so data must be provided
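When data are provided, cardinality does not need to be declared because it can be computed from the column. The sketch below illustrates one way this could work: count the distinct values in the column. `NominalVar` and `infer_cardinality` are hypothetical names, not part of Tisane's API, and rejecting data that contain more categories than were declared is an assumed design choice.

```python
# Hypothetical sketch: NominalVar is a stand-in, NOT Tisane's real class.
from dataclasses import dataclass
from typing import Hashable, Optional, Sequence


@dataclass
class NominalVar:
    name: str                          # should match the data's column name
    cardinality: Optional[int] = None  # may be omitted when data are provided

    def infer_cardinality(self, column: Sequence[Hashable]) -> int:
        """Fill in cardinality from the data, or sanity-check a declared value."""
        observed = len(set(column))  # number of distinct categories in the column
        if self.cardinality is None:
            self.cardinality = observed
        elif observed > self.cardinality:
            # Assumption: fewer observed categories than declared is allowed
            # (e.g., a season with no data yet), but more is treated as an error.
            raise ValueError(
                f"{self.name!r}: declared {self.cardinality} categories, "
                f"found {observed} in the data"
            )
        return self.cardinality


bed = NominalVar("plant bed")  # no cardinality declared
print(bed.infer_cardinality(["b1", "b1", "b2", "b3"]))  # 3
```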
End-users express two kinds of relationships between variables: conceptual relationships, which come from domain theory (conceptual models), and data measurement relationships, which describe how the data were measured.
Conceptual Relationships
There are two types of conceptual relationships: cause and associates_with.
# Example 1
hw.cause(math) # Time spent on homework causes math achievement.
race.associates_with(math) # Math scores and race are associated with each other.
# Example 2
fertilizer.cause(leaf_length) # Fertilizer causes leaf growth
Definitions:
cause: The LHS variable causes the RHS variable. The RHS variable cannot also cause the LHS variable.
associates_with: The LHS and RHS variables are associated/related in some way that is not causal.
Tisane provides an alias for each: causes for cause, and associate_with for associates_with.
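One way to picture the two conceptual relationships is as edges in a graph: cause adds a directed edge and associates_with adds an undirected one. The sketch below records both, rejects a cause edge whose reverse already exists (the RHS cannot also cause the LHS), and defines causes and associate_with as aliases. The `Var` and `ConceptualGraph` classes are hypothetical and are not meant to reflect how Tisane stores relationships internally.

```python
# Hypothetical sketch: these classes are illustrative, NOT Tisane's API.
from __future__ import annotations


class ConceptualGraph:
    def __init__(self) -> None:
        self.causal_edges: set[tuple[str, str]] = set()  # directed (cause, effect)
        self.associations: set[frozenset[str]] = set()   # undirected pairs

    def add_cause(self, lhs: Var, rhs: Var) -> None:
        # The RHS variable cannot also cause the LHS variable.
        # Only the direct reverse edge is checked; longer cycles are not.
        if (rhs.name, lhs.name) in self.causal_edges:
            raise ValueError(f"{rhs.name} already causes {lhs.name}")
        self.causal_edges.add((lhs.name, rhs.name))

    def add_association(self, lhs: Var, rhs: Var) -> None:
        self.associations.add(frozenset((lhs.name, rhs.name)))


class Var:
    def __init__(self, name: str, graph: ConceptualGraph) -> None:
        self.name = name
        self.graph = graph

    def cause(self, other: Var) -> None:
        self.graph.add_cause(self, other)

    def associates_with(self, other: Var) -> None:
        self.graph.add_association(self, other)

    # Aliases: both spellings call the same method.
    causes = cause
    associate_with = associates_with


g = ConceptualGraph()
hw, race, math_ach = (Var(n, g) for n in ("Homework", "Race", "MathAchievement"))
hw.causes(math_ach)            # same as hw.cause(math_ach)
race.associate_with(math_ach)  # same as race.associates_with(math_ach)
```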
Data measurement relationships
There are three types of data measurement relationships: (1) measurement attribution, (2) treatment for experiments, and (3) data hierarchies.
Measurement attribution
# Example 1:
student.has(hw)
student.has(race)
student.has(math)
school.has(mean_ses)
# Example 2:
plant.has(leaf_length)
Definition:
has distinguishes "levels" of observations by attributing variables to each level. In Example 1, there are two levels: student and school. Each student has a value for homework, race, and math. Each school has a value for mean_ses.
Idea: Create a separate data type for "ID" and enforce that only variables of type "ID" can have other variables.
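When data are provided, a has relationship implies something checkable: a variable attributed to a level should take a single value for each unit at that level (each school has one Mean_SES value). The hypothetical helper `check_has` below tests this on rows represented as dictionaries. The rows are made-up toy data, and this is an illustrative check, not one Tisane is documented to perform.

```python
# Hypothetical sketch: check_has is illustrative, NOT part of Tisane.
from collections import defaultdict


def check_has(rows, unit, attribute):
    """True if `attribute` takes exactly one value within each `unit`,
    i.e. it can be attributed to that unit's level."""
    values_by_unit = defaultdict(set)
    for row in rows:
        values_by_unit[row[unit]].add(row[attribute])
    return all(len(vals) == 1 for vals in values_by_unit.values())


# Made-up toy rows using the column names from Example 1.
rows = [
    {"student id": 1, "school": "A", "Homework": 3, "Mean_SES": 0.2},
    {"student id": 2, "school": "A", "Homework": 5, "Mean_SES": 0.2},
    {"student id": 3, "school": "B", "Homework": 4, "Mean_SES": -0.1},
]
print(check_has(rows, "school", "Mean_SES"))  # True: one Mean_SES per school
print(check_has(rows, "school", "Homework"))  # False: varies within school A
```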
Treatment
End-users can express experimental treatments/manipulations.
# Example 2:
fertilizer.treats(bed)
Only Example 2 is an experiment. Each bed is treated with a fertilizer. In other words, fertilizer is a bed-level manipulation.
Definition:
treats expresses the explicit/intentional manipulation of variables in an experiment. X.treats(Y) is internally equivalent to Y.has(X), which means that each Y has an observation for X.
Idea: Check that the LHS variable of treats has a causal relationship (in the graph) with the DV? Also keep treats and has distinct from one another.
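The equivalence between X.treats(Y) and Y.has(X), and the idea of still keeping the two distinct, can both be satisfied by storing a treatment as an ordinary attribution edge that carries a tag. The `Relations` class below is a hypothetical sketch of that design, not Tisane's implementation; variables are plain strings here for brevity.

```python
# Hypothetical sketch: NOT Tisane's internal representation.
from __future__ import annotations


class Relations:
    def __init__(self) -> None:
        self.edges: list[tuple[str, str, str]] = []  # (unit, variable, kind)

    def has(self, unit: str, variable: str) -> None:
        """unit.has(variable): each unit has an observation of variable."""
        self.edges.append((unit, variable, "has"))

    def treats(self, treatment: str, unit: str) -> None:
        """treatment.treats(unit): stored like unit.has(treatment), but tagged."""
        self.edges.append((unit, treatment, "treats"))

    def attributes_of(self, unit: str) -> set[str]:
        """All variables observed per unit, whether measured or manipulated."""
        return {var for u, var, _ in self.edges if u == unit}

    def treatments_of(self, unit: str) -> set[str]:
        """Only manipulated variables, so treats stays distinguishable from has."""
        return {var for u, var, kind in self.edges if u == unit and kind == "treats"}


r = Relations()
r.treats("fertilizer condition", "plant bed")  # fertilizer.treats(bed)
r.has("plant id", "length")                    # plant.has(leaf_length)
print(r.attributes_of("plant bed"))  # {'fertilizer condition'}
print(r.treatments_of("plant id"))   # set()
```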
Data hierarchies
Data can be clustered or nested. Tisane provides support for expressing two possible sources of clustering: (1) repeated measures and (2) nested relationships.
# Example 1
student.nest_under(school) # Students belong to a school. Students in the same school may be more similar to each other than to students in other schools.
# Example 2
plant.nest_under(bed) # Plants belong in plant beds.
plant.repeats(measure=leaf_length, repetitions=season) # Repeatedly measure the same plant once per season
Definitions:
nest_under nests one variable under another: X.nest_under(Y) means each X belongs to exactly one Y (e.g., each student belongs to one school).
repeats means the LHS variable provides multiple values of the measure. Each value is enumerated/indexed by the repetitions variable (e.g., season). If a plant provides multiple measures per season, another column for indexing each measure is required.
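Both hierarchy declarations make claims that can be checked against the data. The hypothetical helpers below test that every plant appears under exactly one bed (nesting), and whether any plant has more than one measurement per season, in which case an extra index column would be needed for repeats. The helper names and toy rows are assumptions for illustration; Tisane is not documented here to perform these checks.

```python
# Hypothetical sketch: is_nested and needs_extra_index are NOT Tisane functions.
from collections import Counter, defaultdict


def is_nested(rows, child, parent):
    """True if every `child` ID appears under exactly one `parent` ID."""
    parents = defaultdict(set)
    for row in rows:
        parents[row[child]].add(row[parent])
    return all(len(p) == 1 for p in parents.values())


def needs_extra_index(rows, unit, repetitions):
    """True if some unit has more than one row for a repetition value,
    so another column is required to index each measure."""
    counts = Counter((row[unit], row[repetitions]) for row in rows)
    return any(n > 1 for n in counts.values())


# Made-up toy rows using the column names from Example 2.
rows = [
    {"plant id": 1, "plant bed": "b1", "season": "spring", "length": 4.0},
    {"plant id": 1, "plant bed": "b1", "season": "summer", "length": 6.5},
    {"plant id": 2, "plant bed": "b2", "season": "spring", "length": 3.8},
    {"plant id": 2, "plant bed": "b2", "season": "summer", "length": 5.9},
]
print(is_nested(rows, "plant id", "plant bed"))       # True
print(needs_extra_index(rows, "plant id", "season"))  # False
```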