explorable-viz / fluid

Data-linked visualisations
http://f.luid.org
MIT License

New examples #733

Closed rolyp closed 11 months ago

rolyp commented 1 year ago

We need some new non-trivial motivating examples for the paper.

Current candidates

Other thoughts

Time series distance metrics and related notions

Array reshaping and other tensor operations

Image processing

Other statistical/probabilistic analyses

Rejected examples

min-nguyen commented 1 year ago

Re array reshaping, we could look at a workflow for data cleaning and preprocessing (sketched in code after this list). For example:

  1. Loading a CSV file
  2. Removing missing (NaN) and redundant information to create a filtered data set
  3. Deciding on a target column: creating a matrix of independent variables (features) and a vector of the dependent variable (target).
  4. Normalising/combining different columns to create more meaningful variables/features
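
A hedged pandas sketch of steps 1–4, purely for illustration: the file name `data.csv` and the columns `price`, `rooms` and `area` are hypothetical placeholders, not part of any real example.

```python
# Hypothetical preprocessing sketch; file and column names are placeholders.
import pandas as pd

# 1. Load the raw CSV.
df = pd.read_csv("data.csv")

# 2. Remove rows with missing (NaN) values and redundant duplicate rows.
df = df.dropna().drop_duplicates()

# 3. Decide on a target column: y is the vector of the dependent variable,
#    X the matrix of independent variables (features).
y = df["price"]
X = df.drop(columns=["price"])

# 4. Normalise numeric columns and combine columns into a derived feature.
num = X.select_dtypes("number")
X[num.columns] = (num - num.mean()) / num.std()
X["rooms_per_area"] = df["rooms"] / df["area"]  # hypothetical combination
```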

Re image processing, an example of a basic (but realistic) concrete workflow (sketched in code after this list) could be to:

  1. Optional: if we care about RGB images, applying a grayscale conversion first.
  2. Applying a simple image-blurring technique to perform noise reduction, e.g. with a Gaussian filter.
  3. Applying a simple gradient calculation/edge detection algorithm, e.g. with a Sobel filter.
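
A minimal sketch of that pipeline, under some assumptions: `imageio` and `scipy` are available, and `photo.png` is a hypothetical RGB input.

```python
# Grayscale -> Gaussian blur -> Sobel edges; input file is hypothetical.
import numpy as np
import imageio.v3 as iio
from scipy import ndimage

rgb = iio.imread("photo.png").astype(float)  # shape (H, W, 3), assumed RGB

# 1. Grayscale conversion using standard luminance weights.
gray = rgb[..., :3] @ np.array([0.299, 0.587, 0.114])

# 2. Noise reduction with a Gaussian filter.
blurred = ndimage.gaussian_filter(gray, sigma=1.5)

# 3. Edge detection: gradient magnitude from horizontal/vertical Sobel filters.
gx = ndimage.sobel(blurred, axis=1)
gy = ndimage.sobel(blurred, axis=0)
edges = np.hypot(gx, gy)
```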

There are many more possible steps you could compose onto this. I think image processing tells the story of composition quite well, and that could be motivation enough to differentiate it from the POPL paper.

rolyp commented 1 year ago

@min-nguyen These are great. Let’s start thinking about questions you might find yourself (as a programmer) asking in these application domains that could be answered by backwards/forwards slicing or linked inputs/outputs.

Observation: linked inputs ($\triangleright^{\circ} \circ \triangleleft$) and linked outputs ($\triangleleft \circ \triangleright^{\circ}$) reveal different information depending on how much of the pipeline you’re running the analysis over. For example, suppose the pipeline has two steps $\mathsf{parse} \circ \mathsf{lex}$. Then linked inputs over just the $\mathsf{lex}$ step will reveal (for a given input character) what other characters needed to be inspected in order to generate the containing token. But linked inputs over both steps $\mathsf{parse} \circ \mathsf{lex}$ will pull in all the characters that were inspected in order to generate the containing syntax node. (I’m probably oversimplifying, but something like that should be true.)
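
To make that concrete, here is a toy sketch (purely illustrative; nothing here is Fluid’s actual slicing machinery) in which each stage records, for every output piece, the set of input character indices it inspected, so composing stages unions those sets:

```python
# Toy illustration only: each stage tracks, per output, which input
# character indices were inspected. Not Fluid's actual analyses.

def lex(src):
    """Tokenise digits/operators; each token carries its character deps."""
    tokens, i = [], 0
    while i < len(src):
        j = i
        while j < len(src) and src[j].isdigit():
            j += 1
        if j > i:
            # Lexing a number also peeks at the terminating character.
            deps = set(range(i, min(j + 1, len(src))))
            tokens.append((src[i:j], deps))
            i = j
        else:
            tokens.append((src[i], {i}))
            i += 1
    return tokens

def parse(tokens):
    """Build one infix node; its deps are the union of its tokens' deps."""
    (a, da), (op, dop), (b, db) = tokens
    return f"({op} {a} {b})", da | dop | db

tokens = lex("12+34")
# Over lex alone, character 0 links only to the characters inspected for
# its containing token: {0, 1, 2} (the '+' was peeked at to end the number).
print(tokens[0])      # ('12', {0, 1, 2})
# Over parse . lex, the same character links to everything the containing
# syntax node inspected: {0, 1, 2, 3, 4}.
print(parse(tokens))  # ('(+ 12 34)', {0, 1, 2, 3, 4})
```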

So it might be worth thinking about how these analyses could help someone understand/debug individual steps (or small sequences of steps) in pipelines such as the ones above.

rolyp commented 1 year ago

I wonder if we can fit Bézier curves into the edge detection example (as a subsequent vectorisation step). That doesn’t sound easy but maybe there are standard techniques. I guess what I’m imagining is a transformation step that interprets the image data as something more structured/domain-specific, so we can show the analysis working bidirectionally across that.

rolyp commented 1 year ago

Added stochastic matrices and PCA (Principal Component Analysis) to candidate examples above.

rolyp commented 1 year ago

Added Bayesian Model Averaging (climate science example from Dominic).

JosephBond commented 12 months ago

Dropping the scale-invariant metric for now; it’s too complicated an example. I am currently working on finding an appropriately simple example that involves combining data at multiple resolutions, preferably with locality. I think the basic statistical task of Bézier curve fitting might be inappropriate, as the output curves still depend on their entire inputs in order to best fit the data overall. I’m unsure how we can reconcile this with our current notions of dependency, as the global dependency structures induced by many statistical tasks are proving to be an issue.

JosephBond commented 12 months ago

Some of the multi-scale models I’ve found literature on so far seem quite complex, but there may still be a benefit to considering them, modulo some concerns regarding the models themselves. I am currently investigating whether we can consider the model-mixing algorithms whilst treating the data we use to combine them as static inputs for now; this would obviate the need for potentially complicated probabilistic computation. Bayesian model averaging approaches still induce some sort of global dependency structure, so I think the multi-scale approach might be for the best if we can overcome some of the challenges I’ve already mentioned. Needs a fair amount more investigation, though.

rolyp commented 12 months ago

Added some pointers to time series distance metrics. Symbolic Aggregate Approximation (which turns a time series into a string by quantising it) might be worth looking at, as the string could then be an input to a later processing stage.
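
For reference, a minimal SAX sketch; the segment count and alphabet below are arbitrary illustrative choices, not from any particular library:

```python
# Standard SAX recipe: z-normalise, reduce via piecewise aggregate
# approximation (PAA), then map segment means to letters using
# equiprobable Gaussian breakpoints.
import numpy as np
from scipy.stats import norm

def sax(series, n_segments=8, alphabet="abcd"):
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()  # z-normalise
    means = np.array([s.mean() for s in np.array_split(x, n_segments)])  # PAA
    # Breakpoints cutting N(0, 1) into len(alphabet) equiprobable regions.
    cuts = norm.ppf(np.linspace(0, 1, len(alphabet) + 1)[1:-1])
    return "".join(alphabet[k] for k in np.searchsorted(cuts, means))

# One period of a sine wave quantised to a short string, which could then
# feed into a later (string-based) processing stage.
print(sax(np.sin(np.linspace(0, 2 * np.pi, 64))))
```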

JosephBond commented 12 months ago

Current working example is #765, which we can implement simply and then use as part of a larger pipeline as mentioned above.

rolyp commented 12 months ago

Dropping back to Paused while we work on #765.

rolyp commented 11 months ago

I think we have our example (which we’ll gradually flesh out with real data and other scenarios), so closing this.