rolyp closed this 11 months ago
Re array reshaping, we could look at a workflow for data cleaning and preprocessing. For example:
Re image processing, an example of a basic (but realistic) concrete workflow could be to:
There are a lot more possible steps you could compose onto this. I think image processing tells the story of composition quite well, and that could be motivating enough to be different from the POPL paper.
@min-nguyen These are great. Let’s start thinking about questions you might find yourself (as a programmer) asking in these application domains that could be answered by backwards/forwards slicing or linked inputs/outputs.
Observation: linked inputs ($\triangleright^{\circ} \circ \triangleleft$) and linked outputs ($\triangleleft \circ \triangleright^{\circ}$) reveal different information depending on how much of the pipeline you’re running the analysis over. For example, suppose the pipeline has two steps $\mathsf{parse} \circ \mathsf{lex}$. Then linked inputs over just the $\mathsf{lex}$ step will reveal (for a given input character) what other characters needed to be inspected in order to generate the containing token. But linked inputs over both steps $\mathsf{parse} \circ \mathsf{lex}$ will pull in all the characters that were inspected in order to generate the containing syntax node. (I’m probably oversimplifying but something like that should be true.)
So it might be worth thinking about how these analyses could help someone understand/debug individual steps (or small sequences of steps) in pipelines such as the ones above.
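To make the $\mathsf{lex}$ versus $\mathsf{parse} \circ \mathsf{lex}$ observation concrete, here is a rough Python sketch. It is not our analysis: it approximates linked inputs with a plain "shares an output" relation over explicit dependency sets, and the whitespace lexer, the fixed-size grouping "parser", and the names `lex`, `parse` and `linked_inputs` are all made up for illustration. It also ignores the delimiting whitespace a real lexer would have to inspect.

```python
def lex(text):
    """Toy lexer: split on whitespace, returning tokens as (start, end) character spans."""
    spans, start = [], None
    for i, ch in enumerate(text):
        if ch.isspace():
            if start is not None:
                spans.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(text)))
    return spans


def parse(n_tokens, group=3):
    """Toy 'parser' (an assumption): group consecutive tokens into nodes of up to `group` tokens."""
    return [list(range(i, min(i + group, n_tokens))) for i in range(0, n_tokens, group)]


def linked_inputs(deps, i):
    """deps holds one set of input indices per output.
    Returns every input that shares at least one output with input i."""
    linked = set()
    for d in deps:
        if i in d:
            linked |= d
    return linked


text = "x = 1 y = 22"
spans = lex(text)

# Dependencies of each token on character positions (lex step only).
lex_deps = [set(range(s, e)) for s, e in spans]

# Dependencies of each syntax node on character positions (parse after lex).
nodes = parse(len(spans))
parse_lex_deps = [set().union(*(lex_deps[t] for t in node)) for node in nodes]

print(linked_inputs(lex_deps, 10))        # characters of the containing token
print(linked_inputs(parse_lex_deps, 10))  # characters of the containing node
```

Running the query for the same character over the longer pipeline pulls in strictly more characters, which is the effect described above.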
I wonder if we can fit Bézier curves into the edge detection example (as a subsequent vectorisation step). That doesn’t sound easy but maybe there are standard techniques. I guess what I’m imagining is a transformation step that interprets the image data as something more structured/domain-specific, so we can show the analysis working bidirectionally across that.
Added stochastic matrices and PCA (Principal Component Analysis) to candidate examples above.
Added Bayesian Model Averaging (climate science example from Dominic).
Dropping the scale-invariant metric for now, as it is too complicated an example. I am currently working on finding an appropriately simple example that involves combining data at multiple resolutions, preferably with locality. I think the basic statistical task of Bézier curve fitting might be inappropriate, as each output curve still depends on the entire input in order to best fit the data overall. I'm unsure how we can reconcile this with our current notions of dependency, as the global dependency structures induced by a lot of statistical tasks are proving to be an issue.
Some of the multi-scale models I've found literature on so far seem quite complex. I think there may be a benefit to considering them, modulo some concerns regarding the models themselves. I am currently investigating whether it will be possible to consider the model-mixing algorithms whilst treating the data we use to combine them as static inputs for now. This could obviate the need for complicated probabilistic computation. Bayesian model averaging approaches still induce some sort of global dependency structure, so I think the multi-scale approach might be for the best if we can overcome some of the challenges I've already mentioned. Needs a fair amount more investigation though.
Added some pointers to time series distance metrics. Symbolic Aggregate Approximation (turns a time series into a string by quantizing) might be worth looking at as the string could then be an input to a later processing stage.
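For reference, a minimal sketch of what Symbolic Aggregate Approximation does, using only the Python standard library. The function name `sax`, the four-letter alphabet, and the requirement that the series length be divisible by the number of segments are simplifying assumptions of this sketch, not anything we've settled on. The steps are the usual ones: z-normalise, average over equal-width segments (piecewise aggregate approximation), then quantise each segment mean against standard-normal breakpoints.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Turn a numeric series into a string of len(n_segments) symbols.
    Assumes len(series) is divisible by n_segments (simplification)."""
    # 1. z-normalise; note this step reads the whole series.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # 2. Piecewise aggregate approximation: mean of each equal-width segment.
    width = len(z) // n_segments
    paa = [mean(z[i * width:(i + 1) * width]) for i in range(n_segments)]

    # 3. Quantise against breakpoints that split N(0, 1) into equiprobable bins.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


print(sax([0, 0, 4, 4, 6, 6, 10, 10], 4))  # prints "abcd"
```

The resulting string could then be fed to a later string-processing stage, as suggested above.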
Current working example is #765, which we can implement simply and then use as part of a larger pipeline, as mentioned above.
Dropping back to Paused while we work on #765.
I think we have our example (which we’ll gradually flesh out with real data and other scenarios), so closing this.
We need some new non-trivial motivating examples for the paper.
Current candidates
#765
Other thoughts
Time series distance metrics and related notions
Array reshaping and other tensor operations
Image processing
Other statistical/probabilistic analyses
Rejected examples
#752 (too complex)