VISSOFT Reviews - Githubissues

----------------------- REVIEW 1 ---------------------
PAPER: 15
TITLE: A Tool for Visualizing Patterns of Spreadsheet Function Combinations
AUTHORS: Justin Middleton and Emerson Murphy-Hill

----------- Review -----------
Spreadsheets and their embedded logic are used in nearly any business and work environment. Therefore, the goal of this paper, understanding this logic, is very important.

I believe that the proposed tool makes it easier to explore the logic embedded in spreadsheets. However, my major qualms with this paper are that the proposed visualization could be improved and that more analysis might have been added to make it easier to understand complex structures.

The goal of (software) visualization is to represent possibly complex abstract concepts with a compact visual representation. Making these visual representations compact, yet intuitive, is a requirement given the sheer amount of information they represent. The explicit representation of the index of the arguments (1, 2, 3, ...) in the proposed visualization doesn't seem to add much value. I don't understand why the authors didn't act on the hint of one of their users in the test study who was also confused by these numbers [#144]. These numbers should have been omitted. In addition, drawing all 27 arguments as in Fig. 10 illustrates that this approach is not scalable (although it does illustrate the pathology in the source) [#145].

If all this visualization offers to the user is to explore a call tree, wouldn't a simple, textual version of a tree with indentation suffice [#146]?
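As a rough illustration of the textual alternative mentioned above, a call tree can be printed with one level of indentation per nesting level. This is only a sketch: the `(name, children)` tuple representation and the example formula are assumptions made for illustration, not the tool's actual data model.

```python
def render_tree(node, depth=0):
    """Return indented text lines for a (name, children) call-tree node."""
    name, children = node
    lines = ["  " * depth + name]
    for child in children:
        lines.extend(render_tree(child, depth + 1))
    return lines


# Hypothetical call tree for the formula =IF(SUM(A1:A3)>0, ABS(B1), 0)
tree = (
    "IF",
    [
        (">", [("SUM", [("A1:A3", [])]), ("0", [])]),
        ("ABS", [("B1", [])]),
        ("0", []),
    ],
)

print("\n".join(render_tree(tree)))
```

Such a listing shows the nesting but not the frequency information that the paper's visualization attaches to each node.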

I believe there is a high potential for smarter analysis in the metadata that the authors collect from spreadsheets with their Java. Agreed, the authors didn't want to make a judgement about the quality of the logic, but one of the case studies was to detect 'bad smells'. A lot of these conclusions could have been completely automated: why not flag cells in a spreadsheet with long parameter lists, conditional complexities or too many operations [#147]?
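A minimal sketch of what such automated flagging might look like is given below. It works on raw formula strings with simple character counting rather than a real formula parser, and the thresholds, smell names, and function names are all hypothetical choices made for illustration.

```python
import re

# Hypothetical thresholds; a real corpus would need tuning.
MAX_ARGS = 5
MAX_DEPTH = 3
MAX_OPERATORS = 6
MAX_IFS = 2


def formula_metrics(formula):
    """Compute rough metrics for one formula string (heuristic, not a parser)."""
    depth = max_depth = max_args = operators = 0
    commas = []          # commas seen so far inside each open parenthesis
    in_string = False
    for ch in formula:
        if ch == '"':
            in_string = not in_string
        elif in_string:
            continue     # ignore everything inside string literals
        elif ch == "(":
            commas.append(0)
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "," and commas:
            commas[-1] += 1
        elif ch == ")" and commas:
            max_args = max(max_args, commas.pop() + 1)
            depth -= 1
        elif ch in "+-*/^&":
            operators += 1
    # Counts plain IF( calls only; SUMIF( and IFERROR( do not match.
    ifs = len(re.findall(r"\bIF\(", formula, flags=re.IGNORECASE))
    return {"args": max_args, "depth": max_depth,
            "operators": operators, "ifs": ifs}


def flag_smells(cells):
    """Map each cell address to the list of smells its formula triggers."""
    flagged = {}
    for address, formula in cells.items():
        m = formula_metrics(formula)
        smells = []
        if m["args"] > MAX_ARGS:
            smells.append("long parameter list")
        if m["ifs"] > MAX_IFS:
            smells.append("conditional complexity")
        if m["depth"] > MAX_DEPTH:
            smells.append("deep nesting")
        if m["operators"] > MAX_OPERATORS:
            smells.append("too many operations")
        if smells:
            flagged[address] = smells
    return flagged


# Made-up example cells, not taken from any dataset.
cells = {
    "A1": "=SUM(B1,B2,B3,B4,B5,B6)",
    "A2": "=IF(B1>0,IF(C1>0,IF(D1>0,1,2),3),4)",
    "A3": "=B1+C1",
}
print(flag_smells(cells))
```

The counting is deliberately crude (for example, a unary minus counts as an operation, and grouping parentheses count as a one-argument call), so it only indicates where a reviewer might want to look.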

It seems to me all the research that has been done on call trees for single program executions (over probably 20 years) can be applied to this area of spreadsheet logic. This applies not only to the analysis but also to visualizing these call structures [#146].

----------------------- REVIEW 2 ---------------------
PAPER: 15
TITLE: A Tool for Visualizing Patterns of Spreadsheet Function Combinations
AUTHORS: Justin Middleton and Emerson Murphy-Hill

----------- Review -----------
The paper presents a tree visualization to analyze usage patterns in formulae used in spreadsheet corpora. The tree shows for every parameter of a function its most used subfunctions or arguments. The tree can be explored interactively by expanding and collapsing nodes. In a case study, the authors demonstrate how bad smells and user patterns of specific functions can be explored. A qualitative user study with four participants tests the approach in practice.

Spreadsheets are a popular way of programming by transforming data through formulae. Visualizing these formulae hence is a relevant research question for VISSOFT. In general, the paper is well-written and easy to follow. The visualizations illustrate the document well and provide meaningful insights. I appreciate that the authors provide a video and published the source code on GitHub.

I see the main contribution of the paper in suggesting a new way of investigating spreadsheet corpora. However, it seems that the suggested scenarios and tasks are only partly relevant in practice. Also, the visualization lacks good overview and visual guidance to really investigate the data [#149]. Although the authors show some results of the approach in a case and user study, the paper could not fully convince me that the approach is applicable as intended. This is why I do not clearly recommend acceptance.

The approach is motivated to address two tasks: the exploration of bad smells and the use in educational scenarios. However, I find those two scenarios not very convincing. How could users judge if something is a bad smell if they are not experts in spreadsheets? Where to look in the huge tree for bad smells? Wouldn’t novice users of spreadsheets be overwhelmed if they see all possible uses of a formula (instead of a few selected good ones)? How to discern “good” uses from “bad” uses [#148]? I believe the tool (and the dataset) is much more valuable for spreadsheet researchers and experts who’d like to study usage patterns. Hence, it might be that the user study targeted the wrong tasks and users. The case study, in contrast, shows that the tool could help to gain interesting insights. As an expert analysis tool, I imagine extending the tool towards comparison of different sets of spreadsheets (e.g., faulty vs. non-faulty) would significantly increase the number of meaningful analyses one could perform with the tool [#149].

The visual design of the approach is clean and simple. The visualization technique is an adaptation of a standard tree visualization, hence, not novel as an encoding. But the adaptations and the mapping to the investigated data have been done with care, for instance, fading out lines before they overlap with the labels. While the visualization is easy to understand and provides certain insights, I somewhat miss a better overview and data richness: the user has to click a lot to discover meaningful insights and there are only weak visual pointers that indicate where to look for interesting data [#150]. For instance, the parameter nodes (cf. Fig. 10) do not show any additional information besides the number of parameters.

Minor comments:

In my impression, the text is in parts too colloquial; for instance, I suggest not using shortened forms like “it’s”, exclamation marks, or phrases like “a word on”. [#151]
Some figures have poor resolution [#152]
Sometimes, closing quotation marks are used as opening ones [#156]
Avoid single lines at the beginning/ending of a page
The introduction explains in detail why spreadsheets are an interesting area, but I feel that is not necessary (anymore). [#153]
Introduction: “in Hermans’ [9] and Jansen’s [10]” - both papers have more than one author
Section II: “ofter” -> “often”
Section III.A: I do not understand why the goals are numbered “1”, “!1”, etc.
Fig 2-4 should be merged (or Fig 2-3 can be removed because they are contained already in Fig 4) [#154]
Section IV.B: “Hermans, Aivaloglou, and Jansen” - reference missing, three authors -> “et al.”?
Section V: Why is information on participants only provided in subsection B not in A already? [#155]
Section V: What should “not extremely familiar” mean? Anybody who is not an expert?

----------------------- REVIEW 3 ---------------------
PAPER: 15
TITLE: A Tool for Visualizing Patterns of Spreadsheet Function Combinations
AUTHORS: Justin Middleton and Emerson Murphy-Hill

----------- Review -----------
The authors present a tool that analyzes a set of Excel spreadsheets and visualizes information about the functions that actually occur in these spreadsheets. More precisely, the visualization shows the kinds and frequency of functions and their arguments in form of interactive trees. The authors illustrate their approach in a case study and also present some findings from an initial, "exploratory" user study (with 4 participants).

The weakest part of the paper is the user study. The authors do not provide any information about the participants (students, professionals?). Only in subsection B do they mention that none of the "users" considered themselves to be experts on Excel [#155]. If they are non-experts, I think that their statements about the possible uses, in particular smell tracking, are not of much help, because they probably lack experience of what a smell in Excel is [#156]. The authors spend more than a page discussing limitations of the user study as well as of the overall approach. While discussing limitations is an important part of empirical research papers, I would have preferred to see more examples of insights about the Enron dataset gained with the tool [#158].

Overall, I think that this is a novel and useful approach. Nevertheless, I think that the paper is very wordy and could easily be reduced to a short paper without much loss of information.

Some typos:

Section II:
into complex formula --> into complex formulae [#156]
ofter --> often

Section III:
will to foster --> is meant to foster
times to root --> times the root
some use as ABS --> some uses as ABS (or even better rephrase the sentence)

Section IV:
we can be gained --> what can be gained
the one can --> one can
lest it return --> lest it returns

REFERENCES
in some of the references: euses --> EUSES
in [25]: java --> Java
in [25]: eclipse ide --> Eclipse IDE
in [26]: api --> API

DeveloperLiberationFront / Excel-Function-Visualizer

VISSOFT Reviews #143