Perspectives on Data Science for Software Engineering

./andreas-zeller/appmining.md #23

Open timm opened 8 years ago

timm commented 8 years ago

After review, relabel to 'reviewTwo'. After second review, relabel to 'EditorsComment'.

juergenlive commented 8 years ago

Title of chapter

Mining Apps for Anomalies

URL to the chapter

https://github.com/ds4se/chapters/blob/master/andreas-zeller/appmining.md

Message?

App mining helps to detect if an app will show normal or abnormal behavior.

Accessible?

Is the chapter written for a generalist audience (no excessive use of technical terminology) with a minimum of diagrams and references? How can it be made more accessible to generalists?

The article is accessible to a more general audience.

Size?

Is the chapter the right length? Yes.

Should anything missing be added? If there is evidence that the approach is already seen as valuable by external stakeholders (e.g., app stores are using it), I would include this in the article. Tracking user behavior is not always bad; it is also a powerful mechanism for providing competitive apps.

Can anything superfluous be removed (e.g., by deleting a section that does not work so well, or by using less jargon, fewer formulae, fewer diagrams, fewer references)? What are the aspects of the chapter that authors SHOULD change?

The first section does not work so well as it mixes several perspectives (i.e., user, developer, tester, app store provider?). It might be better to describe the intro from only one perspective (e.g., a malware scenario) and later mention that the approach could be used in many ways.

typo: Is the features id clearly ...

I was a bit confused that an app is not seen as a product. In software management as well as product management, apps are considered products.

Gotta Mantra?

We encouraged (but did not require) the chapter title to be a mantra or something cute/catchy, i.e., some slogan reflecting best practice for data science for SE. If you have a suggestion for a better title, please put it here.

How to find the black sheep in app stores?

Best Points

What are the best points of the chapter that the authors should NOT change?

The last two sections.

bramadams commented 8 years ago

Title of chapter

Mining Apps for Anomalies

URL to the chapter

https://github.com/ds4se/chapters/blob/master/andreas-zeller/appmining.md

Message?

The chapter explains the need for and basic concepts of mobile app store mining, followed by an application to the identification of malicious app behaviour.

Accessible?

The chapter is very accessible, providing easy-to-understand examples and avoiding overly technical terms. It nicely builds up the storyline from a concrete question, to the concept of app store mining, culminating in the discussion of CHABADA. As such, there aren't many things that can be improved. Below, I provide a number of nitpicking comments about the chapter contents, followed by some textual comments.

At times, the "mining" discussed by the chapter seems more general than mobile apps, basically considering the mining of any kind of repository. Especially the word "program" instead of "mobile app" in the very first sentence creates this anticipation. Hence, either the chapter should use "mobile app" throughout, or it could also be interesting to mention somewhere the broader field of MSR or software analytics.

The metaphor "That's because we humans can build on our experience with similar games" does not fully work out. It is true that software/hardware do not have such a thing, which is why eventually mining is required to derive equivalent insights. However, the majority of the chapter does not talk in terms of experience, but rather in terms of expectations and how to derive these based on what the majority of other apps and their developers encountered (wisdom of the crowd). A quick fix would be to say that humans rely on their expectations, which in turn are based on experience or any other kind of data source.

The chapter asks the question "Does the program do what it is supposed to do?", which suggests talking about the functional correctness of an app. However, most of the chapter focuses on how maliciously an app behaves, which is not the same thing. Maybe the opening question should be made more focused and immediately talk about malicious apps?

It is correct that there is a nice number of open source mobile apps available, but the most popular apps are not amongst those. Similarly, apart from version control repositories, bug/review repositories either do not exist or are not accessible. Furthermore, obtaining the actual mobile app data is also still a chore (requiring special-purpose crawlers, ...), even though (in theory) "they offer so many different data sources that can all be associated with each other". Although this chapter is aimed at a more general public (and some of the app data issues might be fixed if app stores collaborated), it might be a good idea to somehow mention the existing data challenges, since they are an obstacle.

The chapter says that in regular open source development "each solution would typically be implemented exactly once, and then reused". In theory yes, but the fact that there are so many Linux distributions, browsers, word processors, etc. seems to suggest otherwise. It might be true that mobile apps feature many more alternatives, given the low threshold to enter the market and typically small app size, but I am not aware of any such study.

Finally, the chapter mentions as a side note "(You can also mine and associate just the metadata, finding that bad reviews correlate with low downloadnumbers - but than, this would be product mining, not app mining)". I see your point; however, there is currently no consensus in the community about this. I.e., is mining of app stores only app store mining if it combines the app store data with at least one other repository, or does the mining of a store containing mobile app data qualify as app store mining? Personally, I would leave out this sentence to simplify things.

Size?

The size of the chapter seems perfect. Apart from the nitpicks above, not much can be improved, as every section plays its role.

Gotta Mantra?

The title is not really a mantra, but is straight-to-the-point (which is hard to improve). Maybe a play on the term "crowd-sourcing" could work, for example "App-sourcing: Mining Apps for Anomalies".

Best Points

I liked the fluent storyline, concrete examples and simplified terminology used. These allow newcomers to grasp the major concepts and see a concrete application of them.

Weakest Points

Just a number of nitpicks, as outlined above.

Textual Comments

lauriew commented 8 years ago

@andreas-zeller Your two reviews are in. As was stated in an earlier email, we are looking for new versions of the chapters by January 13.

I made the following notes as I read in addition to those from the other reviewers:

Add in-text citations for the references listed at the end of the chapter.