ds4se / chapters

Perspectives on Data Science for Software Engineering

./mikegodfrey/provenanceChapter.md #103

margaretstorey closed 8 years ago

margaretstorey commented 8 years ago

Title of chapter

Why Provenance Matters

URL to the chapter

https://github.com/ds4se/chapters/blob/master/mikegodfrey/provenanceChapter.md

Message?

What is the chapter's clear and approachable take away message?

That not caring about provenance can lead to some serious issues, and that there are different techniques that can be used to determine provenance, but there are challenges in applying them.

Accessible?

Is the chapter written for a generalist audience (no excessive use of technical terminology) with a minimum of diagrams and references? How can it be made more accessible to generalists?

It is quite accessible, provided the reader has some familiarity with issue trackers (which I assume most readers would). Adding one more sentence about them may help.

Size?

Is the chapter the right length? Should anything missing be added? Can anything superfluous be removed (e.g. by deleting some section that does not work so well or by using less jargon, fewer formulae, fewer diagrams, fewer references)? What are the aspects of the chapter that authors SHOULD change?

It is within the guidelines. The title of the section called "What are the key tasks?" didn't flow well from the previous sections: tasks for what, exactly? I think you need to say explicitly that determining provenance is an important task first (OK, it is obvious, but the writing would flow better). Perhaps also link back more clearly to the three scenarios when you propose solutions; the flow was a bit difficult to follow in that regard (minor fix). Perhaps you could remind the reader of the different problems as you describe the solutions? Perhaps mention the Perils paper that discusses issues with relying on data from GitHub.

Gotta Mantra?

We encouraged (but did not require) the chapter title to be a mantra or something cute/catchy, i.e., some slogan reflecting best practice for data science for SE. If you have a suggestion for a better title, please put it here.

The title is clear.

Best Points

What are the best points of the chapter that the authors should NOT change?

I really like the compelling stories at the beginning, which hook the reader. Your definition of provenance is also succinct. You build a strong case for why provenance is important.

tzimmermsr commented 8 years ago

Review template

Before filling in this review, please read our Advice to Reviewers. (If you have confidential comments about this chapter, please email them to one of the book editors.)

Title of chapter

Why Provenance Matters

URL to the chapter

https://github.com/ds4se/chapters/blob/master/mikegodfrey/provenanceChapter.md

Message?

What is the chapter's clear and approachable take away message?

Provenance is about the origin, history, and ownership of artifacts, and provenance matters.

Accessible?

Is the chapter written for a generalist audience (no excessive use of technical terminology) with a minimum of diagrams and references? How can it be made more accessible to generalists?

The chapter is very accessible. The chapter will be relevant to professionals, students, and researchers. It makes a convincing case for the importance of provenance.

Size?

Is the chapter the right length? Should anything missing be added? Can anything superfluous be removed (e.g. by deleting some section that does not work so well or by using less jargon, fewer formulae, fewer diagrams, fewer references)? What are the aspects of the chapter that authors SHOULD change?

The chapter is the right length. Nothing should be removed.

I feel that the structure of "What are the key tasks?" could be improved a little to make the key tasks more explicit. The most prominent structural element is the enumeration of the kinds of entities, which I initially mistook for the answer to the question. I believe that the key tasks are the emphasized phrases (defining and scoping the entities, establishing artifact linkages and ground truth; possibly scalable matching algorithms and performing a historical analysis; I'm not sure about the last two). The key tasks could also just be what is mentioned in the first sentence: "identify which entities we are interested in and how they relate to each other." It would be great if the key tasks were more recognizable and reflected by the structure.

I guess that with questions as headings, I would have expected the answer to come sooner (i.e., within the first paragraph).

The enumeration with the kinds of entities might fit better after the sentence about "Depending on the task at hand, defining and scoping the entities".

The intro talks about "problems", but later these are referred to as "scenarios". This is a bit confusing. Maybe also refer to the "first scenario" by name to make clear what it refers to. In "Establishing artifact linkage and ground truth is the next problem", maybe replace "problem" with "task".

I think it would be good to point out that provenance is important not just for software development but also for data analysis (since it's a book about Data Science). Maybe this can go into the Looking Ahead part. It doesn't have to be long, just one or two sentences saying that it matters for data too. Maybe add a pointer to further information, e.g., http://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext

Gotta Mantra?

We encouraged (but did not require) the chapter title to be a mantra or something cute/catchy, i.e., some slogan reflecting best practice for data science for SE. If you have a suggestion for a better title, please put it here.

The title is perfect.

Best Points

What are the best points of the chapter that the authors should NOT change?

The introduction, the "What is Provenance", and the "Another Example" section are great. The chapter makes a convincing case for the importance of provenance.

migod commented 8 years ago

I did a mild rewrite to incorporate pretty much all of the suggestions, including restructuring the "What are the key tasks?" section and adding some new references. Sincere thanks to Peggy and Tom!

timm commented 8 years ago

:+1: @migod good to go