ds4se / chapters

Perspectives on Data Science for Software Engineering
./sback/summarizing-unstructured-data.md #74

Closed timm closed 8 years ago

timm commented 9 years ago

After review, relabel to 'reviewTwo'. After second review, relabel to 'EditorsComment'.

rrobbes commented 8 years ago

Title of chapter

Summarizing Unstructured Data

URL to the chapter

https://github.com/ds4se/chapters/blob/master/sback/summarizing-unstructured-data.md

Message?

What is the chapter's clear and approachable take away message?

Unstructured data (e.g. emails with a mix of natural language and code) is a useful source of information, although it is noisy. Fortunately, simple techniques allow us to classify elements in email and filter out noise, greatly easing summarisation of the data.

Accessible?

Is the chapter written for a generalist audience (no excessive use of technical terminology) with a minimum of diagrams and references? How can it be made more accessible to generalists?

The chapter is accessible and easily understandable. There are quite a few figures (6), but each of them serves a purpose.

Size?

Is the chapter the right length?

The chapter is within the acceptable limits.

Should anything missing be added?

Not really ... the chapter has a good level of granularity and nicely closes off solving the initial problem that was presented.

Can anything superfluous be removed (e.g. by deleting some section that does not work so well or by using less jargon, fewer formulae, fewer diagrams, fewer references)?

No. There may be many figures, but they all feel necessary. Perhaps the first two views of the email could be condensed into one figure.

What are the aspects of the chapter that authors SHOULD change?

The title! My impression is that the chapter is not really about summarising unstructured data, but rather about the steps to apply before summarising software data. The "summary" is a tag cloud, whereas at first I was expecting content more closely related to actual summarisation techniques.

Gotta Mantra?

We encouraged (but did not require) the chapter title to be a mantra or something cute/catchy, i.e., some slogan reflecting best practice for data science for SE. If you have suggestions for a better title, please put them here.

The title of the chapter is not a mantra. Perhaps (and that can certainly be improved upon!) something like:

"Structure your unstructured data"

Best Points

What are the best points of the chapter that the authors should NOT change?

The paper presents a single problem and nicely addresses it, making it a very easy and rewarding read. I find it very well written.

Other comments:

timm commented 8 years ago

Review template

Before filling in this review, please read our Advice to Reviewers.

(If you have confidential comments about this chapter, please email them to one of the book editors.)

Title of chapter

Summarizing Unstructured Data

URL to the chapter

https://github.com/ds4se/chapters/blob/master/sback/summarizing-unstructured-data.md

Message?

What is the chapter's clear and approachable take away message?

mess has structure too... which can be automatically extracted

Accessible?

Is the chapter written for a generalist audience (no excessive use of technical terminology) with a minimum of diagrams and references? How can it be made more accessible to generalists?

approachable

Size?

size is good

Gotta Mantra?

currently it's "Summarizing Unstructured Data"

how about "Embrace the Mess (methods for summarizing unstructured data)"

Best Points

I like the progress from one tag cloud to another

Editorial quibbles

"lots and lots" => much

and just a suggestion (totally optional), I'd change your first 2 paragraphs

para1 as is: One person that never took part in a serious team development effort may easily think that what software engineers do all day is to stay behind a screen and read and write only one thing: Source code.

para1 to do: Anyone who has never worked on a real software project might mistakenly believe that software engineers spend all their time reading source code. Note that if this was so, then all we'd ever need to better support SE is better support for handling structured data.

But software engineers do much more than just read code. Their day-to-day reality is that they spend much time writing a wide range of material, little of which is source code (even in open-source software projects, where there is not really a paper-driven or manager-mandated development process, we clearly see this happening). Accordingly, it is very important to discuss methods for handling unstructured data.

tzimmermsr commented 8 years ago

Great chapter @sback ! Please take a look at the reviews and prepare a final version by January 13.

I second @rrobbes's comment about the title. A simple fix could be to append "with Tag Clouds", i.e., "Summarizing Unstructured Data with Tag Clouds", or to put the emphasis on structuring unstructured data. I leave the decision about whether to make changes to you.

If you decide to mention Stefan Wagner's chapter (https://github.com/ds4se/chapters/blob/master/wagnerst/text-mining.md), please keep in mind that the title is not yet final.

sback commented 8 years ago

Hello @rrobbes, @timm, @tzimmermsr!

Thank you for the wonderful feedback! Really. I like this much more than receiving anonymous reviews (but that's for another forum to discuss ;) )

I just pushed the new version following your suggestions. This is the change log:

  1. Changed the title to: "Structure Your Unstructured Data First! \n The Case of Summarizing Unstructured Data with Tag Clouds" as a way to include suggestions from both @rrobbes and @timm.
  2. Changed the first two paragraphs, as suggested by @timm, with minor rephrasing of what he suggested.
  3. Decided not to add a reference to Stefan's chapter as I really don't see the connection. @rrobbes, do you feel strongly about me adding it?
  4. About the data volume ending in 2010, I have no further data and it would require some time to get it. @rrobbes, if you think it'd add a lot of value, I can fetch it. Otherwise, I think the chapter stands on its own with this graph.

tzimmermsr commented 8 years ago

Thanks @sback . This looks good.

I wonder if you really need the subtitle, or if you can just go with "Structure Your Unstructured Data First!" The tag clouds are just an example to show the value of structured data.