Derek-Jones / ESEUR-book

Issue handling for Evidence-based Software Engineering: based on the publicly available data
http://www.knosof.co.uk/ESEUR/

Data and Diagrams #1

Closed IMTorg closed 7 years ago

IMTorg commented 7 years ago

Derek,

I'm a big fan of your work and I'm really excited about your recent release! Previously, you stated that 'Companies just don’t want to reveal how much they spent/charged to writing a software system'. However, through previous consulting engagements, I feel the majority of firms simply don't have any tracking system for this type of data.

I recently released (beta) a SAAS API to quickly create structured data from source / version-control for your own projects and others: www.scrumsaga.com

While most guides are in Python, I'm trying to produce more R reports: https://github.com/IMTorgDemo/Reports/blob/master/guideHelloWorld-R.ipynb

I'd really like to get referenced in your book, and I realize you need high-quality data.

The service is only configured for GitHub repos at the moment. I could provide you with data from open-source contracted / paid projects, such as these US government projects, which I'm trying to collect: https://github.com/IMTorgRsrchProj

While I've collected a large number of closed-source projects, from consulting engagements and organizations such as ISBSG (http://isbsg.org/), I am not legally in a position to share them. However, I could make an analysis of open-source and closed-source projects and provide you with results (just let me know what you need). In my experience, open- and closed-source projects do not differ as much as people would assume.

If any of this interests you, or I can help in any way, then please let me know! Cheers, Jason Beach

Derek-Jones commented 7 years ago (March 8, 2017)

Jason,

You are right that most companies don't collect data. Even when it is collected it is rarely kept for very long. Obtaining data is often a matter of being in the right place at the right time.

There have been a number of projects/systems for extracting data from GitHub repos. These days we have lots of basic data on source code. It's the human data that is in short supply.

Thanks for the IMTorgRsrchProj link, I will keep my eye on this project.

Some links for you: DACS data (http://shape-of-code.coding-guidelines.com/2017/02/19/dacs-software-life-cycle-empiricalexperience-database/) is public, but DCARC (http://shape-of-code.coding-guidelines.com/2013/01/22/us-dod-software-development-data-now-available/) currently has restricted access.

I suspect that NASA has a lot of data that nobody has ever asked them for a copy of (based on papers I find every now and again).

ISBSG make money by charging for access to their data. Income is always useful, no matter how small, so I suspect they will not release any large amount of data (they did release two files to the PROMISE repo, http://openscience.us/repo/).

A few researchers in business schools regularly find and analyse very high quality data (unfortunately it is invariably confidential); Chris Kemerer (http://www.business.pitt.edu/katz/faculty/kemerer.php) has some very interesting papers. The data can be found if you know who to ask.

Ask your contacts whether they have any data and whether they would be willing to share it. You might be surprised how many would consider releasing anonymized data.

IMTorg commented 7 years ago

Derek,

I will go through some different records, and see what I can find that might meet the criteria of open data.

Jason

Derek-Jones commented 7 years ago (March 10, 2017)

Jason,

Some interesting data exists in PDF files. This can be extracted, but requires a lot of effort (http://shape-of-code.coding-guidelines.com/2013/12/19/converting-graphs-in-pdf-files-to-csv-format/).

If you know anybody who is after an interesting data extraction project, perhaps you could suggest a PDF graph to CSV tool.

IMTorg commented 7 years ago

Derek,

I believe I can provide you with econ data on the following:

  • (17) Dept of Treasury basic .Net Web Applications

  • (~20) anonymized large-firm projects

I will continue looking. How does this work for you?

Also, I’ve come across the graph-to-data extraction problem many times. No single tool appears to work in every situation. But, sorry, I don’t know of anyone who is interested in this.

Jason

Derek-Jones commented 7 years ago (March 14, 2017)

Jason,

Switching to private email, rather than the GitHub issues list.

> I believe I can provide you with econ data on the following:
>
>   • (17) Dept of Treasury basic .Net Web Applications
>
>   • (~20) anonymized large-firm projects

That would be great. Do you want to create a brief write-up on a web page that I can cite as the source?

> Also, I’ve come across the graph-to-data extraction problem many times. No single tool appears to work in every situation. But, sorry, I don’t know of anyone who is interested in this.

The problem is that different packages draw graphs in different ways. For instance, some draw a plus character, others draw a vertical line then a horizontal line, while others draw the horizontal line first, while others draw all the horizontal lines before starting on the vertical lines.
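
To make the ordering problem concrete, here is a minimal Python sketch (an illustration, not a tool mentioned in this thread). It assumes the line segments have already been pulled out of the PDF drawing stream as coordinate pairs; the function name `plus_marker_centres` and the `max_len` and `tol` parameters are hypothetical. It pairs short horizontal and vertical strokes by position rather than by the order in which they were drawn, so the vertical-first, horizontal-first, and all-horizontals-first conventions should give the same result. Markers drawn as a plus text character are not handled.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def plus_marker_centres(segments: List[Segment],
                        max_len: float = 10.0,
                        tol: float = 0.5) -> List[Point]:
    """Return centres of '+' markers built from two crossing short strokes.

    Assumption: `segments` holds straight strokes in page coordinates,
    in whatever order the plotting package happened to emit them.
    """
    horizontal_mids = []  # midpoints of short horizontal strokes
    vertical_mids = []    # midpoints of short vertical strokes
    for (x1, y1), (x2, y2) in segments:
        mid = ((x1 + x2) / 2, (y1 + y2) / 2)
        if abs(y1 - y2) <= tol and 0 < abs(x1 - x2) <= max_len:
            horizontal_mids.append(mid)
        elif abs(x1 - x2) <= tol and 0 < abs(y1 - y2) <= max_len:
            vertical_mids.append(mid)
        # Anything else (axis lines, curves, long strokes) is ignored.

    centres = []
    unused = list(vertical_mids)
    for hx, hy in horizontal_mids:
        for v in unused:
            # A plus marker is a horizontal and a vertical stroke that
            # share (approximately) the same midpoint.
            if abs(hx - v[0]) <= tol and abs(hy - v[1]) <= tol:
                centres.append((hx, hy))
                unused.remove(v)  # each vertical stroke is used once
                break
    return sorted(centres)
```

The centres returned are still in page coordinates; mapping them to data values would additionally need the axis scales, which this sketch does not attempt.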


-- Derek M. Jones Software analysis tel: +44 (0)1252 520667 blog:shape-of-code.coding-guidelines.com

IMTorg commented 7 years ago

Derek,

This GitHub organization contains data from several different sources:

https://github.com/IMTorgRsrchProjData

Probably the most relevant are:

I ran into some difficulties getting the SampleGovernmentData (mostly Treasury). I am looking for alternative opportunities, now.

Jason


Derek-Jones commented 7 years ago (March 20, 2017)

Jason,

I see you have been busy collecting data.

There is also the effort data in the PROMISE repo (http://openscience.us/repo/effort/).

IMTorg commented 7 years ago

Great resource – thanks for sharing!

Jason