africanmathsinitiative / R-Instat-Help

The latest uncompiled (.hnd) and compiled (.cmd) versions of the help file for the statistics software package R-Instat
https://chuffed.org/project/africandatainitiative
GNU General Public License v3.0

The World Bank Procurement Data - 2 #11

Open rdstern opened 6 years ago

rdstern commented 6 years ago

The same data are used as in xxx.

  1. We start again and open it from the library. This time we use the icon on the toolbar.
  2. The WorldBank.RDS file is opened as before, as follows. (then do it, and sort - not in the video - on the ID which is how the data will be soon.)
  3. As before, we click on the metadata icon in the toolbar. This window is made larger, and the first column is expanded to see the variable names clearly.
  4. The Procurement menu is opened. If, as here, it is not yet visible, then use View and tick the relevant entry.
  5. There are usually a number of preparatory steps before the data are analysed. We limit the analyses to 5 countries in East Africa. In the Procurement menu, choose Prepare and Filter by country. Then we choose Burundi, Kenya, Rwanda, Tanzania and Uganda, all in East Africa. Once the data are filtered, the row numbers on the left-hand side are coloured. The information below the data shows that there are a total of 8304 World Bank contracts from these 5 countries.
  6. We now look at the red flags. These are variables that can indicate an issue with a contract. These can be changed, but 5 are shown to start with. The first is the submission period for the bid for the contract. This is the 47th variable in the data and is called subm_p. If this is short it can indicate a problem. The next two we propose to use are also numeric. The first (variable 156) is the cost overrun. When it is large it can indicate a problem. Next is the share of the project for the supplier who won this contract. The final two red flags are logical columns. The first is whether the contract was an open procedure or not. And the second is whether the winner was from a small tax haven.
  7. The first step in the analysis is often to look at the structure of the key variables. For this data set we first use the Procurement > Describe > One Variable Summarise dialogue. For these data this dialogue has been pre-filled with some of the key variables. So OK can be used directly. In the results we notice, for example (what do you notice?)
  8. It is also useful to look graphically at the same information. This uses Describe > One Variable > Graph dialogue. We tick the box to display the graphs horizontally for clarity, and then press OK. This shows, for example, that ... (what do you notice here?)
  9. The information, from these countries, on the sectors is not clear from the small graph. So we return to this dialogue - we use the icon on the toolbar for this - and delete all variables except ca_sector and then press OK again. Now the graph is much clearer.
  10. An alternative way of showing the same information uses the Describe > One Variable > Frequencies. This gives the results in a browser (for now) and provides a one-way table for each variable.
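The filter in step 5 and the one-way table in step 10 can be sketched outside R-Instat as follows. This is only a minimal Python illustration of the idea, not what the dialogues run internally. The rows are toy values, not the real World Bank data, and `country` is an assumed column name (`ca_sector` is the variable named in step 9).

```python
from collections import Counter

# Toy rows standing in for the contract data; the values are illustrative only.
# "country" is an assumed column name; "ca_sector" is named in the script.
contracts = [
    {"country": "Kenya", "ca_sector": "Transport"},
    {"country": "Uganda", "ca_sector": "Health"},
    {"country": "Afghanistan", "ca_sector": "Transport"},
    {"country": "Kenya", "ca_sector": "Health"},
    {"country": "Rwanda", "ca_sector": "Transport"},
]

# The five East African countries chosen in step 5.
EAST_AFRICA = {"Burundi", "Kenya", "Rwanda", "Tanzania", "Uganda"}


def filter_by_country(rows, countries):
    """Keep only the rows whose country is in the chosen set."""
    return [row for row in rows if row["country"] in countries]


def one_way_table(rows, column):
    """Count how often each value of one column occurs."""
    return Counter(row[column] for row in rows)


filtered = filter_by_country(contracts, EAST_AFRICA)
sector_counts = one_way_table(filtered, "ca_sector")

print(len(filtered), "contracts after filtering")
for sector, count in sorted(sector_counts.items()):
    print(sector, count)
```

The filter is applied first and the table is built from the filtered rows, which matches the order of the steps in the script.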

There is much more that is possible with these data and we therefore continue the analysis in the next videos.

mmumbo commented 6 years ago

  1. Adding columns to the country-level data (and drawing a map again). Where are the data for the extra columns that we will add before producing the maps again?

mmumbo commented 6 years ago

Procurement > Describe > One Variable Summarise dialogue. Even after deselecting Afghanistan, it still appears in the summary. Could this be a problem with the software?

rdstern commented 6 years ago

In the video I would simply not comment on that point.

It gets much worse in the 2-variable case, where the cases are dropped, but you still get a table with all 200 countries! We need to do something about that in the software. I think David and Danny may be able to organise that unused levels are dropped in the filter. (Though we have to be careful, in that we may want to keep unused levels if they are in the set being filtered. Not often though!)
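To make the unused-levels point concrete, here is a small Python sketch of the behaviour, under the assumption that a factor is just a list of values plus a separate list of levels. It is not the R-Instat code, and the function names and the `keep` argument are hypothetical. In R itself the analogous base function is `droplevels()`.

```python
from collections import Counter


def table_with_levels(values, levels):
    """Tabulate values over every declared level, including empty ones."""
    counts = Counter(values)
    return {level: counts.get(level, 0) for level in levels}


def drop_unused_levels(values, levels, keep=()):
    """Drop levels with no rows, except those explicitly listed in keep."""
    present = set(values) | set(keep)
    return [level for level in levels if level in present]


# Toy factor: four declared levels, but the filtered rows use only two.
levels = ["Afghanistan", "Burundi", "Kenya", "Uganda"]
filtered_values = ["Kenya", "Uganda", "Kenya"]

# Without dropping, every original level still shows up with a zero count.
before = table_with_levels(filtered_values, levels)

# Dropping unused levels removes the empty rows from the table.
after = table_with_levels(
    filtered_values, drop_unused_levels(filtered_values, levels)
)

# The caveat: a level chosen in the filter may be worth keeping even if empty.
after_keep = table_with_levels(
    filtered_values,
    drop_unused_levels(filtered_values, levels, keep={"Burundi"}),
)

print(before)
print(after)
print(after_keep)
```

The `keep` argument stands in for the caveat above: a country that was selected in the filter but has no rows can still be shown with a zero count.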

mmumbo commented 6 years ago

ExploreWorldBankData.docx

Hello @rdstern, attached is the latest script that we want to produce a video on, but the following parts are not yet clear:

1) Neither column 178 nor 179 is about tax havens, so could this be a mismatch from the earlier script you sent? Did you mean column 188 instead?

2) In numbers (6) and (7) of the script, when you ask "What do you notice?", does that mean we should let the user try to find interesting things in the analysis on their own, or should we explain the output of the analysis?