The screenshots pasted at the bottom of this issue show the summaries we would like to display in the dashboard. Later, in Phase 2 of the project, we will perform the intermediate calculations that produce these data summaries.
In this GitHub issue, to complete the remaining items in Phase 1, we will evaluate whether the data we have ingested so far are ready for Phase 2, or whether additional data or other information are required.
In this task, we are focused on loans that were actually approved and originated (i.e., not withdrawn, denied, etc.). Please keep this in mind when applying the data filters.
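One way to apply this filter is sketched below in Python. The field name `action_taken` and the value `"originated"` are hypothetical placeholders, not taken from the ingested data; substitute whichever field and code the actual source uses to record the loan outcome.

```python
# Hypothetical field name and value: replace with the real ones
# from the data dictionary of the ingested source.
OUTCOME_FIELD = "action_taken"
ORIGINATED_VALUE = "originated"


def keep_originated(rows, field=OUTCOME_FIELD, value=ORIGINATED_VALUE):
    """Return only the rows whose outcome field marks an approved and originated loan."""
    return [row for row in rows if row.get(field) == value]


# Made-up sample records for illustration only.
sample = [
    {"loan_id": "A1", "action_taken": "originated"},
    {"loan_id": "A2", "action_taken": "denied"},
    {"loan_id": "A3", "action_taken": "withdrawn"},
    {"loan_id": "A4", "action_taken": "originated"},
]

originated = keep_originated(sample)
print([row["loan_id"] for row in originated])
```

Applying the filter once, before any column selection or EDA, keeps every downstream summary restricted to the same population of loans.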
Which data sources contain the info we need? What are the specific file names that contain it?
Will we need to join any tables? If so, do we have the join keys required to do so? Which fields are the join keys?
[x] 2. Using the data sources and fields mentioned in item 1 above, prepare the minimal dataset required to complete the analysis. For each of the data sources flagged in item 1, include only the subset of columns identified as needed for one or more of the analyses in the spreadsheet.
[x] 3. After preparing the minimal dataset in item 2 above, perform exploratory data analysis on these data elements. We only need to do this for the specific columns that you identified above. NOTE: The histograms have already been implemented and pushed to the main branch. Please pull in these changes.
Missing data percentages
Show distributions
[x] histograms for numeric values
[x] bar charts to show percentages in each category for non-numeric fields
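Items 2 and 3 can be sketched with the Python standard library alone. This is a minimal illustration, not the project's implementation: the column names (`loan_amount`, `loan_purpose`, `internal_note`), the sample rows, and the convention that `None` or a blank string counts as missing are all assumptions made for the example.

```python
from collections import Counter


def select_columns(rows, columns):
    """Build the minimal dataset: keep only the columns flagged as needed."""
    return [{col: row.get(col) for col in columns} for row in rows]


def is_missing(value):
    """Assumed convention: None or a blank string counts as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def missing_percentages(rows, columns):
    """Percentage of rows with a missing value, per column."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        col: 100.0 * sum(is_missing(row.get(col)) for row in rows) / total
        for col in columns
    }


def category_percentages(rows, column):
    """Share of each category among the non-missing values of a column."""
    values = [row[column] for row in rows if not is_missing(row.get(column))]
    if not values:
        return {}
    counts = Counter(values)
    return {cat: 100.0 * n / len(values) for cat, n in counts.items()}


# Made-up sample records for illustration only.
rows = [
    {"loan_amount": 250000, "loan_purpose": "purchase", "internal_note": "x"},
    {"loan_amount": None, "loan_purpose": "refinance", "internal_note": "y"},
    {"loan_amount": 180000, "loan_purpose": "purchase", "internal_note": ""},
    {"loan_amount": 320000, "loan_purpose": "", "internal_note": "z"},
]

needed = ["loan_amount", "loan_purpose"]  # hypothetical output of item 1
minimal = select_columns(rows, needed)
missing = missing_percentages(minimal, needed)
shares = category_percentages(minimal, "loan_purpose")
print(missing)
print(shares)
```

The category shares are computed over non-missing values only, so they sum to 100% and can feed the bar charts directly; the missing-data percentages are reported separately so that gaps are not hidden inside the category breakdown.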
Screenshots from project summary to show the types of calculations we will need for Phases 2 & 3
Background
Task details
[x] 1. In the banking dashboard data availability spreadsheet, fill in the cells with info about where we can find the data needed to complete each task.