New feedback (18th May)
1) New homepage - call it Dashboard
Total number of datasets in portal
Total number of scraped datasets
Total number of scraped datasets which have been amended by a user
Total number of datasets manually added
Pie chart
3) Insights will be relabeled Scraping Insights
4) At the top of the Insights page we have 2 rows of data: the first is “Based on original scraper” and the second is “Ingested into data portal”
5) Re-order the insight page
Datasets by Publisher (instead of datasets by scraper)
Resources by Publisher
Datasets by domain
Resources by domain
6) Re-name headers and include tooltips etc. as per this document
7) Delete the pages in data quality tab
Previous feedback
Notes regarding Dashboard improvements. These tasks/issues were raised during a brainstorming/QA session
Insights
The relation between the numbers on top and the numbers below is unclear (the number of resources is 52709, but the numbers by domain do not add up to 52709). Use clear labels. So I think the top is “Total Datasets etc that are able to be scraped” and then the pie chart and table are “Total datasets that were scraped”
The domain/resource count table needs a total
Order: from highest to lowest number (e.g. studentaid.gov, with 787, is 3rd from the bottom but it has to be much higher)
Order of graphs:
Datasets by domain
Datasets by offices (instead of datasets by scraper)
Resources by domain
TITLE: Scraping Dashboard should be a proper title (also bigger than the subtitle: “Statistics on data extracted from ED sites”)
The logos are huge! Let’s get rid of them (maybe move them to the bottom - “made by CA and Datopian” - but something that’s not so huge and eye-catching). Dept Ed will use it so they wouldn’t have
What are these small icons? Remove them
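The ordering and total requests above (rows from highest to lowest, plus a total row per table) could be sketched roughly as follows. This is only an illustration: the function name `ranked_with_total` and the dict-of-counts input shape are assumptions, not the dashboard's actual code, and apart from the 787 figure for studentaid.gov the sample values are made-up placeholders.

```python
from typing import Dict, List, Tuple


def ranked_with_total(counts: Dict[str, int]) -> Tuple[List[Tuple[str, int]], int]:
    """Return (rows sorted from highest to lowest count, grand total).

    Hypothetical helper: the input is assumed to map a domain (or publisher)
    name to its dataset/resource count.
    """
    # Sort by count descending; break ties alphabetically so the order is stable.
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    total = sum(counts.values())
    return rows, total


# Placeholder data: only studentaid.gov's 787 comes from the notes above.
sample = {"example-a.gov": 120, "studentaid.gov": 787, "example-b.gov": 45}
rows, total = ranked_with_total(sample)
for name, count in rows:
    print(f"{name}: {count}")
print(f"Total: {total}")
```

Computing the total from the same rows that are displayed keeps the table total consistent with what the reader can add up by hand.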
Data quality
We should include a note below these offices:
How these are calculated, or else the % doesn’t make any sense.
What makes these green, amber, red (above ?% is amber, below ?% is red, etc)
We should remove the score and leave only the Percentage (the max is 100%)
And rename it to “Score” (= the score will be in %)
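A minimal sketch of the percentage score and the green/amber/red banding discussed above. The notes leave both thresholds open ("?%"), so they are required arguments here rather than defaults. The assumption that the current score is a raw value out of some known maximum, and the names `to_percent` and `band`, are hypothetical and not taken from the Stats module.

```python
def to_percent(raw_score: float, max_score: float) -> float:
    """Convert a raw score into a percentage capped to the 0-100 range.

    Assumption: the existing score is a raw value out of `max_score`.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    percent = 100.0 * raw_score / max_score
    return round(min(100.0, max(0.0, percent)), 1)


def band(score_percent: float, amber_below: float, red_below: float) -> str:
    """Map a percentage to a colour band.

    Both thresholds must be supplied by the caller, since the notes above
    have not fixed them yet.
    """
    if red_below > amber_below:
        raise ValueError("red_below must not exceed amber_below")
    if score_percent < red_below:
        return "red"
    if score_percent < amber_below:
        return "amber"
    return "green"


# Illustrative thresholds only; these are NOT the project's agreed values.
score = to_percent(45, 60)
print(score, band(score, amber_below=80, red_below=50))
```

Keeping the thresholds as explicit parameters means the note shown below the offices can quote the same values the code uses.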
Trends
Have a single trend graph with multiple trendlines for the various offices
Either i) add data points or ii) get rid of the trends (please try i) )
We want to see how over the last sprints we improved data numbers steadily, so include data points from the previous sprints. The older data points from earlier sprints were generated when the dashboard was hosted on DigitalOcean - Victor has the backup files for the
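One way to prepare data for a single graph with one trendline per office is to group the data points by office and sort each group by date. The sketch below assumes each data point is a `(date, office, count)` tuple with an ISO-formatted date; that shape and the name `build_trend_series` are hypothetical, and the format of the older backup data is not known here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (ISO date of the snapshot, office name, count).
Point = Tuple[str, str, int]


def build_trend_series(points: Iterable[Point]) -> Dict[str, List[Tuple[str, int]]]:
    """Group data points into one chronologically ordered series per office."""
    series: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for date, office, count in points:
        series[office].append((date, count))
    for office in series:
        # ISO dates (YYYY-MM-DD) sort chronologically when compared as strings.
        series[office].sort()
    return dict(series)


# Placeholder data points, not real sprint figures.
sample_points = [
    ("2020-02-01", "office-a", 12),
    ("2020-01-01", "office-a", 10),
    ("2020-01-01", "office-b", 5),
]
trend = build_trend_series(sample_points)
```

Each key in the result can then be drawn as one trendline on the shared graph, whichever plotting library the dashboard uses.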
Air comparison
Delete it (but don’t delete the code, so that if the clients say they want it back, we will be able to retrieve it quickly)
Acceptance Criteria
[x] Have all of these points implemented and tested
Tasks
[x] Create new homepage called Dashboard
[x] New side nav bar
[x] Datasets values (the total is real (based on scraped + hard-coded), the scraped data (not amended) is real, the scraped amended is hard-coded, and the added by user is hard-coded)
[x] Piechart
[x] The 'scraping dashboard' should be like a home button. When the user clicks on it, it takes the user back to the Dashboard page
[x] QA by Osahon
[x] Sign-off by Esteban
[x] Work on insights page of the dashboard
[x] Order: from highest to lowest number
[x] Insights will be relabeled Scraping Insights
[x] Relabel office as Publisher
[x] Add resources by Publisher
[x] Order of graphs
[x] Re-name headers and include tooltips
[x] Redo the top values (2 lines of datasets, resources, pages, domains)
[x] Include the html basic CSS
[x] Insert link to the right numbers
[x] Check the total numbers in Led Display
[x] Add a total in all 3 tables
[x] Re-do the 3 lines at the top (initial estimate, based on scraper, ingested)
[x] QA by Osahon
[x] Sign-off by Esteban
[x] Work on data quality page of the dashboard
[x] Remove the score and leave only the Percentage (the max is 100%) and rename it to “Score”
[x] Add the notes
[x] Remove the pages
[x] QA by Osahon
[x] Sign-off by Esteban
[x] Work on the trends page of the dashboard
[x] Have a single trend graph with multiple trendlines for the various offices
[x] Add older data points from earlier sprints
[x] Remove Air Comparison page
[x] Update Dashboard documentation
[x] Automate the dashboard, so it shows updated stats automatically
Implementing improvements to dashboard stats (Phase 2)
Description
Data Sources
1) Dashboard Home Page - Data come from CKAN API (https://us-ed-testing.ckan.io/). A new API endpoint was created to return all the values that we need. (https://us-ed-testing.ckan.io/api/action/ed_scraping_dashboard)
2) Scraping Insights
Initial Estimate - Hard-coded data from an old Dashboard state. Data collected on May 22. Last scrape run: ?
Based on Original Scraper - Data come from Stats module (metrics.xlsx file).
Ingested into data portal - Data come from CKAN API.
Datasets Ingested into the Portal by Publisher - Data come from CKAN API.
Resources Ingested into the Portal by Publisher - Data come from CKAN API.
Datasets Ingested into the Portal by Domain - Data come from CKAN API.
Resources Ingested into the Portal by Domain - Data come from CKAN API.
3) Data Quality - Data come from Stats module.
4) Trends - Data come from Stats module.
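For the Dashboard home page source listed above, a rough sketch of reading the custom endpoint could look like this. CKAN action API responses are wrapped in a JSON envelope with `success` and `result` keys; the shape of `result` for `ed_scraping_dashboard` is not documented here, so the sketch returns it untouched. The function names are hypothetical, and it is an assumption that the endpoint accepts a plain GET request.

```python
import json
from typing import Any
from urllib.request import urlopen

# URL as given in the Data Sources list above.
ENDPOINT = "https://us-ed-testing.ckan.io/api/action/ed_scraping_dashboard"


def unwrap_ckan_response(payload: dict) -> Any:
    """Return the `result` of a CKAN action response, or raise on failure."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN action failed: {payload.get('error')}")
    return payload["result"]


def fetch_dashboard_stats(url: str = ENDPOINT, timeout: float = 10.0) -> Any:
    """Fetch the dashboard values (assumes the action allows GET requests)."""
    with urlopen(url, timeout=timeout) as response:
        return unwrap_ckan_response(json.load(response))
```

Calling `fetch_dashboard_stats()` would return whatever structure the endpoint puts in `result`; separating out `unwrap_ckan_response` lets the envelope handling be checked without network access.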