Open Data - Githubissues

rcatlord commented 4 years ago

Could the daily confirmed cases and deaths be added as open data to the dashboard? Many analysts and developers are using these data.

edent commented 4 years ago

Agreed. This used to be available as a CSV. I'll chase internally to see what's happened to it.

rcatlord commented 4 years ago

Thanks Terence! The Coronavirus Tracker had all the data behind the dashboard in an .xlsx file and there were links to CSVs:

Just to note - the PHE Coronavirus Tracker states in the footnote that, "Time series data will be available to download from 15/04/2020". A hopeful sign but let's hope it is all the data that was previously available.

slowe commented 4 years ago

Can we have a fixed URL for the "latest" version of the data?

As Tim Berners-Lee says, "Cool URIs don't change", and having a fixed URL for the "latest" data allows code/tools/visualisations elsewhere to rely on it. Yes, there is now an XML endpoint that can be queried and the latest (JSON) file found, but this creates hoops to jump through that stop lots of people and/or use cases. I'm not arguing for the removal of the XML endpoint - definitely keep it. But please have a CSV (and JSON) - at fixed URLs - with the latest snapshots.
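To make the "hoops" concrete, here is a minimal sketch in Python of what every consumer has to do when only a listing is available. The schema of the real XML endpoint is not shown in this thread, so the sample listing below is an assumption (a simple list of `<Name>` elements), and `latest_snapshot_name` is a hypothetical helper. The only detail taken from the thread is the file naming pattern `data_202004151454.json`, which is quoted in a later comment.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical listing: the real endpoint's schema is not shown in the thread.
SAMPLE_LISTING = """
<EnumerationResults>
  <Blobs>
    <Blob><Name>data_202004141530.json</Name></Blob>
    <Blob><Name>data_202004151454.json</Name></Blob>
    <Blob><Name>about.md</Name></Blob>
  </Blobs>
</EnumerationResults>
"""

# Matches names such as data_202004151454.json (12-digit timestamp).
SNAPSHOT_NAME = re.compile(r"^data_(\d{12})\.json$")


def latest_snapshot_name(listing_xml: str) -> str:
    """Return the name of the newest data_<timestamp>.json entry in the listing."""
    root = ET.fromstring(listing_xml)
    stamped = []
    for name_el in root.iter("Name"):
        match = SNAPSHOT_NAME.match(name_el.text or "")
        if match:
            # Fixed-width digit timestamps sort chronologically as strings.
            stamped.append((match.group(1), name_el.text))
    if not stamped:
        raise ValueError("no data_<timestamp>.json entries in listing")
    return max(stamped)[1]


print(latest_snapshot_name(SAMPLE_LISTING))
```

With a fixed "latest" URL, none of this parsing would be needed: a consumer would fetch one known address.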

rcatlord commented 4 years ago

I notice that the PHE Coronavirus Tracker has been updated with download links but these are JavaScript links and don't point to the actual files. This makes it impossible to retrieve the data programmatically using tools like R. Can this please be changed?

xenatisch commented 4 years ago

@rcatlord It is there - have a look at the CSV file.

@edent The CSV export was there. It had just been temporarily removed to split the data into two files instead of one.

@rcatlord and @slowe Unfortunately, the data comes in JSON only, and we had to write code to create the CSV in the browser. If you think the JSON format would be useful, please send an email (provided at the bottom of the page) and make a request.

olihawkins commented 4 years ago

Why have you just removed the time series data for Scotland, Wales and Northern Ireland? It was present in the API data file earlier today, but not in the latest data file. That's useful data, please can you put it back.

rcatlord commented 4 years ago

Thanks @xenatisch. I understand that you can't provide stable URLs for the data because the page is written in JavaScript. However, you could perhaps provide a separate tab beside 'Data dashboard' and 'About the data' with links to the actual data. This would be really helpful to those who want machine readable data. The previous iteration of the Coronavirus Tracker did this with stable links via arcgis.com (see https://github.com/PublicHealthEngland/coronavirus-dashboard/issues/14#issuecomment-613932956).

By the way, the new page is much, much better than the original PHE dashboard. It is clearly laid out and the visualisations clearly show a knowledge of chart design.

xenatisch commented 4 years ago

@olihawkins That data file is not for public consumption. One reason is exactly what you asked about: it contains different sections, and they're not always there. What's displayed on the website and included in the CSV files is what is officially available.

If you'd like a JSON source, or would like to see additional details, please send a request to the email provided at the footer of the website.

xenatisch commented 4 years ago

@rcatlord I agree that it would be useful for machine readability. The thing is that due to the huge volume of requests, we're trying to refrain from using dynamic pages.

To that end, if you think that it would be useful to include a stable JSON source (be it an API or a file), please send a request to the email provided at the footer of the website.

I'm sure you appreciate that it is not in the gift of the developers to implement new features in an official service. Requests need to go through the pipeline, be researched, proposed, approved, designed, prepped, implemented and tested before they can be added to the service. It can be done quite quickly if deemed urgent, but it needs to go through the proper channels.

rcatlord commented 4 years ago

Thanks @xenatisch - I'll send an email to coronavirus-tracker@phe.gov.uk

olihawkins commented 4 years ago

You have removed, and not replaced, machine-readable datasets with stable URLs. In the absence of any other source of machine-readable data people are going to use whatever is available. I can’t think of another dataset which is more important to the public right now. We are not asking for new features. We are asking for functionality that was available yesterday. The new dashboard is a significant regression from a data user’s point of view.

xenatisch commented 4 years ago

@olihawkins I appreciate your point, but it is not a matter for the development team as it does not concern the functionality of the service, or any existing feature that is officially available. It's not really a bug.

On that note, and given that this is not an issue with the code, it falls outside of our domain and is therefore a matter for the service design and the data teams. The best, most expeditious way to express your views is to get in touch through the email provided at the footer of the page. The delivery manager will ensure that the right team is assigned to the issue and that it is addressed as soon as possible. We have processed urgent requests in a matter of hours in the past.

I will also make sure that I convey your point in our daily meeting tomorrow, though I still think that the email would be much faster.

Regarding the removal of the machine readable source, again it is not a decision made by the development team. As far as I know, the data was not available as JSON in the previous service either - at least not officially, and again I appreciate that you used what you could. It might be that people have been using the data source without having requested it as a feature; and because it is an internal resource, it was changed without notification.

I sincerely apologise for the inconvenience and hope the problem is addressed as soon as possible.

xenatisch commented 4 years ago

@rcatlord I believe there is ongoing work with regards to the time series. The data team are trying to make it available as soon as possible. Our resources are somewhat stretched, but we are doing our best and working day and night to maintain the service and accommodate as many requests as we can.

KevinMayfield commented 4 years ago

If it's any use to anyone: I've been loading the CSV, Excel and JSON files from BDM, NHS 111 and PHE into an open source health server (https://hapifhir.io/). The server endpoint is https://fhir.test.xgenome.co.uk/R4/. An example query for PHE death date is https://fhir.test.xgenome.co.uk/R4/MeasureReport?measure=25444&_count=100&_sort:desc=period and for BDM https://fhir.test.xgenome.co.uk/R4/MeasureReport?measure=31531&subject.identifier=E92000001&_sort:desc=period

I've been following international COVID efforts, hence the use of FHIR MeasureReports. Some guidance on the API can be found here: https://www.hl7.org/fhir/measurereport.html

KevinMayfield commented 4 years ago

My dashboard which uses the API is https://project-wildfyre.github.io/covid If you use developer tools/inspect in chrome you will see the queries it is using.

slowe commented 4 years ago

@xenatisch You are correct that "data was not available as JSON in the previous service". I think @olihawkins, me and others are referring to the CSV data that was available in the previous service. It was prominently advertised on the gov.uk page that advertised the previous dashboard. That CSV was hosted at a stable URL. I had been pleasantly surprised at the use of fixed URLs for "latest" datasets as this is something I've been trying to encourage Local Authorities to do for a couple of years. With the latest service, CSV downloads were first removed then later added "back" as a dynamically generated blob from the page. As @rcatlord says, this makes it impossible for code elsewhere to easily grab the data. The request here (as I read it) was to put back a previous "official" feature i.e. a static CSV at a URL that doesn't change. Adding JSON downloads from fixed URLs is an extra.

Also, to be clear, I am aware that you have an XML endpoint that can be polled and the result parsed to try to find the actual address of the JSON you use in the dashboard. Yes, developers can all independently go and implement solutions that jump through those hoops. That multiplies the workload, with everyone having to do something that they didn't have to do before Tuesday. It also reduces the number of people who can do that.

The removal of a fixed URL CSV broke the visualisation that we had made for Local Authorities (using the publicly advertised CSV). It broke other people's tools and visualisations. Adding a dynamically generated blob doesn't fix that although it does fix it for many non-developers and local journalists who may have been looking at the files (so thank you for fixing it for them). The web is built of URLs. Dynamically generated blobs avoid using these basic building blocks of the web.
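As an illustration of the publisher-side fix being requested, here is a minimal Python sketch that writes each timestamped snapshot and also refreshes a copy under a name that never changes. Everything here is an assumption for illustration: `publish_snapshot` is a hypothetical function, a local directory stands in for wherever the files are actually hosted, and the name `data_latest.json` is borrowed from a suggestion made later in this thread, not from the real service.

```python
import json
import shutil
from datetime import datetime
from pathlib import Path


def publish_snapshot(data, out_dir, now):
    """Write data_<timestamp>.json and refresh data_latest.json alongside it."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Timestamped snapshot, e.g. data_202004151454.json, kept for history.
    stamped = out_dir / f"data_{now:%Y%m%d%H%M}.json"
    stamped.write_text(json.dumps(data), encoding="utf-8")

    # Copy to a temporary name first, then rename over the fixed name, so a
    # reader never sees a half-written data_latest.json.
    latest = out_dir / "data_latest.json"
    tmp = out_dir / "data_latest.json.tmp"
    shutil.copyfile(stamped, tmp)
    tmp.replace(latest)
    return stamped, latest


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        stamped, latest = publish_snapshot(
            {"cases": 1}, d, datetime(2020, 4, 15, 14, 54)
        )
        print(stamped.name, latest.name)
```

The timestamped files stay available for anyone who wants history, while external code, blog posts and visualisations only ever need the one fixed name.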

I also note that you said that "we're trying to refrain from using dynamic pages" although the entire Corona Virus dashboard is dynamically generated by JavaScript. The only text content in the body of the page is "You need to enable JavaScript to run this app." and the only other human content is in the <title> and <meta description> tags. To me, a static CSV at a fixed URL is static content.

Given the mess of all this, I've switched to using Tom White's data (https://github.com/tomwhite/covid-19-uk-data) as a source for my visualisation. His data has seemed more reliable, available at static URLs (with version control), and has data for UTLAs/Health Boards for each of the four nations.

xenatisch commented 4 years ago

> this makes it impossible for code elsewhere to easily grab the data. The request here (as I read it) was to put back a previous "official" feature i.e. a static CSV at a URL that doesn't change. Adding JSON downloads from fixed URLs is an extra.

@slowe Believe me, I know and have mentioned it a number of times. As a data scientist and a former academic, I understand how the data is consumed. I will push for JSON / CSV / XML to be made available through a static URL that is programmatically downloadable - at least one of them if not all. But there's only so much I can do, which is why I have been advising everyone to get in touch via the email provided at the footer of the page (which is monitored by the delivery manager).

I sincerely appreciate that this has screwed up a lot of services that may have depended on the data pipeline, and I do apologise for it. I promise to push for the feature today, and if approved, I'll personally implement it expeditiously. But please get in touch through that email as well.

> I also note that you said that "we're trying to refrain from using dynamic pages" although the entire Corona Virus dashboard is dynamically generated by JavaScript. The only text content in the body of the page is "You need to enable JavaScript to run this app." and the only other human content is in the <title> and <meta description> tags. To me, a static CSV at a fixed URL is static content.

I get what you're saying, but what I meant was dynamically generated pages that come from a server or make calls to a REST API service. Everything (except for the data + the index.html) is cached through a ServiceWorker so as to minimise the number of calls to the server. The data itself is supplied through a file, not an API. Hope this clarifies what I was trying to say in simpler terms.

Again, these are not decisions made by us (the dev team). We are only in charge of implementing and maintaining the features that have been designed and approved by other teams. We don't even control the data source. To change things or implement new features, the proposal should go through the appropriate channels before it comes to us.

I hope this clarifies the situation a bit more. I understand your frustration, and we are doing our best to accommodate different requests with limited resources. I know it's not perfect, but we have been working a lot of hours to get things done.

Once again, my apologies for the inconvenience.

slowe commented 4 years ago

@xenatisch Firstly, thanks for all you are doing. The dashboard itself is great. I have sent a (possibly too long) email to the address. I will also be mentioning it to the great open data team at NHS Digital, some of whom will hopefully be on the #OpenDataSavesLives Zoom call at 11am. It has made me realise I need to do more shouting about "eating your own dog food" (https://odileeds.org/blog/2020-02-28-eating-your-own-dog-food) and about using the web for publishing i.e. using URLs. Hopefully we can get the powers that be to tell you to implement what was previously implemented. As I say, the different output formats would be a nice addition but I do think a fixed URL is a more fundamental issue about how the web works; even if we discard caring about robots/code, nobody can write a link to the "latest" data in a blog post.

slowe commented 4 years ago

@xenatisch Just an additional follow-up: the fact that you grab the latest static file to power the dashboard is actually great. I'm a big fan of static files and use them nearly all the time. The thing I'm probably failing to communicate is that whoever is making that static file could solve all the things I'm saying by making sure to also publish the latest data at a fixed URL. That means they can publish it at https://c19pub.azureedge.net/data_202004151454.json but also have https://c19pub.azureedge.net/data_latest.json which is a copy of whatever the latest file is. External services now need to all query your XML endpoint (static or spins up a server) and then grab the latest version of the data.

olihawkins commented 4 years ago

@xenatisch I just want to echo what @slowe is saying. We do appreciate your engagement here and it is very clear that you understand the issue. I think one of the reasons people are pursuing it in this thread is because the dev team is engaging with users, while no-one responds to the emails sent to that email address. I will send another email today making the same point as @slowe, that the new CSV downloads do not address the need for machine-readable data at stable URLs, while a JSON file updated at a fixed URL would. Thank you for taking the time to respond to us here.

KevinMayfield commented 4 years ago

Thanks @xenatisch

This isn't just PHE data; the NHS 111 data URL is also dynamically changing every day. Suspect weekly BDM data is doing the same (will know when it next updates :) )

xenatisch commented 4 years ago

@slowe Trust me, the issue with XML and latest data has been raised, and it has been improved, albeit slowly. It used to be that we had to make requests with different dates to the server until we didn't get a 404. So now we have something to get the info, and that's progress. So... hear hear!

@slowe @olihawkins We are working on possibly creating an ETL and serving the data in different formats. We need to find a feasible DevOps solution. Some bots are hitting the page every 10 seconds to check for updates, so!

Also, people are actually reading the emails. It's just that there are thousands of them and it takes a bit of time to go through them. Don't know about the comms though.

@KevinMayfield Most (if not all) of them come from the same source!

Finally, I respond here because I am maintaining the code and want to ensure that there's nothing technical that I can do to solve any issues. It also helps me understand the needs, and provide better recommendations (if or when asked).

pipiscrew commented 4 years ago

The daily data for each country is available at YogeshChauhan - Historical for all countries (by pressing timeline):

https://www.yogeshchauhan.com/Projects/COVID-19/coronavirus-global-live-tracker-by-yogesh-chauhan.php?url=live-covid-19-tracker-usa-india-and-global

If you need an API, the safest is:

https://github.com/sagarkarira/coronavirus-tracker-cli

This one also has different views:

https://www.worldometers.info/coronavirus/

Moreover, I have listed other sources at:

https://bit.ly/3csfnuI

ha-san-ali commented 4 years ago

Feature: A permanent URL or CSV link rather than a dynamically generated one (https://trello.com/c/PvQicSQI/260-feature-a-permanent-url-or-csv-link-rather-than-a-dynamically-generated-one)

xenatisch commented 4 years ago

Hi all, just wanted to let you know that we're making progress with this and are hoping to run a beta test later this afternoon. If passed, we will go ahead with the deployment soon afterwards.

@rcatlord @edent @slowe @olihawkins @KevinMayfield @pipiscrew

xenatisch commented 4 years ago

Solution via an ETL: https://github.com/PublicHealthEngland/coronavirus-dashboard-pipeline-etl

The service will be deployed soon. This issue will be associated with the commit that dispatches the service.

rcatlord commented 4 years ago

Hi @xenatisch, this Twitter thread might be useful to review: https://twitter.com/dracos/status/1251836594509791232

xenatisch commented 4 years ago

@rcatlord As nice as that is, they cannot display the crown copyright and the crown logos, or use the GDS font. It violates the Open Government Licence. Someone should inform them.

dracos commented 4 years ago

Not deliberately, they just came across with everything else. Crown logo and font gone now, thanks. Not really relevant to this ticket, although yes, it does have static CSV file links.

xenatisch commented 4 years ago

@dracos Thanks... I know they come with everything else. It's because we have everything here so that we can deploy with CI. It's also more transparent - which we are trying to be... But GDS is very strict about the logos / fonts; we can't even use them on our test domains if they don't end with gov.uk.

xenatisch commented 4 years ago

Deployment of Release v1.1.0 (https://github.com/PublicHealthEngland/coronavirus-dashboard/releases/tag/1.1.0) resolves this issue.

rcatlord commented 4 years ago

Brilliant - thanks for all your hard work @xenatisch!

olihawkins commented 4 years ago

@xenatisch Thank you very much for responding to feedback from users and for turning this around so quickly.

UKHSA-Internal / coronavirus-dashboard

Open Data #14