alpha-beta-soup / national-crash-statistics

An interactive web map of the New Zealand Transport Agency's Crash Analysis System (CAS) data.
MIT License

NZTA's proposed CAS data agreement #53

Open alpha-beta-soup opened 8 years ago

alpha-beta-soup commented 8 years ago

The NZTA is probably going to change the way in which they publish the data underlying this project.

I first noticed that the data on the website had stopped being updated: it currently extends only to March 2015, when the quarterly update to June (or even September) 2015 should by now be available to the public. Then the data download disappeared from the NZTA website entirely (no immediate fear, all the raw data is still contained in this repository, as I have it on record that it has a permissive license). I emailed the Statistical Analysis team at NZTA asking for an explanation. Here is the exchange so far:

I was looking to update my record of the CAS data, but the page I have used in the past no longer exists: https://www.nzta.govt.nz/resources/crash-analysis-system-data/

I noticed a few weeks ago that although it was up, the available data only went as far as March this year. Is this data still available for download?

They replied:

Apologies for the delay in replying to your email. We are currently reviewing the CAS information we have available on our website to make sure it meets our privacy and security standards. As the new on line data will be summarised information and I understand you work with geospatial co-ordinates, would you like to apply for CAS access? This will give you access to the same level of information as was previously on the web.

Me again:

Yes I suppose I would like to apply for access, provided that the access will not cost me as I'm only using it as a hobbyist.

However I'm using the CAS data to make a public web map (http://www.nearimprov.com/national-crash-statistics/) in collaboration with some others. Will this review mean that my application will not be able to present the disaggregate information to the public?

I waited a week then prodded again:

Is there any information on this review available? I have some free time available over this summer I am planning to spend developing my mapping application, but I first need to know if the data will still be able to be made public via my application.

And then:

Thanks for your email. There is no cost associated with access to CAS, but you will be required to sign a data agreement. Under this agreement you will only be able to make aggregated information (totals of 4 or more) available through your app. Would you like to proceed with getting CAS access?

I find the proposed changes to be a tremendous step backwards, and I cannot find any information online about this review, including whether any external users of the data were even consulted about the changes. I plan to make an OIA enquiry into the review to determine what decision has been made, how, and why. Just imagine if, instead of hiding this information away, their review had decided to make a public API for accessing it. (This has been recommended: but the type of information will be diminished.) This is public information that the public deserves to be able to see.

I also have absolutely no plans to agree to that data agreement. How can I show only aggregate information that meets their minimum-of-four-crashes requirement and is still useful? There are thousands of intersections that have fewer than four recorded accidents. The best I can come up with is some form of clustering with Voronoi polygons, with a summary of four accidents within each irregularly-sized zone. So much detail will be lost.
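
One way to explore the minimum-of-four requirement is sketched below. It is not the Voronoi clustering mentioned above: as a simpler stand-in, it bins crash points into square grid cells and doubles the cell size for any points whose cell holds fewer than four crashes, so sparse areas end up in larger zones. The function name `aggregate`, the `(lon, lat)` tuple format, and the default cell size are assumptions made for illustration; none of them come from this project or from the NZTA agreement.

```python
from collections import defaultdict

MIN_COUNT = 4  # threshold quoted in the NZTA email ("totals of 4 or more")


def aggregate(points, cell=0.001, max_levels=6, min_count=MIN_COUNT):
    """Group (lon, lat) points into grid zones of at least min_count crashes.

    Returns (zones, suppressed):
      zones      -- list of (cell_size, (ix, iy), count), count >= min_count
      suppressed -- points that never reached min_count at any grid level
    """
    zones = []
    remaining = list(points)
    for _ in range(max_levels):
        # Bin the still-unpublished points at the current cell size.
        bins = defaultdict(list)
        for lon, lat in remaining:
            bins[(int(lon // cell), int(lat // cell))].append((lon, lat))
        remaining = []
        for key, members in bins.items():
            if len(members) >= min_count:
                zones.append((cell, key, len(members)))
            else:
                # Too few crashes here: retry these points at a coarser level.
                remaining.extend(members)
        if not remaining:
            break
        cell *= 2  # hypothetical coarsening rule: double the cell size
    return zones, remaining
```

Points that never reach the threshold are returned separately, so they can be withheld rather than shown. The sketch also makes the loss of detail concrete: every published zone reports only a count, not where within the zone the crashes happened.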

  1. Does anyone have any ideas about how to meet the requirements of the agreement while still showing useful information? A brilliant idea may make me reconsider my position on signing the data agreement.
  2. Is anyone willing to help me dig into this change? It's an immense backward step.

alpha-beta-soup commented 8 years ago

Just made a post here: http://groups.open.org.nz/groups/ninja-talk/messages/topic/4s6b4LltkXJeQaGEmg4fMi#post-4s6b4LltkXJeQaGEmg4fMi

Mental note to update this with recent developments

timClicks commented 8 years ago

Very interested in helping. Recommend keeping the discussion on the ninjas list as it has a wide readership within the public sector.

alpha-beta-soup commented 8 years ago

Thanks Tim, will do. I'll keep the exchange updated here though given I've already started it.

Richard: May I please have a copy of the data agreement to consider?

[They kindly send through the agreement, see 3ababa6]

Please find attached a copy of the CAS Privacy Agreement. Please let me know if you would like to progress with getting CAS access, or if you have any questions.

[Read the agreement.]

Richard: I find the agreement a little hard to comprehend. Several times throughout the agreement it makes reference to the crash analysis system maintained by the NZTA. I don't want to access the crash analysis system (which I understand is a Windows computer application; please correct me if I am mistaken). I just want to download the information that sits behind the CAS: the same information that is already available online for January 2000 to March 2015 as CSV files. I just want an update to that data. This data is already available for anyone to download from the NZTA website without signing an agreement, and this has been the case for some time: more than one year to my knowledge, possibly longer. I am just looking for this information to be updated with the most recent information available, which I think is either July or October 2015. I have received communication in the past assuring me that this data is licensed for reuse. The data currently only extends to March 2015 on the website, despite statements that it will receive quarterly updates.

Do I need to sign the attached agreement to access updates to this information?

The agreement is not clear in this respect, for instance in sections 13.1 and 13.2 discussing NZTA making "... all reasonable endeavours to provide a secure and reliable system allowing the customer and user to use CAS at all times other than between 5pm and midnight each Saturday, such time being set aside for CAS maintenance." and that "CAS will generally be available to the customer and user unless there is a planned outage, a scheduled backup process, a maintenance window, or an unforeseen interruption."

I find these and other statements confusing because they seem to have little relevance to just providing the data for download. I do not have interest in using anything but the underlying information.

Confusion reigns, until...

Thanks for your interest in safety data. We are in the process of making a new aggregated CAS data set freely available using the Government’s Open Data initiative. This is the data set that will be publically available from now on without the signing of an agreement. This is the data set you could use for your personal mapping tool.

If someone wishes to obtain data in addition to that made available to the general public via the open data initiative, they will be required to sign a data agreement to place some conditions on how the data may be used, in order to ensure data is kept appropriately secure and to protect the privacy of those individuals represented in the data set. For example, you would be able to use the data for research purposes and publish the results in an aggregated form.

I understand, however, that you would like data variables over and above the data that we will be making available via the government open data initiative or that has been published in the past. The way in which we can make certain additional variables available to you is by providing you with CAS access so that you can extract the additional data yourself, however, you will be required to sign a data agreement which places certain conditions on how you may use the data.

There is no cost associated with access to the CAS data base.

Please let me know if you wish to discuss

That was helpful.

Richard: Thank you so much for the clarification about accessing the CAS data.

Is there any documentation about the review into CAS that I may read? My application is predicated on disaggregated data. I'd like to contest that there are valid privacy concerns with the use of the data as it currently stands, and also note that if aggregated, the data is not going to be useful for most of the applications I see for it.

For example, if required to aggregate to at least four accidents (as previously stated), after I apply multiple filters (e.g. accidents involving pedestrians at a particular pedestrian crossing in Petone 2000-2015 — as I did recently for the Hutt Cycle Network), the number of relevant accidents falls below this number and my map is then no longer useful as an advocacy tool for commenting on highly-local road network changes (a proposed change to the location of the crossing). I can imagine hundreds of possible permutations of the data just like this. If I am only able to show aggregate information, the function of the map will be restricted to commenting on general trends only (e.g. there were x accidents on pedestrian crossings across all of Lower Hutt... but I can't say where or why exactly). I believe this to be a considerable regression in the utility of the data.

I disagree that there are major privacy concerns that justify changing the service in this way. To the extent that I may be wrong or overruled, I propose that there are better ways of addressing these concerns than requiring aggregation (for example, only specifying an approximate date, or randomising the location slightly, or suppressing the reporting of specific accident codes).

Has there been consultation with the users of the data before these changes were proposed? Myself and others have put considerable volunteer time into my application and I would have made my comments earlier had I known there were going to be changes.

I'm happy to discuss this issue further.
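
Two of the alternatives to aggregation suggested in that email, reporting only an approximate date and randomising the location slightly, can be sketched as follows. This is a minimal illustration under stated assumptions, not anything the NZTA has proposed: the function name `blur_crash`, the 100 m default radius, and the month-level date precision are all hypothetical choices, and the metres-to-degrees conversion is a rough spherical approximation.

```python
import math
import random
from datetime import date

METRES_PER_DEGREE = 111_320  # approximate length of one degree of latitude


def blur_crash(lat, lon, when, rng, max_offset_m=100.0):
    """Return (lat, lon, 'YYYY-MM') with the point moved by up to max_offset_m.

    rng is a random.Random instance, so results can be made reproducible.
    """
    # sqrt() makes the offset uniform over the disc rather than clustered
    # near the true location.
    distance = max_offset_m * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = distance * math.cos(bearing) / METRES_PER_DEGREE
    dlon = distance * math.sin(bearing) / (
        METRES_PER_DEGREE * math.cos(math.radians(lat))
    )
    # Keep only year and month of the crash date.
    return lat + dlat, lon + dlon, when.strftime("%Y-%m")


# Example with made-up coordinates and date (not a real CAS record).
rng = random.Random(0)
print(blur_crash(-41.2265, 174.8719, date(2014, 5, 17), rng))
```

Whether an offset of this size would satisfy a privacy review is an open question; the sketch only shows that each crash can stay individually mappable while its exact position and date are obscured.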

Then there's just a bit of back and forth about establishing a date to meet in person (not possible for me, being based outside of Wellington), and then finally a conference call. This will be in mid January.

alpha-beta-soup commented 8 years ago

I had a meeting today with the manager of statistical information at the NZTA, with James Burgess (Cycle Aware Wellington) in the conference call. It was a good call overall. The level of aggregation they were considering making available publicly is regional, which I said would simply kill this kind of application. The confidentiality agreement would mean only this same level of aggregation is possible for dissemination. They are considering our comments and there will be a follow-up call in two weeks' time. Their major concern is with privacy: they stopped making the CAS data available because of the potential for individuals to be identified from the information. I contested that this is not possible, and to the extent that it is, news websites already report more personal information about accidents very soon after they occur. We said we would never consider trying to link CAS data to social media or news reports of accidents. We also conceded that the reporting of age in some accidents is probably unnecessary, but otherwise the privacy concerns of the CAS data are unfounded.

Age: if a child or children are involved in an accident, the age of the youngest child is reported. Age is not reported otherwise. (Nor is gender.)

alpha-beta-soup commented 8 years ago

Blog post summary of the situation thus far: http://www.nearimprov.com/cas-saga