colouring-cities / colouring-britain

Developed out of the Colouring London prototype. Collecting data on Britain's buildings and testing new core features
https://colouringbritain.org/
GNU General Public License v3.0

DATA ETHICS/SECURITY. Discussion and feature status #106

Closed polly64 closed 6 months ago

polly64 commented 2 years ago

@tomalrussell @mz8i @matkoniecz do send any comments

This first comment box under the 'Data ethics/security' issue is used both to highlight issues and to track progress on Colouring Cities content, interface features and governance that address concerns relating to privacy, security, transparency, inclusivity, data quality, interoperability and data/code accessibility. Our aim is to maximise access to datasets necessary for the scientific analysis of cities, to support the United Nations' New Urban Agenda, and to create features which facilitate a whole-of-society approach to urban sustainability, whilst at the same time prioritising data ethics/security issues. Data capture is undertaken where considered necessary for the performance of a task relating to this aim, carried out in the public interest. Ethical issues are also discussed at https://www.pages.colouring.london/data-ethics. An introduction to location data ethics issues by the Alan Turing Institute can be accessed here: A public dialogue on location data ethics

SETS OF ETHICAL PRINCIPLES/DEFINITIONS COLOURING LONDON/COLOURING CITIES PLATFORM ARE CHECKED AGAINST:

  1. GENERAL DATA PROTECTION REGULATION (GDPR) Oversight: (UK) The Information Commissioner's Office (ICO) Link: https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/principles/ Colouring London is required to meet GDPR requirements with regard to personal data on individuals. GDPR principles are also applied to all types of data collected, as great care is also needed when handling certain types of spatial data relating to people's homes, especially data relating to domestic building interior space/activities, and to ownership. (Domestic buildings make up the vast majority of buildings in national building stocks.) GDPR data principles:

    • Lawfulness
    • Fairness
    • Transparency
    • Purpose limitation
    • Data minimisation
    • Accuracy
    • Storage limitation
    • Integrity
    • Confidentiality (security)
    • Accountability
  2. ODI's DATA PRINCIPLES Oversight: The Open Data Institute Link: https://theodi.org/article/openness-principles-for-organisations-handling-personal-data/. Principles of data collection/Questions to be transparent about.

  3. ODI's DATA INFRASTRUCTURE PRINCIPLES Oversight: The Open Data Institute Link: https://theodi.org/article/principles-for-strengthening-our-data-infrastructure/ Principles relating to:

    • Design for Open
    • Build with the web
    • Respect privacy
    • Benefit everyone
    • Think big but start small
    • Design to adapt
    • Encourage open innovation
  4. OPEN KNOWLEDGE FOUNDATION'S (OKF) OPEN DEFINITION 2.1 Oversight: The Open Knowledge Foundation Link: https://opendefinition.org/od/2.1/en/ The Colouring Cities research Programme (CCRP) promotes Open Knowledge: The OKF defines knowledge as 'open if anyone is free to access, use, modify, and share it — subject, at most, to measures that preserve provenance and openness'.

  5. THE OPEN DATA CHARTER Oversight: The Open Data Charter Link: https://opendatacharter.net/principles/ Principles of openness:

    • Open by default
    • Timely and comprehensive
    • Accessible and usable
    • Comparable and Interoperable
    • For improved governance and citizen engagement
    • For inclusive development and innovation
  6. GEMINI PRINCIPLES Oversight: The Centre for Digital Built Britain (University of Cambridge) Link: https://www.cdbb.cam.ac.uk/DFTG/GeminiPrinciples. The CCRP promotes the Gemini Principles, developed by the Centre for Digital Built Britain at the University of Cambridge (2019) to provide a 'conscience' for the framework for information management systems on the built environment/infrastructure, and for national digital twins, and to ensure these remain focused on the public good. Principles for built environment information management systems:

    • Public good
    • Value creation
    • Insight
    • Security
    • Openness
    • Quality
    • Federation
    • Curation
    • Evolution
  7. THE NEW URBAN AGENDA Oversight: The United Nations Link: https://www.un.org/sustainabledevelopment/blog/2016/10/newurbanagenda/ and https://habitat3.org/the-new-urban-agenda/ The CCRP promotes the UN New Urban Agenda, created to drive global commitment to the goal of sustainable, inclusive, healthy and resilient cities and stocks. UN New Urban Agenda summary principles:

    • Provide basic services for all citizens (e.g. housing, water, sanitation, food, healthcare, education, culture, communication technologies).
    • Ensure that all citizens have access to equal opportunities and face no discrimination
    • Promote measures that support cleaner cities (air pollution, greenspaces, renewable energy/transport)
    • Strengthen resilience in cities to reduce the risk and the impact of disasters (better urban planning, quality infrastructure and improving local responses).
    • Take action to address climate change by reducing cities' greenhouse gas emissions
    • Fully respect the rights of refugees, migrants and internally displaced persons regardless of their migration status
    • Improve connectivity and support innovative and green initiatives (including supporting cross sector partnerships)
    • Promote safe, accessible and green public spaces
  8. THE UNIVERSAL DECLARATION OF HUMAN RIGHTS Oversight: The United Nations Link: https://www.un.org/en/about-us/universal-declaration-of-human-rights The CCRP works to support the UDHR, and specifically the following articles/principles (of 30 Articles):

    • Article 1: All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
    • Article 2: Everyone is entitled to all the rights and freedoms set forth in this Declaration, without distinction of any kind, such as race, colour, sex, language, religion, political or other opinion, national or social origin, property, birth or other status. Furthermore, no distinction shall be made on the basis of the political, jurisdictional or international status of the country or territory to which a person belongs, whether it be independent, trust, non-self-governing or under any other limitation of sovereignty.
    • Article 3: Everyone has the right to life, liberty and security of person.
    • Article 12: No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.
    • Article 19: Everyone has the right to freedom of opinion and expression; this right includes freedom to hold opinions without interference and to seek, receive and impart information and ideas through any media and regardless of frontiers. (Note: Such speech must also respect other UDHR Articles).
    • Article 21: Everyone has the right to take part in the government of his country, directly or through freely chosen representatives. Everyone has the right of equal access to public service in his country.
    • Article 25: Everyone has the right to a standard of living adequate for the health and well-being of himself and of his family, including food, clothing, housing and medical care and necessary social services, and the right to security in the event of unemployment, sickness, disability, widowhood, old age or other lack of livelihood in circumstances beyond his control.
    • Article 27: Everyone has the right freely to participate in the cultural life of the community, to enjoy the arts and to share in scientific advancement and its benefits. Everyone has the right to the protection of the moral and material interests resulting from any scientific, literary or artistic production of which he is the author.

FEATURE CHECKLIST/STATUS DESIGNED TO ADVANCE SETS OF PRINCIPLES SHOWN ABOVE

A. OPENNESS OF KNOWLEDGE, CODE & DATA/SUPPORTING INNOVATION

PRIVACY & SECURITY

TRANSPARENCY

VALUE CREATION/PUBLIC GOOD currently being edited

INTEROPERABILITY

INCLUSIVITY, USER RESPECT, PUBLIC ENGAGEMENT, FAIRNESS

CURATION AND EVOLUTION// PLATFORM REPRODUCTION/SUSTAINABLE MANAGEMENT MODEL

QUALITY CONTROL, DATA ACCURACY

DATA ACCURACY AGREEMENT Host accountability disclaimer, as follows, to protect academic hosts and to make clear to users that responsibility for the ethical application of the data lies with them:

_Colouring London data are provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, accuracy, fitness for a particular purpose and non-infringement. In no event shall the Alan Turing Institute be liable for any reliance that you place on or how you use the data nor any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the data or the use or other dealings in the data.

Colouring London data are crowdsourced from multiple sources and may contain errors. Though we cannot comment on the accuracy of data, we continue to design features to help users assess the reliability of datasets, and their suitability for specific types of use (be this a school project or scientific paper). Access to information on sources is very important and contributors are asked to add these to support other users and make the database as useful as possible. We also ask contributors to verify existing data entries, by clicking 'verify' buttons wherever appropriate.

If you have suggestions for additional ways to improve data accuracy features please contact us at ADD NEW LINK_

CONTRIBUTOR AGREEMENT @tomalrussell @mz8i @matkoniecz to discuss

_Contributor responsibilities We ask all our contributors:

Notes for contributors

Open data Colouring London is an open data project whose contributions are open data, licensed under the Open Data Commons Open Database License (ADD LINK https://opendatacommons.org/licenses/odbl/) by Colouring London contributors. 'UNDER THIS LICENCE' You are free to copy, distribute, transmit and adapt our data, as long as you credit Colouring London and our contributors. If you alter or build upon our data, you may distribute the result only under the same licence.

What you are contributing to Colouring London is a free knowledge exchange platform and public open database designed for the public good, to support a whole-of-society approach to the development of sustainable, resilient and inclusive cities. The platform is guided by the United Nations New Urban Agenda, the Open Data Charter, the Gemini Principles, and by personal data and data infrastructure principles relating to open data initiatives as set out by the Open Data Institute. More information is available on other Menu pages including 'About' and 'Data Ethics' pages, as well as on our GitHub data ethics action page at https://github.com/colouring-london/colouring-london/issues/. We focus on the capture of spatial statistics for public use, including for public visualisation and academic research, and do not (currently) collect text or images. All types of spatial data we collect can be viewed by clicking on the data categories on our coloured grid (below), and on our 'Building data categories' page ADD link. We are also planning a 'Showcase section' to allow applications of the data to be easily uploaded and viewed.

Diversity and inclusivity We are very grateful for all constructive contributions that our contributors are able to give. We respect and actively seek diversity of contributors and audiences, and celebrate diversity of knowledge. Our platform is designed for everyone and we are working to make it as inclusive, welcoming and accessible as possible. We explicitly encourage community engagement, and use colour, crowdsourcing, and non-technical language to reduce barriers to the contribution of spatial statistics and to make the process rewarding and interesting. We also look, through this process, to encourage communities to become more engaged in informing and improving urban governance, and to support a whole-of-society approach to improving urban sustainability. Diversity of contributors and audiences, and of user ages, genders, skills and abilities, and cultural backgrounds is also essential to allow us, as communities, to make our cities and towns more inclusive, equitable, sustainable and resilient places, in line with UN New Urban Agenda goals. To do this we need to collect information not only on the composition of our stocks, their energy performance, and their dynamic behaviour, but also on building quality and how well specific types actually work, to help improve them and to inform what we should reuse, demolish or build anew, and what we should build in future.

Copyright and data accuracy and quality We are unable to accept any data derived from copyrighted or restricted sources, other than those covered by fair use, or from illegal sources, and therefore it is important to check data sources against these criteria. We are also unable to take responsibility for the quality of datasets, as it is not feasible to check each data entry, and as users will require different degrees of accuracy depending on what they are using the data for (e.g. school project or scientific paper). However, our aim is to make our data as useful and reliable as possible. The best way to do this is to include sources (PH to ask M: Should we require source first?). We therefore ask you to add sources wherever possible. This may range from an assessment of building features from the street, to information in a book, map or another open database. We also ask you to verify other data entries wherever you can by clicking our verification button. This means that anyone using our data can make a more informed judgement on whether the data are suitable (in terms of accuracy/quality) for their specific purpose.

Privacy/security issues and data dissemination Please note when you make a contribution to Colouring London, you are creating a permanent, public record of all data added, removed, or changed by you. The database records the username and ID of the user making the edit, along with the time and date of the change. SPACE All of this information is also made publicly available through the website and through bulk downloads of the edit history. User names of contributors providing the highest number of edits are also included on our Leaderboards. The privacy and security of both platform users and building occupiers are of key importance. To help protect your privacy as a user we recommend the use of pseudonyms and that you give us as little data about yourself as possible. Please note that when you contribute to Colouring London, you make your contributions available as open data for anyone to copy, distribute, transmit and adapt in line with the licence, and that data you are adding will be made open and may be used by anyone in any way. Our Data ethics page uses the Open Data Institute's data ethics canvas to address key questions on how we use, manage and protect data.

Contributing as an organisation If you are associated with an organisation, e.g. a school, professional institution, community group, local authority etc., and you would like the number of entries contributed by members of that organisation to be grouped on our Leaderboard, you can use the organisation's name, followed by an underscore sign, plus your initials/chosen name, e.g. TheVictorianSociety_AD or TheLondonBoroughofHackney_Joey.

Informing users of any privacy and security concerns/Feedback mechanism Though we rigorously assess each data type to protect users' and building occupiers' privacy and security, we welcome any ideas for improvements. If you have any concerns, or recommendations for improvement regarding privacy or security relating to our platform and/or datasets, please contact us at........ We also welcome constructive recommendations for improvements to the site as a whole. Behind-the-scenes progress on proposed platform features can be tracked on our GitHub site https://github.com/colouring-london/colouring-london/issues/_
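A minimal sketch of how the organisation naming convention in the draft above (organisation name, underscore, then initials or chosen name) could be used to group leaderboard entries. The function names and sample edit counts are hypothetical and are not taken from the Colouring Cities codebase; the sketch also assumes that the first underscore in a username separates the organisation from the individual.

```python
from collections import Counter
from typing import Dict, Optional

# Hypothetical per-user edit counts, for illustration only.
SAMPLE_EDITS = {
    "TheVictorianSociety_AD": 12,
    "TheVictorianSociety_JB": 3,
    "TheLondonBoroughofHackney_Joey": 7,
    "anon42": 20,
}


def organisation_of(username: str) -> Optional[str]:
    """Return the part before the first underscore, or None if the
    username does not follow the 'Organisation_Name' pattern."""
    prefix, separator, suffix = username.partition("_")
    if separator and prefix and suffix:
        return prefix
    return None


def group_edits_by_organisation(edit_counts: Dict[str, int]) -> Counter:
    """Sum edit counts per organisation; individual users are skipped."""
    totals: Counter = Counter()
    for username, count in edit_counts.items():
        organisation = organisation_of(username)
        if organisation is not None:
            totals[organisation] += count
    return totals


if __name__ == "__main__":
    for organisation, total in group_edits_by_organisation(SAMPLE_EDITS).most_common():
        print(organisation, total)
```

Under this rule any username containing an underscore would be counted as organisational, so a real implementation would probably need an explicit list of registered organisations as well.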

polly64 commented 2 years ago

@matkoniecz comment on @polly64 question 'should we include a phrase in our open licence agreement/contributor agreement with regard to user agreement to use the data for the public good'

Summary comment

The goal makes sense, but as far as I know, there is no existing solution. Methods known to me would cause issues for ethical users while not stopping evil use. Solving this is not easy and I have no idea how to start solving it.

If there were a known working solution it would be a good idea. But in the current situation I would recommend against amending the license in such a way.

Full comment:

It is a very interesting problem.

Unfortunately, the "interesting" part is related to several problems with doing this.

Doing it successfully would require solving extremely hard philosophical, legal and practical issues.

As far as I know, this problem remains unsolved. And solving it would be one of the largest achievements in human history.

In practice doing this right now is likely to cause problems for ethical users of code while being ignored by evil entities.

(1) It is quite tricky to define what would be unethical/evil use.

And even clearly evil entities will argue that their actions were not evil. Often they even really believe it.

So "no evil use" is too ambiguous. And trying to define morality is very hard to do. Note that many serious political issues are exactly about the conflict over whether something is evil or not. It applies both to past issues

(2) Making it legally binding (a part of the license) would cause licensing incompatibility.

(3) Evil entities are likely to ignore licensing anyway. For example, FSB assassinated people in the UK ( https://en.wikipedia.org/wiki/Poisoning_of_Alexander_Litvinenko ), with basically no consequences. If they would use software with a "do no evil" clause - then we would not be able to do anything about it.

(4) Previous attempts to do this resulted in various problems and complications.

See for example problems related to such licensing used by Douglas Crockford. This case caused problems for various reusers of this code who cared about licenses - and Douglas anyway granted "IBM, its customers, partners, and minions" a special permission "to use JSLint for evil" https://wiki.debian.org/qa.debian.org/jsonevil is an example of a critical viewpoint on that

I am not aware of such licensing being successful.

Alternatives:

(A) Include a nonbinding request? The problem is that it would be completely toothless and unlikely to stop actual evil use.

(B) Develop software and features that have a relatively low risk of being useful for evil purposes?

See also:

Ismael-KG commented 2 years ago

I'm still reading through this and don't usually engage so much with licences, so, under A. OPENNESS OF KNOWLEDGE, CODE & DATA/SUPPORTING INNOVATION, I just want to ask three questions about:

QUESTION 1: Are the two compatible? This might be a silly question on my part. It seems they are if we consider "an OKF GNU" to include ODbL. Am I reading this correctly?


However, the GNU linked to is a series of licenses renowned for:

"The GPL series are all copyleft licenses, which means that any derivative work must be distributed under the same or equivalent license terms. This is in distinction to permissive software licenses, of which the BSD licenses and the MIT License are widely used, less restrictive examples" (General public licence).

QUESTION 2: Is this need for the same licence in derivative works intentional and appropriate for Colouring London?


In the meantime, ODbL say:

"4.8 Licensing of others. You may not sublicense the Database. Each time You communicate the Database, the whole or Substantial part of the Contents, or any Derivative Database to anyone else in any way, the Licensor offers to the recipient a license to the Database on the same terms and conditions as this License" (ODbL's full text).

and

"Use your own license for the contents: You are welcome to apply your own specific license to the contents of the database instead of the Database Contents License. To do this just replace the second sentence with information about the license you wish to use" (ODbL).

QUESTION 3: I think they are saying you can have ODbL for the database (and its derivatives) and any licence for "content." I am not quite sure what the difference is, but does this make sense in the context of Colouring London?

Ismael-KG commented 2 years ago

On TRANSPARENCY, I am thinking about:

These seem very closely linked, both the aim (more engagement) and the struggle (small team). I am going to suggest whether there is capacity for newsletters (maybe quarterly) where you can also send out surveys for specific feedback. This way, you are in control of the influx of information. Of course, make the surveys too stringent, and you'll miss quite a bit! But you could also use newsletters to build a community interested in the project?

matkoniecz commented 2 years ago

Warning: I am not a lawyer.

Are the two compatible? This might be a silly question on my part.

Yes, in general the license of the data being processed and the license of the code processing that data are separate.

For example, the colouring-london code is GPL-3.0 licensed, while it is hosted on proprietary GitHub.

One may write openly-licensed text using proprietary Word.

One may write text not released under open license - using openly licensed LibreOffice.

(And it is not a silly question! In some cases this can be quite tricky, but in general it is quite important that the data being processed by software and the software itself may have completely separate licenses.)

As another specific case: https://github.com/openstreetmap/iD (ISC licensed) and https://github.com/streetcomplete/StreetComplete (GPL 3.0) are among the editors used to edit the ODbL-licensed OpenStreetMap database.

matkoniecz commented 2 years ago

Warning: I am not a lawyer.

Is this need for the same licence in derivative works intentional and appropriate for Colouring London?

Note that "derivative work" in this case means modified source code, not a database created using it.

And in general GPL-3.0 is quite a good fit (depending on the situation different licenses may be better, but this specific issue is not a reason to change anything).

matkoniecz commented 2 years ago

QUESTION 3: I think they are saying you can have ODbL for the database (and its derivatives) and any licence for "content." I am not quite sure what the difference is, but does this make sense in the context of Colouring London?

It applies to cases where the database would store things that can themselves be copyrighted/licensed.

Let's say that someone would make a curated database of photos. In such a case there could be a separate license for

For example, there could be a CC-BY-SA 4.0 image in an ODbL-licensed database.

As far as I know, individual contents in the Colouring London database would not rise above the threshold of originality, nor qualify for sweat-of-the-brow type protection.

Warning: Still not a lawyer. Here I am not fully sure, but I think that my answer is more informative than misleading. It is possible that researching some property (like dates) would be significantly complicated and qualify for protection, but it seems quite unlikely.

matkoniecz commented 2 years ago

These seem very closely linked, both the aim (more engagement) and the struggle (small team). I am going to suggest whether there is capacity for newsletters (maybe quarterly) where you can also send out surveys for specific feedback. This way, you are in control of the influx of information. Of course, make the surveys too stringent, and you'll miss quite a bit! But you could also use newsletters to build a community interested in the project?

As I understand it, right now feedback would be mostly useful for prioritizing among the many things to do. Producing quarterly newsletters and processing the responses may take significant effort and imply capacity for more development than is available.

Ismael-KG commented 2 years ago

So thorough, thank you @matkoniecz !!!

matkoniecz commented 2 years ago

mirroring https://github.com/colouring-london/colouring-london/issues/682#issuecomment-961244520

How do we check weblinks are ok?

I see four ways to handle this:

The questions are:

Has this problem happened so far? Or is it treated as so significant that making contributing and site maintenance harder is a worthwhile tradeoff to close this potential attack?

How varied are the sites used as sources? How often would it be necessary to add new ones? What about deployment in Athens? They would likely need to add many local sites to the list of valid ones.

In short: how much effort is needed to maintain such a list?


Is blocking known unwanted sites sufficient? It will block most attempts, but someone malicious and motivated will succeed.

Or is it necessary to ensure that nothing will pass?

Even with filtering of domains it is still possible to add links like https://www.bbc.com/serious_insults_here, leading to a nonexistent page but containing unwanted text in the link itself.

Allowing all links to Facebook would let some malicious ones through, and classifying all Facebook pages into OK and malicious ones would be a full-time job for hundreds of people.

Note that sufficiently malicious people will manage to stuff offensive things in the year form (by setting year to 1488, see https://en.wikipedia.org/wiki/Fourteen_Words ) or by other content offensive by context (say setting year on building of a Polish embassy to 1939).

So it is not possible to block malicious content completely, and it is a question of how much effort should be put into that. In the end human moderation is necessary and not really avoidable. After all, someone may add garbage data without making it offensive.
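The trade-off between an allow list and a block list can be illustrated with a minimal sketch. The domain lists and the function name `classify_link` are assumptions made for illustration, not part of the Colouring Cities code. The check looks only at the domain, which is exactly why a link such as https://www.bbc.com/serious_insults_here still passes.

```python
from urllib.parse import urlsplit

# Hypothetical lists for illustration only; a real deployment (for example
# in Athens) would need its own locally maintained entries.
ALLOWED_DOMAINS = {"bbc.com", "historicengland.org.uk"}
BLOCKED_DOMAINS = {"spam.example"}


def _matches(host, domains):
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_link(url):
    """Classify a source link as 'blocked', 'allowed' or 'needs_review'.

    Only the scheme and the domain are inspected; the path is ignored,
    so unwanted text inside the path is not detected.
    """
    try:
        parts = urlsplit(url)
    except ValueError:
        return "blocked"  # malformed URL
    host = (parts.hostname or "").lower()
    if parts.scheme not in ("http", "https") or not host:
        return "blocked"
    if _matches(host, BLOCKED_DOMAINS):
        return "blocked"
    if _matches(host, ALLOWED_DOMAINS):
        return "allowed"
    # Unknown domains go to a human moderator rather than being rejected.
    return "needs_review"
```

Sending unknown domains to review rather than rejecting them keeps contributing easy, at the cost of moderation effort. This is the same trade-off raised in the questions above.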

polly64 commented 6 months ago

@polly64 to check