chobeat opened this issue 6 years ago
I agree with this - there is a difference between systems that have been designed for malign purposes and those that have weaknesses that can be exploited, or are subject to unintended consequences. Both Tay and the Google image classification algorithm are examples of the latter - their designers were not aware of the shortcomings of the model or learning system that would generate these outcomes (neither the designers of Tay nor the Google folk set out to create discrimination). So it may be worth making this distinction.
That's a very good point regarding Tay! I'm not quite sure, however, that this distinction is really that easy to make for many other cases on the list (intended vs. non-intended). HireVue, for example, claims to prevent algorithmic bias, and the company's stated goal is to remove human biases from the hiring process (PredPol claims similar things). However, algorithmic bias is not completely preventable; for example, better performance alone wouldn't have fixed Google's auto-tagging system. So it really depends on what the goal of this list should be. Personally, I have two possible goals in mind:
Feel free to comment and criticise!
I agree that it's hard to come up with hard definitions here. At the same time there should be a minimum technical and organizational complexity to make it to the list: any random guy on the internet can come up with a conversational model running in a twitter bot that insults random people according to their race, sexual preferences and so on. Should we care about it? Probably not, if it's really just an idiot on the internet.
If it's a released product of a company that has an economic interest in it, then we should probably care, because the system has the potential to spread and scale, together with its impact. The public should be aware of this, and the list would help funnel this information. Does Tay fall into this category? I don't think so: it was a failed PR stunt from some R&D team at Microsoft that immediately pulled the bot off the internet and republished it weeks later with a better profanity filter.
I like your point, but I'm not sure we should remove Tay on the premise of a "failed PR stunt". I think Microsoft itself has a [huge economic interest in the development] of experiments such as Tay and, ultimately, in the underlying social chatbot technology in general (as shown by Zo and large-scale bots in China and Japan). But sure, it's hard to argue about intention since none of us worked at Microsoft (except @thomaid?), which is why I think we should not rely on intention as an argument.
Tay, without any doubt, had an impact on how we think about AI accountability. It is a very clear and visible example of the dangers of sexist and racist chatbot technology. It helped raise awareness of accountability within Microsoft and beyond.
Not only poor performance - I don't think programs that reveal uncomfortable truths should be included at all. A program that predicts the risk of recidivism isn't biased against blacks; it's just that they statistically have a higher risk of recidivism, so the program reveals that. A program that tests people crossing the EU border is predicted to have an 85% accuracy rate - why does the page state that it's "likely going to have a high number of false positives"? And so on. Cambridge Analytica is included in the list despite being just one of many customers of Facebook - why isn't Facebook itself rated as the primary danger? Why isn't Google's vast spying network mentioned anywhere? Uber's map isn't really AI and isn't really anything special; it simply shows the data that their customers knowingly and willingly give to them. Any such list that does not have clearly defined criteria will include so many entries that don't make sense that it becomes more or less meaningless.
@nukeop I strongly disagree! As someone who has worked in AI & statistics for a while: a training distribution is in general never equal to the true distribution. You are calculating statistics not on the real world but on what is observed through a human-biased collector. And what's worse, in my opinion, is that such a training distribution can in fact be collected in extremely socially unjust and discriminating ways. A model that is then trained on a discriminating training set will reflect this bias (if not properly accounted for).
The underlying motivation of this awful-ai list is the following: a model is not an omniscient oracle that we should blindly trust, but merely a reflection of data collected by imperfect humans. That's why we need to fix algorithmic discrimination just as we have to fix social discrimination. Someone building and working with an AI application that scales and influences possibly millions of people, such as predictive policing, should always keep that in mind. Discriminating models can influence not only the behaviour but also the world view of the people working with them. Policemen might mistakenly interpret discriminating AI predictions as truths. This is dangerous.
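To make the "human-biased collector" point concrete, here's a minimal sketch (purely hypothetical numbers, not modelled on any real system): two groups with identical true offence rates, where one group is simply observed three times as heavily. Any model trained on the recorded data would pick up a group difference that does not exist in the world.

```python
# Minimal sketch with hypothetical numbers: a biased data-collection process,
# not the underlying behaviour, creates a "predictive" group difference.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True world: both groups offend at the same 5% base rate.
group = rng.integers(0, 2, size=n)      # group 0 or group 1
offends = rng.random(n) < 0.05

# Biased collector: group 1 is patrolled much more heavily,
# so its offences are far more likely to be *recorded*.
detection_prob = np.where(group == 1, 0.9, 0.3)
recorded = offends & (rng.random(n) < detection_prob)

# Training data built from `recorded` shows very different "offence rates"
# per group, even though the true rates are identical.
for g in (0, 1):
    true_rate = offends[group == g].mean()
    observed_rate = recorded[group == g].mean()
    print(f"group {g}: true rate {true_rate:.3f}, rate in training data {observed_rate:.3f}")
# Roughly: group 0 shows ~0.015 in the data, group 1 shows ~0.045,
# while both true rates are ~0.050.
```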
However, I agree with you that we need a set of formal, community-curated criteria for inclusion in the awful-ai list and propose to open an issue for this -> #17
In the list I see examples like Tay mixed together with systems that have problems at a design and structural level (PredPol, for example). Tay was just a bad marketing idea, probably from some manager very confused about how the internet works and what happens when you let it influence your product/model without proper counter-measures. A lot of the articles on that event were fear-mongering against AI as a whole, and I think this list would better serve its purpose by staying clear of non-news like the Tay story and focusing on actually problematic usages of machine learning systems.