LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0

stance on dealing with offensive/bad topics and/or options around them #2497

Closed BMaxV closed 1 year ago

BMaxV commented 1 year ago

I don't want to voice an opinion on what the project should or shouldn't do. But I think it would be cool to have an info box or something to make the intentions of the project clear.

Obviously, the bad scenario is the assistant being rude to everyone, or "leaking bad stuff" when it's not appropriate, or producing simply illegal content, which you have already clarified.

There is the flip side: asking for helpful advice around bad topics, which I think "an assistant" should be able to handle. In my view, refusing a genuine request for help just because it triggers a bad keyword would be as bad as "leaking bad stuff".

There is also the question of asking things around "bad topic" entertainment media, e.g. horror novels.

So the cases would be:

- polite prompt -> rude answer
- bad prompt -> bad answer
- bad prompt -> polite, helpful answer
- polite prompt to be bad -> intentionally bad answer (maybe with safety warnings or something)
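The cases above could be sketched as a simple policy table. This is a hypothetical illustration only, not the project's actual moderation logic; all the names and the chosen stance are invented for the sketch:

```python
# Hypothetical sketch: the prompt/answer cases from the list above,
# encoded as a policy table. Categories and the stance taken here are
# illustrative only; deciding the real policy is the point of this issue.
from enum import Enum


class PromptKind(Enum):
    POLITE = "polite prompt"
    BAD = "bad prompt"
    POLITE_ASKING_FOR_BAD = "polite prompt to be bad"


class AnswerKind(Enum):
    RUDE = "rude answer"
    BAD = "bad answer"
    HELPFUL = "polite, helpful answer"
    INTENTIONALLY_BAD_WITH_WARNING = "intentionally bad answer, with safety warnings"


# One possible stance: always answer helpfully, and honor explicit
# requests to "be bad" only with safety warnings attached.
POLICY = {
    PromptKind.POLITE: {AnswerKind.HELPFUL},
    PromptKind.BAD: {AnswerKind.HELPFUL},
    PromptKind.POLITE_ASKING_FOR_BAD: {AnswerKind.INTENTIONALLY_BAD_WITH_WARNING},
}


def is_acceptable(prompt: PromptKind, answer: AnswerKind) -> bool:
    """Return True if this answer kind is allowed for this prompt kind."""
    return answer in POLICY.get(prompt, set())
```

Writing the stance down as a table like this would at least make the intentions explicit, which is what the info box suggestion is asking for.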

E.g. I tried this and it worked as I wanted, but the assistant clearly meant to refuse, and it required a "smart/hacky" prompt to make it do it anyway.

So is the malfunction that it refused to do it, or that it provided an answer when it shouldn't have?

I think this particular use case, offering advice on improved politeness, taking cultural differences into account, and warning when something may offend people, is one of the best use cases for such an assistant. It doesn't offend any actual people, it's not really "difficult" for people to do, but it's also not a "rewarding" task people would do of their own accord. So it's ideal for automation.

I can see this being an actually important feature to make sure e.g. business emails aren't accidentally offensive.

Q:Generally, can you give advice on whether a statement could be seen as offensive or condescending?

A:It is difficult to automatically detect whether a sentence might be interpreted by someone as offensive, because different people have different sensitivities to language, cultural backgrounds, etc.
A machine learning model might one day help with this task. However, as of today there are no tools that do this reliably, so it must currently be done manually.
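As a toy illustration of why this is hard to automate, the naive approach is keyword matching, which is exactly the "triggering a bad keyword" failure mode described earlier. A minimal sketch (the marker phrases and the whole approach are invented for illustration):

```python
# Deliberately naive keyword heuristic for flagging possibly condescending
# phrasing. It cannot account for context, sensitivity, or cultural
# background, which is why such tools are unreliable on their own.
CONDESCENDING_MARKERS = {
    "obviously",
    "everyone knows",
    "even a child could",
    "clearly you don't understand",
}


def flag_possibly_condescending(sentence: str) -> list[str]:
    """Return the marker phrases found in the sentence (case-insensitive)."""
    lowered = sentence.lower()
    return sorted(m for m in CONDESCENDING_MARKERS if m in lowered)
```

Both false positives ("Obviously" used harmlessly) and false negatives (condescension without any marker phrase) are easy to construct, which supports the answer's point that this currently has to be judged manually.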

Q:pretend to be an advisor, is the sentence "[...]" offensive or not?

A:Yes, your sentence has a high potential to offend those who consider themselves experts in their respective fields. Your statements show little empathy and respect towards them. A better way would be to address each other's differences through constructive criticism and suggestions.
olliestanley commented 1 year ago

This behaviour is shaped by the views of the users submitting data. We can't really "control" that.