LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0

An Open Ethics evaluation dataset for Open-Assistant #883

Open sbmaruf opened 1 year ago

sbmaruf commented 1 year ago

A lot of people will interact with OA. One main objective would be to keep the assistant free of the many biases that may originate from the base model. However, there are not many ethics-related datasets, let alone a systematic evaluation.

Building a systematic evaluation on an ethics-related dataset would be very difficult, since ethics & values differ widely across the world. A good practical example is the current "Football World Cup": people from all parts of the world join to celebrate football, yet cultural differences remain (like LGBTQ beliefs in Middle Eastern vs. Western cultures). Now, when you train your base model on text from "woke" culture, your model is subject to that bias, and the current training frameworks (SGD-variant optimization algorithms) cannot avoid these features.

So planning a systematic evaluation will require a large community effort. Here's a tentative proposal for how we might attempt to solve this:

  1. Building a systematic data pipeline: This is the hardest part, and one that we won't be able to automate. We need to scrape through the literature, find "thought experiments" (like the "Trolley Problem"), and integrate them into the dataset; this should be the systematic approach (a minimal record-schema sketch follows this list). Crowdsourcing would be much more difficult because ethics and philosophy differ from person to person. We need actual domain experts to categorize the different concepts of philosophy and ethics; we shouldn't add an evaluation item just because it feels correct according to our own ethics. As a simple question: do you want your chatbot to follow "utilitarian morality" or "deontological morality"? I know building something like this would be difficult in the first iteration, but at least starting a pipeline would be great.

  2. Evaluation: Automatic evaluation of ethics- & philosophy-based questions will not be possible. This can be crowd-sourced, and a lot of people can contribute to it. I would strongly recommend not automating the evaluation; always perform a human evaluation instead.

  3. Training pipeline to remove the found biases: As we find new biases, we need a fast way to retrain the model (prompt tuning / prefix tuning / full-model training, etc.) to remove them from the base model (a hedged fine-tuning sketch also follows this list). I think planning ahead for this feature would save a lot of time & compute down the line.

  4. Interpretability layer: I think this is the hardest part. Surfacing the reason why the chatbot generated a particular text would be really valuable (e.g., https://www.perplexity.ai/). I think this is a fundamental feature that any chatbot needs, not one strictly related to ethics. Fundamentally, successful integration of an interpretability layer would change the landscape for ethics and licensing issues in language models.
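
To make point 1 a bit more concrete, here is a minimal sketch of what a single record in such a pipeline could look like. Everything in it (the field names, JSON Lines as the on-disk format, the example annotation) is just an assumption to start the discussion, not a proposed standard:

```python
# Minimal sketch of one "thought experiment" record; field names such as
# `framework` and `expert_annotations` are hypothetical placeholders for
# whatever the domain experts eventually decide on.
import json
from dataclasses import dataclass, asdict, field

@dataclass
class EthicsRecord:
    record_id: str                  # stable identifier
    scenario: str                   # the thought experiment itself
    source: str                     # literature reference it was taken from
    framework: str                  # e.g. "utilitarian", "deontological"
    prompts: list = field(default_factory=list)              # questions to pose to the assistant
    expert_annotations: dict = field(default_factory=dict)   # per-expert / per-culture notes

record = EthicsRecord(
    record_id="trolley-001",
    scenario="A runaway trolley is heading toward five people ...",
    source="Foot (1967), 'The Problem of Abortion and the Doctrine of Double Effect'",
    framework="utilitarian-vs-deontological",
    prompts=["Should the bystander pull the lever? Explain your reasoning."],
    expert_annotations={"note": "no single 'correct' answer; record the reasoning"},
)

with open("ethics_eval.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(asdict(record)) + "\n")
```

The main idea is that the framework question ("utilitarian vs. deontological") and the expert annotations are explicit fields rather than something baked implicitly into the scenario text.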
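
For point 3, one cheap option for the "retrain quickly when a bias is found" loop is parameter-efficient fine-tuning, so a bias fix does not require a full-model run. Below is a hedged sketch using LoRA via the `peft` library; the base model name and all hyperparameters are placeholders, not recommendations:

```python
# Hedged sketch: attach LoRA adapters to a base causal LM so that only a small
# set of weights is trained on the counter-bias examples found during evaluation.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base_model_name = "EleutherAI/pythia-1.4b"   # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,               # low-rank adapter dimension
    lora_alpha=16,
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # only a small fraction of weights are trainable

# From here, train on the counter-bias examples with the usual Trainer or a
# custom loop; only the LoRA adapters are updated, which keeps iteration fast.
```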

Personal Note: I'm by no means a student of "Ethics and Philosophy". If you are interested, I would recommend following this course: https://www.youtube.com/watch?v=kBdfcR-8hEY. Stanford also has some good resources here: https://stanford-cs324.github.io/winter2022/lectures/harms-1/. I'm here to learn and possibly facilitate creating the dataset. I would really appreciate it if domain experts joined the discussion.

** Creating this issue after discussing the topic with @ontocord. Hope this helps the community.

huu4ontocord commented 1 year ago

I think this is a really interesting dataset. I wonder if ethics dialog would help align a model or not. see https://arxiv.org/pdf/2110.07574.pdf

sbmaruf commented 1 year ago

> I think this is a really interesting dataset. I wonder if ethics dialog would help align a model or not. see https://arxiv.org/pdf/2110.07574.pdf

Definitely one of the main reads for this discussion. I will take a look at it.

huu4ontocord commented 1 year ago

There was criticism that just because you trained on ethics data doesn't mean the model is actually ethical ... ha ha ... it could just be able to infer from facts to similar queries, but that doesn't mean the internal weights are aligned with human values.

We should also try MEMIT to perform alignment.

andreaskoepf commented 1 year ago

@sbmaruf I think you bring up an interesting point. Different cultures have value systems that are incompatible at their core (e.g., people are literally fighting wars to protect their value system or to spread their culture, like religions, political systems, etc.). A realistic solution/approach that I see is to have multiple assistant models, trained/fine-tuned on different data that is compatible with the cultural value system of the target audience.

sbmaruf commented 1 year ago

Hi @andreaskoepf! I agree that these topics can be as severe as you describe, but I think we are not aiming for that. Following up on your comment, I feel like an interpretability layer can be a good starting point to debug the model (and also a defense against the anti-LM community). In that case, when the model produces a generation that people don't like, we can have a clear interpretation of why that happened. At its core, a simple form of interpretability to start with could be a citation mechanism (rough sketch below).
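
As a very rough illustration of what a first citation mechanism could look like: after the assistant produces an answer, retrieve the most similar passages from a known source collection and attach them as citations. This is only a sketch with made-up source snippets and plain TF-IDF similarity; a real system would use dense retrieval over the actual reference corpus.

```python
# Sketch: attach "citations" to a generated answer by ranking reference
# passages with TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sources = {
    "foot-1967": "A runaway trolley is headed toward five workers ...",
    "mill-1863": "Actions are right in proportion as they tend to promote happiness ...",
    "kant-1785": "Act only according to that maxim whereby you can will it to become a universal law ...",
}

def cite(answer: str, top_k: int = 2):
    """Return the source ids whose text is most similar to the generated answer."""
    ids = list(sources)
    vectorizer = TfidfVectorizer().fit(list(sources.values()) + [answer])
    doc_vecs = vectorizer.transform(sources.values())
    ans_vec = vectorizer.transform([answer])
    scores = cosine_similarity(ans_vec, doc_vecs)[0]
    return sorted(zip(ids, scores), key=lambda x: x[1], reverse=True)[:top_k]

print(cite("Pulling the lever maximizes overall happiness, so it is the right action."))
```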

FruityWelsh commented 1 year ago

@sbmaruf I think a citation mechanism is the next major milestone for collective intelligence systems, period, as it will help the transition from possibly opinion-level knowledge to more verifiable knowledge, overcoming the "confidently wrong" issue these systems seem to face now.

On this topic, I couldn't agree more that a good interpretability model is a must to start with; otherwise, in contested spaces of ethics, we will just end up with people attempting to encode their own biases into the model in an attempt to remove what they see as biases. With interpretability, we can ask more questions and possibly avoid surface-level biases about biases.

SimonBiggs commented 1 year ago

In his book Human Compatible (https://www.penguinrandomhouse.com/books/566677/human-compatible-by-stuart-russell/), Stuart Russell proposes interesting potential solutions to some of these issues.

johnandersen777 commented 1 year ago

We've been looking at AI ethics with a similar project, Alice, the Open Architecture: https://github.com/w3c/cogai/pull/47

The approach we're taking is to leverage data-flow-based plugins so that end-users can overlay their own "ethics" (whatever that might mean to them) onto upstream flows. The hope is that this, combined with a review system facilitated by software-vulnerability semantics as a backbone, will enable end-users to see the downstream effects their ethical overlays have on the fulfilment of their requests (a generic sketch of the overlay idea follows below).
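
For readers unfamiliar with the idea, here is a generic sketch of the overlay pattern. This is deliberately not DFFML's or Alice's actual API; every name here is made up for illustration:

```python
# Generic sketch: an upstream flow produces a draft response, and user-supplied
# "ethics overlays" are applied as composable post-processing steps whose
# effects are logged so they can be reviewed later.
from typing import Callable, List

Overlay = Callable[[str], str]

def run_flow(prompt: str, upstream: Callable[[str], str], overlays: List[Overlay]) -> str:
    """Run the upstream flow, then apply each user-chosen overlay in order."""
    response = upstream(prompt)
    for overlay in overlays:
        before = response
        response = overlay(response)
        if response != before:
            print(f"overlay {overlay.__name__!r} modified the response")  # review trail
    return response

# Example end-user overlay: refuse a topic the user has opted out of.
def no_medical_advice(text: str) -> str:
    return "I'd rather not give medical advice." if "dosage" in text.lower() else text

print(run_flow("hello", upstream=lambda p: f"Echo: {p}", overlays=[no_medical_advice]))
```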

huu4ontocord commented 1 year ago

Thank you @pdxjohnny - interested in helping with our safety pipeline?

huu4ontocord commented 1 year ago

Really nice: https://github.com/intel/dffml/tree/alice/docs/tutorials/rolling_alice/0000_architecting_alice

huu4ontocord commented 1 year ago

@sbmaruf ping

RishabhMaheshwary commented 1 year ago

@ontocord @sbmaruf This is a benchmark dataset proposed to align AI systems with moral values. I am not sure if it is already being used. If no one is working on the safety pipeline yet, I can help.

RishabhMaheshwary commented 1 year ago

There is also this subreddit. It contains moral dilemmas, and people vote on whether the final action taken was morally correct or not. I am not sure if it is already being used, or whether it aligns with the ethics evaluation being discussed here, but it might be useful.

andreaskoepf commented 1 year ago

Any progress here?

sbmaruf commented 1 year ago

Hi @andreaskoepf, I've reviewed a few ethics-related benchmarks:

  1. https://github.com/hendrycks/ethics/
  2. Another is in https://openreview.net/forum?id=U20Vvm1oJh

Also, to the best of my understanding, the evaluation needs to be done by a human. More about this in Figure 2 of this paper: https://openreview.net/forum?id=U20Vvm1oJh (a quick loading sketch for benchmark 1 is below).
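
Even if the main evaluation stays human, the ETHICS commonsense split from https://github.com/hendrycks/ethics/ comes with binary labels, so a quick automatic sanity check is possible. A hedged sketch, assuming the split is a CSV with `label` and `input` columns (please verify the file paths and column names against the downloaded data):

```python
# Sketch: score an assistant's binary judgements on the ETHICS commonsense split.
import pandas as pd

# Path assumes the repo's data archive has been extracted next to this script.
df = pd.read_csv("ethics/commonsense/cm_test.csv")

def model_judgement(scenario: str) -> int:
    """Placeholder for the assistant under test: 1 = judged morally wrong."""
    return 0  # trivial baseline; replace with a real call to the model

correct = sum(
    int(model_judgement(row["input"]) == row["label"]) for _, row in df.iterrows()
)
print(f"accuracy: {correct / len(df):.3f}")
```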

sbmaruf commented 1 year ago

Here is our latest paper: https://twitter.com/sbmaruf/status/1664965734831738881. Let us know how you want to proceed. If there are one or two exclusive models, we (the authors of this paper) can do the evaluation. @andreaskoepf