Closed: SummerSigh closed this issue 11 months ago
Thank you for bringing this important issue up @SummerSigh.
This is a hard issue, and I admit I haven't thought about it very much. My main focus is to have people use our tool to do better work and to help people learn.
From the education perspective, I think cheating on school essays could be a problem. However, I don't know whether we should build watermarking in upfront, or instead provide API hooks that let, say, a school-focused open-source community plug such a tool into the generator, so those communities can decide for themselves.
As for false information, I know this is a real problem. But the danger with any solution that limits fake news through tracking (for example, a switch turned on by default that people can turn off if they are researching this kind of thing) is that it propagates tracking technologies, and I don't want that. I don't want organizations or governments to be able to track people unless they follow the process set up in their jurisdictions.
Another way to limit fake news is to make our models more factual, which we are trying to do. But we also don't want to stop people from creating fiction. An alternative history where Hitler won the war and everyone in America lives under Nazi rule (see, e.g., The Man in the High Castle) would have news articles very different in tone from our current ones. I can imagine people could, and should, be able to use Open Assistant to generate such fiction. Could that be used for fake news?
I look forward to the discussion here. In that discussion, we should propose solutions that are actually implementable with technology we have now or can reasonably build. For example, I think watermarking could be done via some sort of decoder skewing that might not change the semantics much, but we would need to research this. A rough sketch of the idea is below.
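To make "decoder skewing" concrete, here is a minimal sketch of one published approach (the "green list" watermark described by Kirchenbauer et al.). None of this exists in the Open Assistant codebase; the function name and the `gamma`/`delta` parameters are purely illustrative.

```python
import torch

def greenlist_skew(logits: torch.Tensor, prev_token: int,
                   gamma: float = 0.5, delta: float = 2.0) -> torch.Tensor:
    """Bias next-token logits toward a pseudo-random 'green list' seeded by the
    previous token. Generated text then over-represents green tokens, which a
    detector that knows the seeding scheme can test for statistically."""
    vocab_size = logits.shape[-1]
    gen = torch.Generator().manual_seed(prev_token)              # seed from context
    green = torch.randperm(vocab_size, generator=gen)[: int(gamma * vocab_size)]
    skewed = logits.clone()
    skewed[green] += delta                                       # small additive boost
    return skewed
```

Detection would count how many of a text's tokens fall in their respective green lists; a fraction well above `gamma` suggests the text came from the watermarked decoder. Whether this survives paraphrasing, and how much it affects generation quality, is exactly the research question.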
@ontocord here are some implementations I have tested and know work:
- https://github.com/falcondai/lm-steganography
- https://github.com/ku-nlp/steganography-with-masked-lm
- https://github.com/mickeysjm/StegaText
- https://github.com/jumon/himitsu
I personally like steganography-with-masked-lm, so I put it up in this Kaggle notebook:
https://www.kaggle.com/code/summerbreeze11/text-steganograpthy/edit
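For anyone who wants the gist without opening the notebook, here is a minimal sketch of the masked-LM steganography idea (this is not the exact code from that repo; `bert-base-uncased` and the helper names are just illustrative): hide bits by choosing among the model's top-ranked candidates for a masked slot, and read them back by re-ranking the same slot.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def hide_bits(cover: str, position: int, bits: str) -> str:
    """Replace the token at `position` with the masked-LM candidate whose
    rank encodes `bits` (e.g. "10" -> the 3rd-ranked candidate)."""
    enc = tokenizer(cover, return_tensors="pt")
    ids = enc["input_ids"].clone()
    ids[0, position] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(input_ids=ids, attention_mask=enc["attention_mask"]).logits
    candidates = torch.topk(logits[0, position], k=2 ** len(bits)).indices
    ids[0, position] = candidates[int(bits, 2)]
    return tokenizer.decode(ids[0], skip_special_tokens=True)

def recover_bits(stego: str, position: int, n_bits: int) -> str:
    """Re-mask the same position and read back which rank was chosen.
    (Real implementations also have to handle re-tokenization drift.)"""
    enc = tokenizer(stego, return_tensors="pt")
    hidden_token = enc["input_ids"][0, position].item()
    ids = enc["input_ids"].clone()
    ids[0, position] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(input_ids=ids, attention_mask=enc["attention_mask"]).logits
    candidates = torch.topk(logits[0, position], k=2 ** n_bits).indices.tolist()
    return format(candidates.index(hidden_token), f"0{n_bits}b")
```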
Thank you for sharing these links!
Following up on this: is anyone interested in exploring this more? Maybe we could create an API hook into the safety pipeline to steer decoding, so any org can plug in their own method/callback. This could be useful for privately run bots. A rough sketch of such a hook is below.
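Assuming the generation side uses HuggingFace transformers, such a hook could be as thin as a `LogitsProcessor` wrapper that an org fills with its own callback. The class name, the lambda, and the `gpt2` checkpoint below are placeholders, not an existing Open Assistant API.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class DecodingHook(LogitsProcessor):
    """Wraps an arbitrary org-supplied callback so it can steer decoding.
    The callback receives (input_ids, scores) and returns adjusted scores."""
    def __init__(self, callback):
        self.callback = callback

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        return self.callback(input_ids, scores)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# An org would replace this identity callback with its own watermarking or
# steganography logic (e.g. a green-list skew over the scores).
hook = DecodingHook(lambda ids, scores: scores)

inputs = tokenizer("Open Assistant is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20,
                     logits_processor=LogitsProcessorList([hook]))
print(tokenizer.decode(out[0], skip_special_tokens=True))
```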
I’ll take a look at it further. I’ll put my updates here as I go.
This issue pertains to the idea of watermarking Open Assistant generations via linguistic steganography. Linguistic steganography is a field of active research; however, a few methods have emerged that make me confident it can be used effectively to identify Open Assistant generations. This issue isn't going to detail these methods, but rather to discuss the ethical debate around using linguistic steganography.
In my opinion, here are the main benefits:
However, I also have the following concerns:
I think this is a subject that merits broad discussion and requires multiple viewpoints so we can properly evaluate whether this is something we should pursue further.