We will do a couple of things for CSAM detection in instructions and outputs, so this issue is interrelated with the embedding detectors we are building for safety.
1. Build a keyword classifier over adult-content terms and child-related terms, and compute the ratio of each in the text. Use this to detect unsafe instructions. We need to either create our own pipeline or figure out a hook that connects directly into the HF tokenizer. This task is hard because instructions are quite short.
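A minimal sketch of the ratio check, assuming a simple whitespace/regex tokenization rather than the HF tokenizer hook discussed above. The term sets, function names, and threshold are all placeholders for illustration:

```python
import re

# Hypothetical placeholder term lists -- stand-ins, not real keyword sets.
ADULT_TERMS = {"adultkw1", "adultkw2"}
CHILD_TERMS = {"childkw1", "childkw2"}

def keyword_ratio(text: str) -> tuple[float, float]:
    """Return (adult_ratio, child_ratio): fraction of tokens in each list."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0, 0.0
    adult = sum(t in ADULT_TERMS for t in tokens)
    child = sum(t in CHILD_TERMS for t in tokens)
    return adult / len(tokens), child / len(tokens)

def flag_instruction(text: str, threshold: float = 0.05) -> bool:
    """Flag when both ratios cross a threshold.

    Short instructions make these ratios noisy, which is exactly why
    this task is hard; the threshold would need tuning on real data.
    """
    a, c = keyword_ratio(text)
    return a >= threshold and c >= threshold
```

If we end up hooking into the HF tokenizer instead, only the tokenization line should need to change; the ratio logic stays the same.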
2. Use the same keyword classifier on model output, but with different parameters tuned for model output. This is easier than (1) because the output will be longer. We either filter with our own prompt (fed back into the model to regenerate using prefixing), or we simply block the output from being returned by the pipeline at all. As above, we need to figure out whether we build our own pipeline or hook into HF generate.
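The "block the output" option can be sketched as a plain post-generation wrapper, sidestepping the pipeline-vs-generate question for now. Again, the term lists, thresholds, and block message are hypothetical:

```python
import re

# Hypothetical placeholder term lists and thresholds for illustration.
ADULT_TERMS = {"adultkw1"}
CHILD_TERMS = {"childkw1"}
BLOCKED = "[output withheld by safety filter]"

def output_ratios(text: str) -> tuple[float, float]:
    """Same ratio computation as the instruction-side check."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0, 0.0
    return (sum(t in ADULT_TERMS for t in tokens) / len(tokens),
            sum(t in CHILD_TERMS for t in tokens) / len(tokens))

def filter_output(text: str, adult_thr: float = 0.02,
                  child_thr: float = 0.02) -> str:
    """Block flagged output entirely; thresholds are lower than on the
    instruction side because longer outputs dilute the ratios."""
    a, c = output_ratios(text)
    if a >= adult_thr and c >= child_thr:
        return BLOCKED
    return text
```

The regenerate-with-prefixing option would replace the `return BLOCKED` branch with a call back into generation; that part depends on whether we wrap HF generate or run our own pipeline.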
We will use the above methods to screen data we take in (either from a dataset, or potentially through a UI used to gather human data).
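Applying either check to incoming data could look like the following partitioning pass; `scan_records` and the record shape are assumptions, and the stand-in classifier would be replaced by the keyword filter:

```python
def scan_records(records, classify):
    """Partition incoming records into kept vs flagged using any
    classifier function over the record's text field."""
    kept, flagged = [], []
    for rec in records:
        (flagged if classify(rec["text"]) else kept).append(rec)
    return kept, flagged

# Example with a stand-in classifier (a real run would pass the
# keyword-ratio filter instead):
demo = [{"text": "fine"}, {"text": "bad stuff"}]
kept, flagged = scan_records(demo, lambda t: "bad" in t)
```

The same pass works whether records come from a dataset dump or stream in from a data-collection UI.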