-
Evaluate safety issues of LLMs from various aspects
(To be continued, opened by JingFeng, Hongye, ...)
-
**Describe the bug**
A clear and concise description of the bug.
**How To Reproduce the bug**
Steps to reproduce the behavior, and how frequently you experience the bug:
I am trying to hook up the…
-
Review the thread safety of the SDK to make sure there are no potential concurrency issues, particularly around state maintained in the global API object, clients, and evaluation context objects.
See…
-
# Task Name
Vehicle sounds classification
## Task Objective
The primary goal of this task is to evaluate the audio language model's capability to accurately recognize and classify different …
-
# Industrial Machine situation
This is a sound dataset for malfunctioning industrial machine investigation and inspection.
## Task Objective
In this task, we consider false positives and …
-
Sorry for creating another issue,
I am quite lost on how to submit the results of the attack for Claude-3 and GPT-4-0613.
Another question I have concerns the use of a benign dataset. I observed…
-
Hi
Could you share scripts that reproduce the results in the paper? Thanks.
I tried the generation and evaluation for safety using the following script on an Nvidia GPU. The results are
…
-
## Description
Users express the need for data schema evaluation to enable "fail-fast" capabilities during data loading and consistency checks before execution. They highlight the potential benefits …
-
Here are some ideas and potential areas of research for Tensort:
- Model analysis and interpretability: Develop new techniques for analyzing and understanding what large language models have learned …
-
## Introduction
Transfer the safety committee glossary from the Google Doc to the Zephyr /doc folder to make it publicly available.
### Problem description
The safety committee glossary shall be publicly …