HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Data Import Error #6492

Open richb-rv opened 1 day ago

richb-rv commented 1 day ago

Describe the bug

We get an incorrect formatting error when attempting to import new data.

Validation error
Error at item 0: "llm.inputs.retrieved_context" key is expected in task data [assume: item["data"] = task root with values] :: {'data': {'observability.identifiers.user': [{'key': 'session_id', 'value': ''}], 'observability.identifiers.system': [{'key': 'correlation_id', 'value': ''}, {'key': 'trace_id', 'value': ''}, {'key': 'parent_span_id', 'value': ''}], 'observability.identifiers.llm': [{'key': 'interaction_id', 'value': ''}, {'key': 'runnable_sequence_id', 'value': ''}, {'key': 'runnable_sequence_step', 'value': ''}, {'key': 'runnable_id', 'value': ''}], 'llm.inputs.retrieved_context': [{'id': '1', 'title': '', 'body': ''}, {'id': '2', 'title': '', 'body': ''}], 'llm.outputs': [{'key': 'text_response', 'value': ''}]}, 'file_upload_id': 28}

I believe this error is telling me that the key llm.inputs.retrieved_context defined in my interface is not present in the data being uploaded. However, it is there.

If we import a data file and then add the interface, it works fine, but if the interface already exists we get the error message.

To Reproduce

Example Interface:

<View>
  <Style> .lsf-select { display: none; } </Style>
  <List name="retrieved-context" value="$llm.inputs.retrieved_context" title="Retrieved Context" />
  <header>LLM Outputs:</header>
  <Paragraphs name="llm-outputs" nameKey="key" textkey="value" value="$llm.outputs" layout="dialogue" />
  <Choices name="sentiment" toName="llm-outputs" choice="single" showInLine="true">
   <Choice value="ambiguous"/>
   <Choice value="factually accurate"/>
   <Choice value="factually inaccurate"/>
  </Choices>
</View>

Example data: fa-test.json

Steps to reproduce the behavior:

  1. Create a new project
  2. Add Label Interface
  3. Try to Import the data file

Expected behavior

Data file is uploaded and rendered through the interface.

Screenshots

With data input directly into the labeling interface configuration:

Screenshot 2024-10-09 at 9 35 12 AM

When data is uploaded prior to setting up the labeling interface:

Screenshot 2024-10-09 at 9 35 28 AM

When attempting to import data as a file after labeling interface is saved:

Screenshot 2024-10-09 at 9 35 46 AM

Environment (please complete the following information):

Additional context

The same example data works if input as data in the labeling interface preview. The same example data also renders correctly in the UI if you:

  1. Create a new project
  2. Upload the example data file FIRST
  3. Create the labeling interface

AbubakarSaad commented 1 day ago

Hello Rich,

It's because of the way the data is structured. If you have llm.inputs.retrieved_context, it would mean the structure is something similar to this: "llm": { "inputs": { "retrieved_context": [...] } }. But if you just remove llm.inputs and name it "retrieved_context", it works.

Screenshot 2024-10-09 at 2 46 07 PM

Screenshot 2024-10-09 at 2 46 21 PM

richb-rv commented 1 day ago

Hmm, okay, interesting. So I'm not able to target nested items using dot notation; for instance, with your example:

"llm": {
  "inputs": {
    "retrieved_context": [...]
  }
},

using dot notation like llm.inputs.retrieved_context does not actually target retrieved_context. (This is the reason we flattened that data and created the key the way we did.)

However, I did realize that it was the . causing the issue. It seems that you can't use any special characters as separators in the key name, for example something like: llm:inputs:retrieved_context

Are both of those statements accurate?
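
If separator characters in key names do turn out to be the problem (the thread does not confirm this), one possible workaround is to rename the keys before import. The sketch below is an assumption-laden illustration: `sanitize_keys` is a hypothetical helper, and it assumes underscores are acceptable in key names, so that the labeling config could then reference something like `$llm_inputs_retrieved_context`.

```python
import re


def sanitize_keys(task_data: dict, replacement: str = "_") -> dict:
    """Return a copy of `task_data` with separators in top-level keys replaced.

    Hypothetical helper: every character other than a letter, digit, or
    underscore is replaced. That underscores are safe is an assumption.
    """
    renamed = {}
    for key, value in task_data.items():
        new_key = re.sub(r"[^0-9A-Za-z_]", replacement, key)
        if new_key in renamed:
            # Two different source keys would collapse into one name.
            raise ValueError(f"key collision after renaming: {new_key!r}")
        renamed[new_key] = value
    return renamed


# Sample data shaped like the task in the issue (values left empty there too).
task_data = {
    "llm.inputs.retrieved_context": [{"id": "1", "title": "", "body": ""}],
    "llm.outputs": [{"key": "text_response", "value": ""}],
}

clean = sanitize_keys(task_data)
print(sorted(clean))
```

Any `value="$..."` reference in the labeling interface would need to be updated to the renamed keys as well.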

richb-rv commented 1 day ago

Hey @AbubakarSaad, I did some more digging here. I think there are a couple of bugs, the main one being: it appears that I can nest data, but I can't do that for example data when creating the labeling interface. It seems that there is some difference in how JSON is parsed between the labeling interface preview, the UI file import feature, and importing tasks via the API.

Thank you!
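
For anyone who wants to try the nested layout discussed above, here is a minimal sketch that converts flattened dotted keys back into nested objects before writing the import file. `unflatten` is a hypothetical helper, not part of Label Studio, and the thread does not establish that nested data is handled the same way by the interface preview, the UI file import, and the API.

```python
import json


def unflatten(flat: dict, sep: str = ".") -> dict:
    """Expand keys such as 'a.b.c' into nested dicts {'a': {'b': {'c': ...}}}.

    Hypothetical helper for illustration; raises ValueError when one key
    would have to be both a leaf value and a parent object.
    """
    nested: dict = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split(sep)
        node = nested
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{dotted_key!r} conflicts with a leaf at {part!r}")
            node = child
        if parts[-1] in node:
            raise ValueError(f"{dotted_key!r} conflicts with an existing entry")
        node[parts[-1]] = value
    return nested


# Two of the flattened keys from the task in the issue.
flat_data = {
    "llm.inputs.retrieved_context": [{"id": "1", "title": "", "body": ""}],
    "llm.outputs": [{"key": "text_response", "value": ""}],
}

nested_data = unflatten(flat_data)

# A task list in the {"data": ...} shape that appears in the error message.
print(json.dumps([{"data": nested_data}], indent=2))
```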