Data Import Error - Githubissues

richb-rv commented 1 month ago

Describe the bug

We get an incorrect formatting error when attempting to import new data.

Validation error
Error at item 0: "llm.inputs.retrieved_context" key is expected in task data [assume: item["data"] = task root with values] :: {'data': {'observability.identifiers.user': [{'key': 'session_id', 'value': ''}], 'observability.identifiers.system': [{'key': 'correlation_id', 'value': ''}, {'key': 'trace_id', 'value': ''}, {'key': 'parent_span_id', 'value': ''}], 'observability.identifiers.llm': [{'key': 'interaction_id', 'value': ''}, {'key': 'runnable_sequence_id', 'value': ''}, {'key': 'runnable_sequence_step', 'value': ''}, {'key': 'runnable_id', 'value': ''}], 'llm.inputs.retrieved_context': [{'id': '1', 'title': '', 'body': ''}, {'id': '2', 'title': '', 'body': ''}], 'llm.outputs': [{'key': 'text_response', 'value': ''}]}, 'file_upload_id': 28}

I believe this error is telling me that the key llm.inputs.retrieved_context, which is defined in my interface, is not present in the data being uploaded. However, it is there.

If we import a data file and then add the interface, it works fine, but if the interface already exists when we import, we get the error message.
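The claim that the key is present can be checked mechanically. The following is a minimal sketch, not Label Studio's own validator: it assumes that every `value="$..."` attribute in the labeling config names a literal top-level key of the task data. `referenced_keys` and `missing_keys` are hypothetical helpers, and the config and task are trimmed copies of the ones in this report.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the labeling config from this report.
CONFIG = """
<View>
  <List name="retrieved-context" value="$llm.inputs.retrieved_context" title="Retrieved Context" />
  <Paragraphs name="llm-outputs" nameKey="key" textkey="value" value="$llm.outputs" layout="dialogue" />
  <Choices name="sentiment" toName="llm-outputs" choice="single" showInLine="true">
    <Choice value="ambiguous"/>
  </Choices>
</View>
"""

# Trimmed copy of the task data echoed in the validation error.
TASK_DATA = {
    "llm.inputs.retrieved_context": [{"id": "1", "title": "", "body": ""}],
    "llm.outputs": [{"key": "text_response", "value": ""}],
}


def referenced_keys(config_xml):
    """Collect the names used in value="$name" attributes (hypothetical helper)."""
    root = ET.fromstring(config_xml)
    names = set()
    for element in root.iter():
        value = element.attrib.get("value", "")
        if value.startswith("$"):
            names.add(value[1:])  # drop the leading "$"
    return sorted(names)


def missing_keys(config_xml, task_data):
    """Return referenced names that are not literal top-level keys of the task."""
    return [name for name in referenced_keys(config_xml) if name not in task_data]


missing = missing_keys(CONFIG, TASK_DATA)
print(missing)
```

Under that literal-key assumption nothing is reported missing, which matches the observation that the key is in the uploaded data even though the importer rejects it.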

To Reproduce

Example Interface:

<View>
  <Style> .lsf-select { display: none; } </Style>
  <List name="retrieved-context" value="$llm.inputs.retrieved_context" title="Retrieved Context" />
  <header>LLM Outputs:</header>
  <Paragraphs name="llm-outputs" nameKey="key" textkey="value" value="$llm.outputs" layout="dialogue" />
  <Choices name="sentiment" toName="llm-outputs" choice="single" showInLine="true">
   <Choice value="ambiguous"/>
   <Choice value="factually accurate"/>
   <Choice value="factually inaccurate"/>
  </Choices>
</View>

example data: fa-test.json

Steps to reproduce the behavior:

Create a new project
Add Label Interface
Try to Import the data file

Expected behavior

The data file is uploaded and rendered through the interface.

Screenshots

With data input directly into the labeling interface configuration:

When data is uploaded prior to setting up the labeling interface:

When attempting to import data as a file after labeling interface is saved:

Environment (please complete the following information):

OS: Mac OS Sonoma 14.5
Label Studio Version [e.g. 1.13.1]

Additional context

The same example data works if it is input as data in the labeling interface preview. The same example data also renders correctly in the UI if you:

Create a new project
Upload the example data file FIRST
Create the labeling interface

AbubakarSaad commented 1 month ago

Hello Rich,

It's because of the way the data is structured. If you have llm.inputs.retrieved_context, then it would mean the structure is something similar to this: "llm": { "inputs": { "retrieved_context": [...] } }, But if you just remove llm.inputs and name it "retrieved_context", it works.
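To illustrate the distinction, here is a minimal sketch that assumes a resolver which treats each dot as a path separator. This is an assumption for illustration, not Label Studio's actual lookup code, and `resolve_dotted` is a hypothetical helper.

```python
def resolve_dotted(data, path):
    """Walk nested dicts one dot-separated segment at a time (hypothetical helper)."""
    current = data
    for segment in path.split("."):
        if not isinstance(current, dict) or segment not in current:
            raise KeyError(path)
        current = current[segment]
    return current


context = [{"id": "1", "title": "", "body": ""}]

# Flat task: the whole dotted string is a single key.
flat_task = {"llm.inputs.retrieved_context": context}

# Nested task: each segment is its own level.
nested_task = {"llm": {"inputs": {"retrieved_context": context}}}

nested_value = resolve_dotted(nested_task, "llm.inputs.retrieved_context")

try:
    resolve_dotted(flat_task, "llm.inputs.retrieved_context")
    flat_resolves = True
except KeyError:
    # There is no top-level "llm" key in the flat task, so the walk stops at once.
    flat_resolves = False

print(nested_value, flat_resolves)
```

With a resolver like this, the nested task resolves and the flat task does not, even though the flat task contains the exact dotted string as a key.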

richb-rv commented 1 month ago

Hmm, okay, interesting. So I'm not able to target nested items using dot notation; for instance, with your example:

"llm": {
  "inputs": {
    "retrieved_context": [...]
  }
},

using dot notation like llm.inputs.retrieved_context does not actually target retrieved_context. (This is why we flattened that data and created the key the way we did.)

However, I did realize that it was the . causing the issue. It seems that you can't use any special characters as separators in the key name, for example something like: llm:inputs:retrieved_context

Are both of those statements accurate?
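If separators in key names are indeed the problem, one possible workaround is to rename the flat keys before import. This is a minimal sketch under that assumption: `sanitize_keys` is a hypothetical helper, the underscore replacement is an arbitrary choice, and the labeling config would then need to reference the renamed keys (for example `$llm_inputs_retrieved_context`).

```python
import re


def sanitize_keys(task_data, replacement="_"):
    """Rename top-level keys so they contain only letters, digits, and underscores."""
    sanitized = {}
    for key, value in task_data.items():
        new_key = re.sub(r"[^0-9A-Za-z_]", replacement, key)
        if new_key in sanitized:
            # Two different keys collapsed to the same name; refuse to lose data.
            raise ValueError(f"key collision after sanitizing: {new_key!r}")
        sanitized[new_key] = value
    return sanitized


task = {
    "llm.inputs.retrieved_context": [{"id": "1", "title": "", "body": ""}],
    "llm.outputs": [{"key": "text_response", "value": ""}],
}

clean = sanitize_keys(task)
print(sorted(clean))
```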

richb-rv commented 1 month ago

Hey @AbubakarSaad, I did some more digging here, and I think there are a couple of bugs. The main one: it appears that I can nest data, but I can't do that for the example data when creating the labeling interface. It seems that there is some difference in how JSON is parsed between the labeling interface preview, the UI file import feature, and importing tasks via the API.

Thank you!

heidi-humansignal commented 1 month ago

Hello,

Let me do some testing. There shouldn't be much difference in how the JSON is being parsed; LS does end up using the API endpoint to show it in the UI.

Thank you, Abu

HumanSignal / label-studio

Data Import Error #6492