HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Error when importing CSV with Jetty #2447

Open FancyBanana opened 2 years ago

FancyBanana commented 2 years ago

**Describe the bug**
When my Spring Boot application tries to POST a CSV file to `/api/projects/{id}/import/`, I get the following error:

{"id":"f004a96d-aaab-4313-85c8-0d8de6864dea","status_code":400,"version":"1.4.1post1","detail":"Validation error","exc_info":null,"validation_errors":{"non_field_errors":["load_tasks: No data found in DATA or in FILES"]}}

**To Reproduce**
Steps to reproduce the behavior:

** content start **
--J3uPyGWi69NLJPEXwEIKTBbLe6iu-tZyQe3rp
Content-Disposition: form-data; name="file"; filename="data.csv"
Content-Type: application/octet-stream
Content-Length: 1644

"Id métier","Id entrant","MIME Type","Path"
"1","0","text/plain","élément 1 : alten sud-ouest élément 2 : alten sud ouest"
"2","1","text/plain","élément 1 : airbus operations s.a.s élément 2 : airbus operations sas"
"3","2","text/plain","élément 1 : airbus eydt élément 2 : airbus - eydt"
"4","3","text/plain","élément 1 : psa peugeot citroen élément 2 : psa peugeot-citroen"
"5","4","text/plain","élément 1 : psa peugeot citroen élément 2 : psa - peugeot citroen"
"6","5","text/plain","élément 1 : psa peugeot citroen élément 2 : p.s.a. peugeot citroen"
"7","6","text/plain","élément 1 : dassault aviation élément 2 : dassault-aviation"
"8","7","text/plain","élément 1 : cea leti élément 2 : cea-leti"

content end


* Response:

COMM RESPONSE LOG
Status HTTP/1.1 400 Bad Request
Headers:
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET,POST,PUT,PATCH,DELETE,HEAD,OPTIONS
Access-Control-Allow-Headers: Content-Type, Origin, Accept, Authorization, Content-Length, X-Requested-With
Date: Fri, 03 Jun 2022 09:33:33 GMT
Server: WSGIServer/0.2 CPython/3.8.10
Content-Type: application/json
Allow: POST, OPTIONS
Content-Length: 221
x-frame-options: DENY
Vary: Accept-Language, Cookie, Origin
Content-Language: en-us
x-content-type-options: nosniff
referrer-policy: same-origin
Set-Cookie: sessionid=eyJ1aWQiOiI1ZTg1ODE4MS05NWU0LTQzYTItOGRlOC1iNDU5Y2RmZjk1OTkiLCJvcmdhbml6YXRpb25fcGsiOjF9:1nx3gT:nZaWF1Ppv0a6bMqjkvUt_ipV-ZLSEs-n5vvS721R2BM; expires=Fri, 17 Jun 2022 09:33:33 GMT; HttpOnly; Max-Age=1209600; Path=/; SameSite=Lax
Connection: keep-alive

** content start **
{"id":"f004a96d-aaab-4313-85c8-0d8de6864dea","status_code":400,"version":"1.4.1post1","detail":"Validation error","exc_info":null,"validation_errors":{"non_field_errors":["load_tasks: No data found in DATA or in FILES"]}}
content end


**Expected behavior**
File gets imported as tasks.
`"POST /api/projects/3/import HTTP/1.1" 201 229`

**Screenshots**
Diff between the failed Jetty request body (left) and the working curl request body (right):
![image](https://user-images.githubusercontent.com/7935263/171830817-c3814f5a-3cb9-4b07-a317-e6cce0e301d5.png)

Curl request:
![image](https://user-images.githubusercontent.com/7935263/171831046-0e8fe5f7-a9c4-4003-b568-146be55cfbf8.png)

Jetty Request:
![image](https://user-images.githubusercontent.com/7935263/171831476-367aa0a3-d6da-4d33-b64b-a27fdd6d7108.png)

Curl Request:
![image](https://user-images.githubusercontent.com/7935263/171831562-43ee94e1-8a78-49b7-ac15-75d9f0182534.png)

**Environment (please complete the following information):**
 - OS: Windows Host, label-studio running in Docker container in WSL2
 - Label Studio Version: 1.4.1post1

{ "release": "1.4.1post1", "label-studio-os-package": { "version": "1.4.1post1", "short_version": "1.4", "latest_version_from_pypi": "1.4.1.post1", "latest_version_upload_time": "2022-02-12T00:44:06", "current_version_is_outdated": false },

"label-studio-os-backend": { "message": "Merge Develop + LSE hotfix/2.2.7-hotfix.1: Return 404 for api/project/ ...", "commit": "3239a3d04e65c2cd0091568aa0439d103be2970c", "date": "2022-05-11 16:05:40 +0300", "branch": "master", "version": "3239a3d" },

"label-studio-frontend": { "message": "fix: DEV-2100: Fix preselected choices (#584) - working with empty an ...", "commit": "ee38e771760e1ce57ac62dc1556ddd0718f62487", "branch": "master", "date": "2022-04-27T11:55:50Z" },

"dm2": { "message": "Fix tasks selection (#44)", "commit": "97e33ac0a9b0ea09398b00d7916671ca76cf2a71", "branch": "master", "date": "2022-04-13T13:55:17Z" },

"label-studio-converter": { "version": "0.0.40" } }


**Additional Context**

Label Studio console output from the Docker container:

[2022-06-03 09:33:33,002] [core.utils.common::custom_exception_handler::82] [ERROR] f004a96d-aaab-4313-85c8-0d8de6864dea [ErrorDetail(string='load_tasks: No data found in DATA or in FILES', code='invalid')]

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/django/utils/decorators.py", line 43, in _wrapper
    return bound_method(*args, **kwargs)
  File "/label-studio/label_studio/data_import/api.py", line 176, in post
    return super(ImportAPI, self).post(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/rest_framework/generics.py", line 190, in post
    return self.create(request, *args, **kwargs)
  File "/label-studio/label_studio/data_import/api.py", line 208, in create
    parsed_data, file_upload_ids, could_be_tasks_lists, found_formats, data_columns = load_tasks(request, project)
  File "/label-studio/label_studio/data_import/uploader.py", line 148, in load_tasks
    raise ValidationError('load_tasks: No data found in DATA or in FILES')
rest_framework.exceptions.ValidationError: [ErrorDetail(string='load_tasks: No data found in DATA or in FILES', code='invalid')]

[2022-06-03 09:33:33,009] [django.request::log_response::224] [WARNING] Bad Request: /api/projects/3/import

[03/Jun/2022 09:33:33] "POST /api/projects/3/import HTTP/1.1" 400 221

FancyBanana commented 2 years ago

After simulating the request in Insomnia, I narrowed the error down to the `Transfer-Encoding: chunked` header. If the header is set, the server always responds with this error.
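For readers unfamiliar with chunked transfer coding, here is a minimal Python sketch of how a request body is framed on the wire when `Transfer-Encoding: chunked` is used instead of `Content-Length`. The helper names `chunk_encode` and `chunk_decode` are hypothetical and not part of Label Studio, Spring, or Jetty. Whether the Label Studio server stack decodes this framing is not established in this thread, so this only illustrates how the two requests differ.

```python
def chunk_encode(body: bytes, chunk_size: int = 16) -> bytes:
    """Frame `body` with HTTP/1.1 chunked transfer coding.

    Each chunk is: <size in hex>CRLF <data> CRLF.
    A zero-size chunk followed by an empty line ends the body.
    """
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        out += f"{len(chunk):X}\r\n".encode("ascii")
        out += chunk + b"\r\n"
    out += b"0\r\n\r\n"  # terminating chunk, no trailers
    return bytes(out)


def chunk_decode(framed: bytes) -> bytes:
    """Recover the original body from chunked framing (trailers not supported)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = framed.index(b"\r\n", pos)
        # Ignore optional chunk extensions after ';'
        size_token = framed[pos:line_end].split(b";")[0].decode("ascii")
        size = int(size_token, 16)
        pos = line_end + 2
        if size == 0:
            break
        body += framed[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


if __name__ == "__main__":
    payload = b'"Id","Path"\r\n"1","example"\r\n'
    framed = chunk_encode(payload, chunk_size=8)
    # With Content-Length the raw payload is sent as-is;
    # with chunked coding the size lines are interleaved with the data.
    print(framed)
    print(chunk_decode(framed) == payload)
```

A receiver that reads the body by `Content-Length` alone would find no length to read in a chunked request, which would match the "No data found" symptom, but that explanation is an assumption here.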

triklozoid commented 2 years ago

Thanks for the report!

> After simulating the request through Insomnia, I narrowed the error down to the Transfer-encoding: chunked header. If the header is set server will always respond with an error

I'm not familiar with Jetty. Can you control the headers and just exclude this one from the request as a workaround?

FancyBanana commented 2 years ago

The `Transfer-Encoding` header is set automatically by the `WebClient` class from Spring Framework, and as far as I know there isn't any way to remove it other than setting the `Content-Length` header manually, which would require manually serializing the multipart form before sending the request. My workaround was to regenerate the OpenAPI client using `RestTemplate` instead of `WebClient`.
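To show what "manually serializing the multipart form" involves, here is a minimal sketch in Python rather than Java, since the idea is language-independent. It builds the whole `multipart/form-data` body in memory so its exact length is known before sending. The function name `build_multipart` and its parameters are hypothetical. The part headers mirror the request shown in the report, and nothing here is Spring or Label Studio API.

```python
import uuid
from typing import Dict, Optional, Tuple


def build_multipart(
    field_name: str,
    filename: str,
    file_bytes: bytes,
    boundary: Optional[str] = None,
) -> Tuple[bytes, Dict[str, str]]:
    """Serialize one file part and return (body, headers).

    Because the body is fully materialized, Content-Length can be set
    explicitly and chunked transfer coding is not needed.
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")  # closing delimiter
    body = head + file_bytes + tail
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return body, headers


if __name__ == "__main__":
    csv_bytes = '"Id métier","Path"\r\n"1","exemple"\r\n'.encode("utf-8")
    body, headers = build_multipart("file", "data.csv", csv_bytes)
    print(headers)
```

The trade-off is memory: the whole file is held in a buffer, which is fine for small CSV imports but may not suit very large uploads.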