HumanSignal / label-studio-ml-backend

Configs and boilerplates for Label Studio's Machine Learning backend
Apache License 2.0

DisallowedRedirect: Unsafe redirect to URL with protocol 'data' #632

Open ClarkeAC opened 6 days ago

ClarkeAC commented 6 days ago

Version: 2.0.1dev0
OS: Linux 22.04 + Docker Compose

I get the following error when using SAM and SAM2. It only happens for images stored in MinIO; directly uploaded images work without errors.

segment_anything_model  | 2024-09-23T08:13:31.156896024Z Traceback (most recent call last):
segment_anything_model  | 2024-09-23T08:13:31.156904570Z   File "/usr/local/lib/python3.8/site-packages/label_studio_ml/exceptions.py", line 39, in exception_f
segment_anything_model  | 2024-09-23T08:13:31.156913990Z     return f(*args, **kwargs)
segment_anything_model  | 2024-09-23T08:13:31.156923240Z   File "/usr/local/lib/python3.8/site-packages/label_studio_ml/api.py", line 69, in _predict
segment_anything_model  | 2024-09-23T08:13:31.156956825Z     response = model.predict(tasks, context=context, **params)
segment_anything_model  | 2024-09-23T08:13:31.156967015Z   File "/app/model.py", line 51, in predict
segment_anything_model  | 2024-09-23T08:13:31.156975805Z     predictor_results = PREDICTOR.predict(
segment_anything_model  | 2024-09-23T08:13:31.156984261Z   File "/app/sam_predictor.py", line 202, in predict
segment_anything_model  | 2024-09-23T08:13:31.156995751Z     return self.predict_sam(img_path, point_coords, point_labels, input_box, task)
segment_anything_model  | 2024-09-23T08:13:31.157004941Z   File "/app/sam_predictor.py", line 173, in predict_sam
segment_anything_model  | 2024-09-23T08:13:31.157013748Z     self.set_image(img_path, calculate_embeddings=False, task=task)
segment_anything_model  | 2024-09-23T08:13:31.157022315Z   File "/app/sam_predictor.py", line 86, in set_image
segment_anything_model  | 2024-09-23T08:13:31.157030995Z     image_path = get_local_path(
segment_anything_model  | 2024-09-23T08:13:31.157039338Z   File "/usr/local/lib/python3.8/site-packages/label_studio_sdk/_extensions/label_studio_tools/core/utils/io.py", line 159, in get_local_path
segment_anything_model  | 2024-09-23T08:13:31.157048518Z     filepath = download_and_cache(
segment_anything_model  | 2024-09-23T08:13:31.157056888Z   File "/usr/local/lib/python3.8/site-packages/label_studio_sdk/_extensions/label_studio_tools/core/utils/io.py", line 209, in download_and_cache
segment_anything_model  | 2024-09-23T08:13:31.157066839Z     r.raise_for_status()
segment_anything_model  | 2024-09-23T08:13:31.157075359Z   File "/usr/local/lib/python3.8/site-packages/requests/models.py", line 1024, in raise_for_status
segment_anything_model  | 2024-09-23T08:13:31.157084373Z     raise HTTPError(http_error_msg, response=self)
segment_anything_model  | 2024-09-23T08:13:31.157093179Z requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://192.168.1.255:8080/tasks/63506/presign/?fileuri=s3://dataset/image/steam/00001.jpg
segment_anything_model  | 2024-09-23T08:13:31.157102403Z 
segment_anything_model  | 2024-09-23T08:13:31.157711422Z [2024-09-23 08:13:31,157] [DEBUG] [label_studio_ml.api::log_response_info::191] Response status: 500 INTERNAL SERVER ERROR
segment_anything_model  | 2024-09-23T08:13:31.157827964Z [2024-09-23 08:13:31,157] [DEBUG] [label_studio_ml.api::log_response_info::192] Response headers: Content-Type: application/json
segment_anything_model  | 2024-09-23T08:13:31.157855786Z Content-Length: 1700
segment_anything_model  | 2024-09-23T08:13:31.157865430Z 
segment_anything_model  | 2024-09-23T08:13:31.157870552Z 
segment_anything_model  | 2024-09-23T08:13:31.158034325Z [2024-09-23 08:13:31,157] [DEBUG] [label_studio_ml.api::log_response_info::193] Response body: b'{"detail":"HTTPError: 500 Server Error: Internal Server Error for url: http://192.168.1.255:8080/tasks/63506/presign/?fileuri=s3://dataset/image/steam/00001.jpg","request":{},"result":{"traceback":"Traceback (most recent call last):\\n  File \\"/usr/local/lib/python3.8/site-packages/label_studio_ml/exceptions.py\\", line 39, in exception_f\\n    return f(*args, **kwargs)\\n  File \\"/usr/local/lib/python3.8/site-packages/label_studio_ml/api.py\\", line 69, in _predict\\n    response = model.predict(tasks, context=context, **params)\\n  File \\"/app/model.py\\", line 51, in predict\\n    predictor_results = PREDICTOR.predict(\\n  File \\"/app/sam_predictor.py\\", line 202, in predict\\n    return self.predict_sam(img_path, point_coords, point_labels, input_box, task)\\n  File \\"/app/sam_predictor.py\\", line 173, in predict_sam\\n    self.set_image(img_path, calculate_embeddings=False, task=task)\\n  File \\"/app/sam_predictor.py\\", line 86, in set_image\\n    image_path = get_local_path(\\n  File \\"/usr/local/lib/python3.8/site-packages/label_studio_sdk/_extensions/label_studio_tools/core/utils/io.py\\", line 159, in get_local_path\\n    filepath = download_and_cache(\\n  File \\"/usr/local/lib/python3.8/site-packages/label_studio_sdk/_extensions/label_studio_tools/core/utils/io.py\\", line 209, in download_and_cache\\n    r.raise_for_status()\\n  File \\"/usr/local/lib/python3.8/site-packages/requests/models.py\\", line 1024, in raise_for_status\\n    raise HTTPError(http_error_msg, response=self)\\nrequests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://192.168.1.255:8080/tasks/63506/presign/?fileuri=s3://dataset/image/steam/00001.jpg\\n"},"status":500}\n'

Testing the same URL with Postman:

GET: http://192.168.1.255:8080/tasks/63506/presign/?fileuri=s3://dataset/image/steam/00001.jpg

Headers: Authorization:Token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

{
    "id": "db10e9d9-18e6-4fa5-a6ae-4b3eca161daf",
    "status_code": 500,
    "version": "1.13.1",
    "detail": "Unsafe redirect to URL with protocol 'data'",
    "exc_info": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/rest_framework/views.py\", line 506, in dispatch\n    response = handler(request, *args, **kwargs)\n  File \"/label-studio/label_studio/./data_import/api.py\", line 816, in get\n    return self.handle_presign(request, fileuri, task)\n  File \"/label-studio/label_studio/./data_import/api.py\", line 789, in handle_presign\n    response = HttpResponseRedirect(redirect_to=url, status=status.HTTP_303_SEE_OTHER)\n  File \"/usr/local/lib/python3.10/dist-packages/django/http/response.py\", line 506, in __init__\n    raise DisallowedRedirect(\"Unsafe redirect to URL with protocol '%s'\" % parsed.scheme)\ndjango.core.exceptions.DisallowedRedirect: Unsafe redirect to URL with protocol 'data'\n"
}
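The Postman test above can also be reproduced from a script, which makes it easier to see whether the presign endpoint answers with a redirect or with the 500 error. The following is a minimal sketch using only the Python standard library. The helper names `build_presign_request`, `probe_presign` and `NoRedirect` are hypothetical and not part of Label Studio or its SDK; the host, task ID and file URI are taken from the request shown above.

```python
import urllib.error
import urllib.parse
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following 3xx responses so the Location header stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means the 3xx surfaces as an HTTPError instead of being followed.
        return None


def build_presign_request(host, task_id, fileuri, token):
    """Build the same GET request that was sent from Postman (hypothetical helper)."""
    query = urllib.parse.urlencode({"fileuri": fileuri})
    url = f"{host}/tasks/{task_id}/presign/?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


def probe_presign(request, timeout=10):
    """Return (status, Location header, body text) without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(request, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status, resp.headers.get("Location"), body
    except urllib.error.HTTPError as err:
        # Unfollowed 3xx responses as well as 4xx/5xx errors end up here.
        body = err.read().decode("utf-8", "replace")
        return err.code, err.headers.get("Location"), body
```

Calling `probe_presign(build_presign_request("http://192.168.1.255:8080", 63506, "s3://dataset/image/steam/00001.jpg", "<your token>"))` should return the status code together with any `Location` header. Judging by the `HTTP_303_SEE_OTHER` line in the traceback, a working setup would presumably answer with a 303 and an http(s) location rather than a 500.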

P.S.

SAM2 doesn't hit the 500 error right away. It first fails with:

"Label Studio Task ID is required for cloud storage files"

After I edited the code to pass the task ID in, the 500 error occurs.

makseq commented 6 days ago

It sounds like your MinIO instance is configured incorrectly and doesn't support redirects. Check this schema to better understand how LS works with presigned URLs from storages:

Can you see this image in the Label Studio quickview?

ClarkeAC commented 4 days ago

Hi @makseq, thanks for the information.

I can see it in the quickview.

Task source for the MinIO image (the base64 payload is too long, so I removed it here):

{
  "id": 63506,
  "data": {
    "image": "data:application/octet-stream;base64,<encoded image data>"
  },
  "annotations": [
    {
      "id": 37179,
      "result": [],
      "created_username": " admin@admin.com, 1",
      "created_ago": "2 days, 1 hour",
      "completed_by": {
        "id": 1,
        "first_name": "",
        "last_name": "",
        "avatar": null,
        "email": "admin@admin.com",
        "initials": "ad"
      },
      "was_cancelled": false,
      "ground_truth": false,
      "created_at": "2024-09-23T06:49:18.736170Z",
      "updated_at": "2024-09-23T06:49:18.736205Z",
      "draft_created_at": "2024-09-23T06:30:45.061583Z",
      "lead_time": 127.01299999999999,
      "import_id": null,
      "last_action": null,
      "task": 63506,
      "project": 1,
      "updated_by": 1,
      "parent_prediction": null,
      "parent_annotation": null,
      "last_created_by": null
    }
  ],
  "predictions": []
}

Task source for the directly uploaded image:

{
  "id": 63510,
  "data": {
    "image": "/data/upload/1/98c429df-IMG_20231203_153031.jpg"
  },
  "annotations": [
    {
      "id": 37180,
      "result": [],
      "created_username": " admin@admin.com, 1",
      "created_ago": "2 days, 1 hour",
      "completed_by": {
        "id": 1,
        "first_name": "",
        "last_name": "",
        "avatar": null,
        "email": "admin@admin.com",
        "initials": "ad"
      },
      "was_cancelled": false,
      "ground_truth": false,
      "created_at": "2024-09-23T07:17:33.596554Z",
      "updated_at": "2024-09-23T07:17:33.596596Z",
      "draft_created_at": "2024-09-23T06:43:36.256970Z",
      "lead_time": 677.96,
      "import_id": null,
      "last_action": null,
      "task": 63510,
      "project": 1,
      "updated_by": 1,
      "parent_prediction": null,
      "parent_annotation": null,
      "last_created_by": null
    }
  ],
  "predictions": []
}
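Comparing the two task sources: the MinIO task resolves its image to a `data:` URI, which is the same scheme named in the `DisallowedRedirect` error, while the uploaded task resolves to a plain path. By default Django's redirect responses only accept the `http`, `https` and `ftp` schemes, so a redirect to a `data:` URI is rejected. The sketch below mirrors that check with the standard library only. It is an illustration rather than Django's actual code, and `is_safe_redirect` is a hypothetical name.

```python
from urllib.parse import urlparse

# Schemes Django's redirect responses accept by default.
ALLOWED_REDIRECT_SCHEMES = {"http", "https", "ftp"}


def is_safe_redirect(url: str) -> bool:
    """Return True if a redirect to `url` would pass a Django-style scheme check."""
    scheme = urlparse(url).scheme
    # Scheme-less URLs (relative paths) pass; any other scheme must be allowed.
    return not scheme or scheme in ALLOWED_REDIRECT_SCHEMES


# Values resembling the ones in this thread.
for candidate in (
    "data:application/octet-stream;base64,AAAA",
    "/data/upload/1/98c429df-IMG_20231203_153031.jpg",
    "https://minio.example/dataset/image/steam/00001.jpg",  # hypothetical presigned URL
):
    print(candidate[:40], "->", is_safe_redirect(candidate))
```

If this reading is right, the 500 error would come from the URL that Label Studio resolves for the storage file rather than from the MinIO access policy, but that is an assumption based on the two task sources and the traceback, not something confirmed in this thread.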

I also tried a MinIO access key with a full-access policy, but I get the same 500 error.

{
 "Version": "2012-10-17",
 "Statement": [
  {
   "Effect": "Allow",
   "Action": [
    "admin:*"
   ]
  },
  {
   "Effect": "Allow",
   "Action": [
    "kms:*"
   ]
  },
  {
   "Effect": "Allow",
   "Action": [
    "s3:*"
   ],
   "Resource": [
    "arn:aws:s3:::*"
   ]
  }
 ]
}