HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0
18.24k stars 2.29k forks

Local File Serving Not Working in Docker Container Despite Correct Environment Variables #6087

Open bsc001 opened 2 months ago

bsc001 commented 2 months ago

Describe the bug

I'm running Label Studio inside a Docker container using docker-compose. I've set up environment variables to access data from local files (linked to a volume). The files exist when checking within the container, but I cannot access them through URLs from the browser or from within the container.

To Reproduce

Steps to reproduce the behavior:

  1. Create a docker-compose.yml with the following Label Studio service configuration:
labelstudio:
  image: heartexlabs/label-studio:latest
  ports:
    - "4999:8000"
  depends_on:
    - database
  environment:
    - LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
    - LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/label-studio/data
    - LABEL_STUDIO_BASE_DATA_DIR=/label-studio/data/
    - LABEL_STUDIO_CORS_ORIGIN=*
    - LOG_LEVEL=DEBUG
  volumes:
    - label_studio_mydata:/label-studio/data:rw
    - documents_dataset:/label-studio/data/raw_datasets/documents_dataset:rw
  command: label-studio-uwsgi
  2. Start the Label Studio container.

  3. Inside the container, create a test file:

echo "this is fake image" > /label-studio/data/raw_datasets/documents_dataset/document1/document1_Page_01.jpg

  4. Attempt to access the file via browser: http://localhost:4999/data/local-files/?d=raw_datasets/documents_dataset/document1/document1_Page_01.jpg

  5. Attempt to access the file from within the container:

curl -v 'http://localhost:8000/data/local-files/?d=raw_datasets/documents_dataset/document1/document1_Page_01.jpg'

Expected behavior

The file should be accessible via the provided URLs.

Actual behavior

Browser access fails.
The curl command from within the container fails to retrieve the file.

Environment information:

Label Studio version: 1.12.1

Additional context

Environment variables are correctly set within the container:

LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/label-studio/data
LABEL_STUDIO_BASE_DATA_DIR=/label-studio/data/

The files are present and accessible within the container when checked directly.
Unable to find the Label Studio configuration file (/label-studio/data/label_studio_config.json) within the container.
Unable to locate or access Label Studio log files within the container.

Attempted troubleshooting:

Verified file permissions within the container.
Checked environment variables.
Attempted to access a simple text file using curl within the container (failed).

Any assistance in resolving this issue would be greatly appreciated.

bsc001 commented 2 months ago

Issue summary and temporary solution (the root cause is not yet well understood)

Update:

The issue originates from the function localfiles_data in label_studio\core\views.py.

Original Code:

def localfiles_data(request):
    """Serving files for LocalFilesImportStorage"""
    user = request.user
    path = request.GET.get('d')
    if settings.LOCAL_FILES_SERVING_ENABLED is False:
        return HttpResponseForbidden(
            "Serving local files can be dangerous, so it's disabled by default. "
            'You can enable it with LOCAL_FILES_SERVING_ENABLED environment variable, '
            'please check docs: https://labelstud.io/guide/storage.html#Local-storage'
        )
    local_serving_document_root = settings.LOCAL_FILES_DOCUMENT_ROOT
    if path and request.user.is_authenticated:
        path = posixpath.normpath(path).lstrip('/')
        full_path = Path(safe_join(local_serving_document_root, path))
        user_has_permissions = False

        # Try to find Local File Storage connection based prefix:
        # storage.path=/home/user, full_path=/home/user/a/b/c/1.jpg =>
        # full_path.startswith(path) => True

        localfiles_storage = LocalFilesImportStorage.objects.annotate(
            _full_path=Value(os.path.dirname(full_path), output_field=CharField())
        ).filter(_full_path__startswith=F('path'))

        if localfiles_storage.exists():
            user_has_permissions = any(storage.project.has_permission(user) for storage in localfiles_storage)

        if user_has_permissions and os.path.exists(full_path):
            content_type, encoding = mimetypes.guess_type(str(full_path))
            content_type = content_type or 'application/octet-stream'
            return RangedFileResponse(request, open(full_path, mode='rb'), content_type)
        else:
            return HttpResponseNotFound()

    return HttpResponseForbidden()

Problem:

The localfiles_storage.exists() method returns False on the remote host, which causes user_has_permissions to remain False. Consequently, the function returns a 404 response. This behavior differs from the local environment where localfiles_storage.exists() returns True.
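
For anyone debugging the same symptom, the branch order of the function quoted above can be summarized in a small standalone sketch. This is not Label Studio code: `expected_status` and its boolean parameters are hypothetical names standing in for the checks inside `localfiles_data`, and the numbers are the HTTP status codes carried by Django's HttpResponseForbidden (403) and HttpResponseNotFound (404).

```python
def expected_status(serving_enabled, path, authenticated, storage_permits, file_exists):
    """Return the HTTP status the quoted view would produce (illustrative only).

    All parameters are hypothetical stand-ins:
      serving_enabled  -> settings.LOCAL_FILES_SERVING_ENABLED
      path             -> the value of the ?d= query parameter
      authenticated    -> request.user.is_authenticated
      storage_permits  -> a matching LocalFilesImportStorage exists and the
                          user has permission on its project
      file_exists      -> os.path.exists(full_path)
    """
    if serving_enabled is False:
        return 403  # serving disabled
    if not path or not authenticated:
        return 403  # falls through to the final HttpResponseForbidden
    if storage_permits and file_exists:
        return 200  # file is served
    return 404      # no permitted storage, or file missing


# Unauthenticated request (for example curl without credentials):
print(expected_status(True, "a/b.jpg", False, False, True))  # 403

# Logged-in user, file exists, but no matching storage connection:
print(expected_status(True, "a/b.jpg", True, False, True))   # 404
```

Read this way, a 403 points at the serving flag, a missing `d` parameter, or authentication, while a 404 points at the storage lookup or the file path.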

Resolution:

As a temporary solution, I set the default value of user_has_permissions to True. This allows the function to check for the file and send it back correctly.

Updated Code:

def localfiles_data(request):
    """Serving files for LocalFilesImportStorage"""
    user = request.user
    path = request.GET.get('d')
    if settings.LOCAL_FILES_SERVING_ENABLED is False:
        return HttpResponseForbidden(
            "Serving local files can be dangerous, so it's disabled by default. "
            'You can enable it with LOCAL_FILES_SERVING_ENABLED environment variable, '
            'please check docs: https://labelstud.io/guide/storage.html#Local-storage'
        )
    local_serving_document_root = settings.LOCAL_FILES_DOCUMENT_ROOT
    if path and request.user.is_authenticated:
        path = posixpath.normpath(path).lstrip('/')
        full_path = Path(safe_join(local_serving_document_root, path))
        user_has_permissions = True  # Temporary solution

        # Try to find Local File Storage connection based prefix:
        localfiles_storage = LocalFilesImportStorage.objects.annotate(
            _full_path=Value(os.path.dirname(full_path), output_field=CharField())
        ).filter(_full_path__startswith=F('path'))

        if localfiles_storage.exists():
            user_has_permissions = any(storage.project.has_permission(user) for storage in localfiles_storage)

        if user_has_permissions and os.path.exists(full_path):
            content_type, encoding = mimetypes.guess_type(str(full_path))
            content_type = content_type or 'application/octet-stream'
            return RangedFileResponse(request, open(full_path, mode='rb'), content_type)
        else:
            return HttpResponseNotFound()

    return HttpResponseForbidden()

Notes:

jombooth commented 1 month ago

Hi @bsc001 - did you connect an import storage of the Local Files type to a project you're working on, as shown in the screenshot below? That connection is what will make localfiles_storage.exists() return True; really it's just looking for a LocalFilesImportStorage that your user has access to, and with the appropriate path field on the local storage object.

Screenshot from 2024-08-01 00-18-37
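
To make the "appropriate path field" part concrete, here is a rough sketch of the prefix comparison the view performs, written in plain Python rather than as a Django query. The helper names `resolve_directory` and `matching_storage_paths` are made up for illustration, `posixpath.join` is used in place of Django's `safe_join` (which additionally rejects paths that escape the document root), and the second storage path in the example is a hypothetical host path.

```python
import posixpath


def resolve_directory(document_root, d):
    """Directory that the view compares against each storage's path."""
    # Same normalization as the view: collapse the path, drop leading slashes.
    relative = posixpath.normpath(d).lstrip('/')
    # Assumption: a plain join stands in for Django's safe_join here.
    full_path = posixpath.join(document_root, relative)
    return posixpath.dirname(full_path)


def matching_storage_paths(document_root, d, storage_paths):
    """Storage paths that are a string prefix of the file's directory."""
    directory = resolve_directory(document_root, d)
    return [p for p in storage_paths if directory.startswith(p)]


document_root = "/label-studio/data"
d = "raw_datasets/documents_dataset/document1/document1_Page_01.jpg"

# One path as seen inside the container, one hypothetical host-side path.
storage_paths = [
    "/label-studio/data/raw_datasets/documents_dataset",
    "/home/user/documents_dataset",
]

print(resolve_directory(document_root, d))
# /label-studio/data/raw_datasets/documents_dataset/document1

print(matching_storage_paths(document_root, d, storage_paths))
# ['/label-studio/data/raw_datasets/documents_dataset']
```

Under these assumptions, the storage connection only matches when its path is written as the container sees it and sits at or above the file's directory under LOCAL_FILES_DOCUMENT_ROOT.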

bsc001 commented 1 month ago

Yes, I attached a folder there, but why is it not returning the list of files there?