HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

No more data available for labeling #650

Closed: Yannik1337 closed this issue 3 years ago

Yannik1337 commented 3 years ago

Describe the bug: I can run LS with the ML backend fine. I set the "sampling": "prediction-score-min" option in the config. Unfortunately, I can't get LS to sample new data for me. It always says "No more data available for labeling", which is strange since I have only manually labeled 8 files so far.

Expected behavior Sampling with the prediction score min
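For readers unfamiliar with the option, here is a minimal sketch of what sampling by minimum prediction score could look like. It is not Label Studio's actual implementation. It assumes the option means "serve the unlabeled task whose prediction has the lowest score first", and the `completions`, `predictions` and `score` field names are illustrative assumptions about the task structure.

```python
from typing import Optional


def pick_min_score_task(tasks: list) -> Optional[dict]:
    """Return the unlabeled task with the lowest prediction score, or None.

    Assumed (hypothetical) task shape:
    {"id": int, "completions": [...], "predictions": [{"score": float}, ...]}
    """
    # Only tasks that are still unlabeled and already have a prediction qualify.
    candidates = [
        task for task in tasks
        if not task.get("completions") and task.get("predictions")
    ]
    if not candidates:
        return None  # nothing left to sample in this sketch
    # Rank each task by its lowest-scoring prediction.
    return min(
        candidates,
        key=lambda task: min(p["score"] for p in task["predictions"]),
    )


example_tasks = [
    {"id": 1, "completions": [], "predictions": [{"score": 0.91}]},
    {"id": 2, "completions": [], "predictions": [{"score": 0.12}]},
    {"id": 3, "completions": [{"result": []}], "predictions": [{"score": 0.05}]},
    {"id": 4, "completions": [], "predictions": []},
]
next_task = pick_min_score_task(example_tasks)
```

In this sketch, a task without any prediction is never selected, and a task that already has a completion is skipped even if its score is the lowest.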

Screenshots: This is the starting screen: [screenshot: Bildschirmfoto 2021-03-15 um 11 25 42]

And this is the screen I get when I click on Label in the upper right corner: [screenshot: Bildschirmfoto 2021-03-15 um 11 30 31]

Environment (please complete the following information):

Additional context: I created the project with this command: label-studio init my_project --input-path=/path/to/audio/directory/ --input-format=audio-dir --label-config=config.xml --allow-serving-local-files, and then manually changed "sampling" in config.json to "prediction-score-min".

Running label-studio start my_project --sampling=prediction-score-min --ml-backends http://localhost:9090 throws an error: label-studio start: error: argument --sampling: invalid choice: 'prediction-score-min' (choose from 'sequential', 'uniform'). This is odd, since your documentation says: "To enable task sampling, specify one of the sampling option with the --sampling=

This is the config.json file I am using:

{
  "title": "Label Studio",
  "description": "default",
  "protocol": "http://",
  "host": "0.0.0.0",
  "port": 8080,
  "debug": false,
  "label_config": "config.xml",
  "input_path": "tasks.json",
  "output_dir": "completions",
  "instruction": "<img src='static/images/ls_logo.png'><br> Type some <b>hypertext</b> for annotators here!<br> <a href='https://labelstud.io/guide/labeling.html'>Read more</a> about the labeling interface.",
  "allow_delete_completions": true,
  "templates_dir": "examples",
  "editor": {
    "debug": false
  },
  "sampling": "prediction-score-min",
  "enable_predictions_button": true,
  "ml_backends": [
    {
      "url": "http://0.0.0.0:9090/",
      "name": "my_project7c63"
    }
  ],
  "task_page_auto_update_timer": 10000,
  "show_project_links_in_multisession": true,
  "experimental_features": false,
  "first_page_full_render": true,
  "source": {
    "name": "Tasks",
    "type": "tasks-json",
    "path": "tasks.json"
  },
  "target": {
    "name": "Completions",
    "type": "completions-dir",
    "path": "completions"
  },
  "allow_serving_local_files": true
}

makseq commented 3 years ago

@Yannik1337 Could you check this in LS 1.0? I hope it's fixed there.

Yannik1337 commented 3 years ago

Much has changed in LS 1.0: it has a cleaner UI and is definitely more responsive. However, I have not found a way to import data by pointing LS to a directory, as I did previously with this command:

label-studio init my_project --input-path=/path/to/audio/directory/ --input-format=audio-dir --label-config=config.xml --allow-serving-local-files

How can I achieve this? I have thousands of files, so I cannot import them manually. I found pointing LS to a directory quite comfortable to use!

niklub commented 3 years ago

Hi, @Yannik1337! Pointing --input-path to a directory is deprecated. Starting from version 1.0.1, you have two options for serving local files:

  1. Run a separate server (e.g. with python -m http.server 8081) to get URLs pointing to your local files, such as http://localhost:8081/audio.wav. You can collect them manually and import the list from the UI (some helper scripts are also available for that)
  2. Use Local files storage from cloud storage project settings
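As an illustration of option 1, the following is a minimal sketch of a helper that collects URLs for a directory served by python -m http.server 8081 and writes them as a task list for import. It is not one of the helper scripts mentioned above. The function names `build_tasks` and `write_tasks`, the file extensions, and the "audio" data key (which has to match the variable used in your labeling config) are assumptions.

```python
import json
from pathlib import Path
from urllib.parse import quote


def build_tasks(audio_dir, base_url="http://localhost:8081",
                suffixes=(".wav", ".flac")):
    """Build one task per audio file found under audio_dir.

    base_url is assumed to be a server whose root is audio_dir,
    e.g. one started with `python -m http.server 8081` in that directory.
    """
    tasks = []
    for path in sorted(Path(audio_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            # Path relative to the served root, with URL-unsafe characters
            # (spaces, colons, ...) percent-encoded; "/" is left untouched.
            rel = path.relative_to(audio_dir).as_posix()
            tasks.append({"data": {"audio": f"{base_url}/{quote(rel)}"}})
    return tasks


def write_tasks(audio_dir, out_file="tasks.json", **kwargs):
    """Write the task list to a JSON file that can be imported from the UI."""
    tasks = build_tasks(audio_dir, **kwargs)
    Path(out_file).write_text(json.dumps(tasks, indent=2))
    return len(tasks)
```

Calling write_tasks("/path/to/audio/directory/") would produce a tasks.json with one entry per .wav or .flac file, which can then be uploaded through the import dialog.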

Yannik1337 commented 3 years ago

Hi @niklub, thanks for this hint. I had not seen that before! After working with the new LS 1.0.0 I have a few suggestions:

Yannik1337 commented 3 years ago

After importing all the tasks, I can view them, but I cannot play the audio files: [screenshot: Bildschirmfoto 2021-04-02 um 09 58 51]

Looking in the console, I find many 403s: Failed to load resource: the server responded with a status of 403 (Forbidden). As I am using local storage, I cannot configure CORS. In previous LS versions I set "serving_local_files" (or similar) to True. Might this be the problem?

Yannik1337 commented 3 years ago

In a previous project setup, where I ran LS with local files from the CLI, I pointed it to an audio directory, which automatically created a tasks.json file.

I can see that the files are listed there as:

"1": {
  "id": 1,
  "task_path": "/a/b/c/d/e/f/g/2020-09-21T16:37:51.flac",
  "data": {
    "audio": "/data/2020-09-21T16%3A37%3A51.flac?d=%2Fa%2Fb%2Fc%2Fd%2Fe%2Ff%2Fg"
  }
},

whereas in the new UI they are listed as

{
  "id": 2986,
  "data": {
    "audio": "/data/local-files/?d=a/b/c/d/e/f/g/2020-09-21T16:30:48.flac"
  },
  "annotations": [],
  "predictions": []
}

The difference I want to point out is in the naming of the files.
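To make the naming difference concrete, here is a small sketch that reproduces the old-style string with percent-encoding and pulls the path out of the new-style one. It only illustrates the two formats quoted above and does not claim to show how Label Studio builds these URLs internally. The variable names are hypothetical.

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit

# Old style: the file name sits in the path and the directory is passed
# as a fully percent-encoded "d" query parameter.
old_dir = "/a/b/c/d/e/f/g"
old_name = "2020-09-21T16:37:51.flac"
old_url = f"/data/{quote(old_name, safe='')}?d={quote(old_dir, safe='')}"

# Decoding the old-style URL gives back the readable names.
old_decoded = unquote(old_url)

# New style: directory and file name together form one unencoded "d" value
# behind the /data/local-files/ endpoint.
new_url = "/data/local-files/?d=a/b/c/d/e/f/g/2020-09-21T16:30:48.flac"
new_path = parse_qs(urlsplit(new_url).query)["d"][0]

print(old_url)
print(old_decoded)
print(new_path)
```

Here old_url matches the "audio" value from the old tasks.json entry character for character, while new_path is the relative path a/b/c/d/e/f/g/2020-09-21T16:30:48.flac with the colons left unencoded.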

Yannik1337 commented 3 years ago

After working with it, I found a piece of information hidden in the console logs: Serving local files can be dangerous, so it's disabled by default. You can enable it with LOCAL_FILES_SERVING_ENABLED environment variable. I'd recommend adding this as a command line option to this page, and placing a hint more prominently (e.g. when selecting local storage in the UI). This will help other users.

Edit: Enabling the serving of local files with the environment variable LOCAL_FILES_SERVING_ENABLED=True does not make the audio play back successfully. The following error persists:

[screenshot: Bildschirmfoto 2021-04-04 um 16 33 08]

I have confirmed that the requested file exists by using stat path_to_file.

Yannik1337 commented 3 years ago

After upgrading to the latest Docker image (docker pull heartexlabs/label-studio:1.0.1), I am now getting a new error. The console logs say: Failed to load resource: the server responded with a status of 500 (Internal Server Error). Clicking on this error shows the following screen:

[screenshot: Bildschirmfoto 2021-04-06 um 10 26 46]

My labeling configuration is as listed here, and I have set ENV LOCAL_FILES_SERVING_ENABLED=True in my Dockerfile.

niklub commented 3 years ago

Hi, @Yannik1337! Sorry, I tried to reproduce your issue with the latest Label Studio but could not. Here is what I did:

  1. Install label-studio (/version gives backend "commit": "71278b")
  2. Set local files storage, pointed to the directory where flac file is located
  3. Specify your labeling config

After that, I get no error when checking it from the data manager view: [image]

So can you please check:

Yannik1337 commented 3 years ago

@niklub thank you for taking the time to try to reproduce this issue. I followed your steps, and at first it did not work. I then switched from Safari 13.0.5 (15608.5.11) to Chrome. Now it works! Thanks a lot! (You might consider adding a hint that this Safari version does not seem to work with audio files.)