HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Show PDF from local storage or upload #1155

Closed. sumanshusamarora closed this issue 3 years ago

sumanshusamarora commented 3 years ago

I am trying to use the PDF Classification template, and when I upload my PDFs, they show up as a text path to the PDF instead of a PDF view. Screenshot attached. Is this expected, or is it a bug?

dentalala commented 3 years ago

Hi! You need to prepare a JSON file that uses the HyperText tag. You can see how a similar config is set up in the Playground and compare it to yours.

sumanshusamarora commented 3 years ago

The example takes the PDF source from an image, which seems to work, but in my case the PDF comes from local storage and that's not working.

schafsam commented 3 years ago

Hi @sumanshusamarora

This is how I was able to get my images into the project. It should work the same way with PDFs.

I assume you run Label Studio with local file serving, as described in the guide (see the whole section here).

To get the path, check "Treat every bucket object as a source file" and copy a file to your absolute local path.

Sync Storage and import your test document.


Then use the blue </> button in the Data Manager to check the document's URL.

{ "id": 1, "data": { "$undefined$": "https://some-url.top/labelstudio/data/local-files/?d=data/inbox/test.jpeg" }, "annotations": [], "predictions": [] }
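To read the URL schema off such a task, the following minimal sketch may help. It assumes the task JSON has the shape shown above; `split_local_file_url` is a hypothetical helper name, not part of Label Studio.

```python
import json
from urllib.parse import urlsplit, parse_qs

# Task source as copied from the Data Manager (taken from the example above).
task_source = """
{ "id": 1, "data": { "$undefined$": "https://some-url.top/labelstudio/data/local-files/?d=data/inbox/test.jpeg" }, "annotations": [], "predictions": [] }
"""


def split_local_file_url(url):
    """Return (route, file_path) for a local-files URL (hypothetical helper)."""
    parts = urlsplit(url)
    # The file path travels in the "d" query parameter.
    d_values = parse_qs(parts.query).get("d", [])
    return parts.path, (d_values[0] if d_values else None)


task = json.loads(task_source)
# Take the first value in "data", whatever its key is called.
url = next(iter(task["data"].values()))
route, file_path = split_local_file_url(url)
print(route)
print(file_path)
```

The route is the part that stays fixed for your deployment, and the `d` value is the part that changes per file.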

If you need further customization for your tasks

For an image on your local deployment, it might look like this: "$undefined$": "/data/local-files/?d=dataset1/1.jpg". The /data/local-files/ route stays the same, and the path to your file follows in the ?d=/abs/path/to/your/file URL parameter.

Finally, create your task according to the URL schema you just discovered and set your path. Uncheck "Treat every bucket object...", copy the files to the data directory, and import the task.json you created.
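A small sketch of how such URLs could be assembled, assuming the /data/local-files/ route behaves as described in this thread. `local_file_url` and its `base_url` argument are hypothetical names for illustration, and the percent-encoding step is an assumption that may not be needed for simple paths.

```python
from urllib.parse import quote

LOCAL_FILES_ROUTE = "/data/local-files/"  # route as quoted in this thread


def local_file_url(path, base_url=""):
    """Build a local-files URL for `path` (hypothetical helper).

    base_url is an optional prefix such as "https://some-url.top/labelstudio";
    leave it empty to get a relative URL.
    """
    # Percent-encode spaces and similar characters, but keep "/" readable.
    encoded = quote(path, safe="/")
    return f"{base_url.rstrip('/')}{LOCAL_FILES_ROUTE}?d={encoded}"


print(local_file_url("dataset1/1.jpg"))
print(local_file_url("data/inbox/test.jpeg", "https://some-url.top/labelstudio"))
```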

Update/addition: following this example on the LS Blog, there is a different way.

If there are helper modules for this kind of import task, I am also very interested in them.

Hope this helps.

sumanshusamarora commented 3 years ago

Hi @schafsam, the images work just fine, mate. The problem is the PDF preview. Try putting a PDF into your storage and see whether it shows up, even for the PDF rating project.

schafsam commented 3 years ago

@sumanshusamarora

I just tried the PDF classification template from the Playground (Rate PDF), creating the task by synchronizing the storage. It does not work for me either. This is what I got:

(Screenshots: ls-pdf-problem, ls-pdf-problem-data-manager)

@dentalala I run version 1.1.0 from a conda environment.

sumanshusamarora commented 3 years ago

Ok, me too. Version 1.1.0 in a conda environment.

makseq commented 3 years ago

@sumanshusamarora @schafsam You have to use specially prepared tasks and load them as JSON files:

task.json

{
    "pdf": "<embed src='https://app.heartex.ai/static/samples/sample.pdf' width='100%' height='600px'/>"
}

Add Local Storage (or another storage type) and DO NOT press the sync button. Then prepare your task.json (or tasks.json) manually, using the links served via Local Storage (like https://some-url.top/labelstudio/data/local-files/?d=data/inbox/test.jpeg) in the embed tag:

{
    "pdf": "<embed src='https://some-url.top/labelstudio/data/local-files/?d=data/inbox/test.jpeg' width='100%' height='600px'/>"
}

And then import task.json into LS.
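For more than a handful of files, writing these tasks by hand gets tedious. Below is a minimal sketch of generating them, assuming your labeling config reads the `pdf` key as in the example above. The function names and the sample URLs (including the `.pdf` file names) are hypothetical.

```python
import json
from html import escape


def make_pdf_task(pdf_url, height="600px"):
    """Wrap one PDF URL in an <embed> tag, mirroring the task shown above."""
    # Escape &, quotes, etc. so the URL cannot break out of the src attribute.
    src = escape(pdf_url, quote=True)
    return {"pdf": f"<embed src='{src}' width='100%' height='{height}'/>"}


def tasks_to_json(pdf_urls):
    """Return a JSON array with one task per PDF URL."""
    return json.dumps([make_pdf_task(u) for u in pdf_urls], indent=4)


# Hypothetical URLs following the local-files schema from this thread.
urls = [
    "https://some-url.top/labelstudio/data/local-files/?d=data/inbox/a.pdf",
    "https://some-url.top/labelstudio/data/local-files/?d=data/inbox/b.pdf",
]
print(tasks_to_json(urls))
```

Save the printed output as tasks.json and import that file, rather than syncing the storage.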

I know it's not very convenient, but we don't have direct PDF support in LS yet.