HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0
19.36k stars 2.4k forks

Can data upload support PDF and Word? Currently, only TXT is supported. #5589

Closed yysturdy closed 2 months ago

yysturdy commented 8 months ago

deppp commented 8 months ago

You'd import them as URLs and embed them for viewing using the HyperText tag. Take a look at the "Rate PDF" example in the Playground: https://labelstud.io/playground/
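A minimal Python sketch of the "import as URL, render with HyperText" idea, assuming the PDF is already hosted at a publicly reachable URL. The task value is just an HTML string, which a labeling config such as `<HyperText name="pdf" value="$pdf"/>` would render. The helper name `pdf_embed` and its width/height defaults are hypothetical; the defaults simply mirror the sample string used later in this thread.

```python
from html import escape


def pdf_embed(url: str, width: str = "100%", height: str = "600px") -> str:
    """Build an <embed> snippet for a hosted PDF (hypothetical helper).

    The width/height defaults copy the sample in this thread; they are
    illustrative, not Label Studio requirements.
    """
    # quote=True also escapes ' (as &#x27;), so a URL containing a single
    # quote cannot break out of the src='...' attribute.
    src = escape(url, quote=True)
    return f"<embed src='{src}' width='{width}' height='{height}'/>"


# One task whose "pdf" key matches the $pdf variable in the labeling config.
task = {"pdf": pdf_embed("https://app.heartex.ai/static/samples/sample.pdf")}
print(task["pdf"])
```

For a plain URL like the sample, the output is identical to the hand-written `<embed>` string; escaping only matters for URLs containing quotes or `&`.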

pbanuru commented 6 months ago

@deppp How should we import the URLs so that they are embedded?

For example, I want a task like this:

{
    "pdf": "<embed src='https://app.heartex.ai/static/samples/sample.pdf' width='100%' height='600px'/>"
}

I tried saving a text file named "sample_pdf_url.txt" containing only <embed src='https://app.heartex.ai/static/samples/sample.pdf' width='100%' height='600px'/> and importing it.

However, the imported task looks like this:

{
    "pdf": "/data/upload/1/4de68584-sample_pdf_url.txt"
}

Is there a way to ensure the embed tag itself is read and displayed instead of just showing the file path? Thanks!

pbanuru commented 6 months ago

I found the solution! Put the entire object:

{
    "pdf": "<embed src='https://app.heartex.ai/static/samples/sample.pdf' width='100%' height='600px'/>"
}

in a .json file and import that file instead of a .txt file.
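A minimal Python sketch of the same fix for several PDFs at once, assuming Label Studio's JSON importer accepts a top-level list of task objects (a single object, as above, is what worked in this thread). The URL list and the output name `tasks.json` are placeholders. Generating the file with `json.dumps` instead of writing it by hand avoids quoting mistakes in the HTML string.

```python
import json
from pathlib import Path

# Placeholder list of hosted PDFs; replace with your own URLs.
pdf_urls = [
    "https://app.heartex.ai/static/samples/sample.pdf",
]

# One task per PDF. The "pdf" key must match the variable referenced by the
# labeling config (for example, a HyperText tag with value="$pdf").
tasks = [
    {"pdf": f"<embed src='{url}' width='100%' height='600px'/>"}
    for url in pdf_urls
]

# json.dumps escapes any double quotes or backslashes in the HTML, so each
# value remains a valid JSON string.
out = Path("tasks.json")
out.write_text(json.dumps(tasks, indent=2), encoding="utf-8")
print(out.read_text(encoding="utf-8"))
```

The resulting tasks.json is what you would upload through the project's import dialog, rather than a .txt file containing the raw tag.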

makseq commented 2 months ago

Closing as inactive; it appears to have been resolved.