HumanSignal / label-studio-converter

Tools for converting Label Studio annotations into common dataset formats
https://labelstud.io/

url-encode file names in order to create valid file URLs? #150

Open nicky1038 opened 1 year ago

nicky1038 commented 1 year ago

First of all, I would like to thank everyone involved in creating this utility, and Label Studio itself, which is such a cool tool :)

Everything works great, apart from one minor issue.

I tried to run label-studio-converter import yolo on a dataset that includes an image whose name contains a + sign. It seems that, to create file URLs, the utility leaves file names as-is and just prepends image-root-url. So for that file it produced a URL that also contained a + sign, which turned out to be invalid. Manually changing + to %2B in the URL works.

Hence my suggestion: URL-encode all file names, either always or via a CLI flag if there are use cases where encoding is not wanted.

Thanks!
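
A minimal sketch of the suggestion above, assuming only the file name is percent-encoded and the root URL is left untouched. The helper name build_image_url and the example root URL are hypothetical and not part of label-studio-converter; urllib.parse.quote is from the Python standard library.

```python
from urllib.parse import quote


def build_image_url(image_root_url: str, image_file_base: str) -> str:
    """Join a root URL and a file name, percent-encoding only the file name.

    Hypothetical helper for illustration; not the converter's actual code.
    """
    # safe="" means "/" is encoded too, so the file name stays one path segment.
    encoded_name = quote(image_file_base, safe="")
    return image_root_url.rstrip("/") + "/" + encoded_name


print(build_image_url("http://localhost:8080/images", "foo+bar.jpg"))
```

Because the root URL is never passed through quote, its scheme separator and any port number are preserved as written.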

makseq commented 1 year ago

@nicky1038 Thanks a lot! Could you contribute this and make a PR? The relevant code is here: https://github.com/heartexlabs/label-studio-converter/blob/master/label_studio_converter/imports/yolo.py#L60

ligaz commented 1 year ago

@makseq This change introduced a regression: if the root URL has the form s3://bucket_name, the : gets encoded as well.
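
The regression can be reproduced in isolation. This is a sketch that uses urllib.parse.quote as a stand-in for the encoding step, on the assumption that the whole joined string (root URL plus file name) is encoded at once; the file name is made up for illustration.

```python
from urllib.parse import quote

root = "s3://bucket_name"

# Encoding the full string treats the scheme's ":" like any other character.
encoded = quote(root + "/foo+bar.jpg")
print(encoded)  # s3%3A//bucket_name/foo%2Bbar.jpg
```

By default quote keeps only "/" and unreserved characters unescaped, so the + is fixed but the scheme separator is broken.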

makseq commented 1 year ago

@ligaz Thank you for pointing me to this issue. Can you create a PR with a fix? It seems like we need something like

"image": pathname2url(os.path.join(image_root_url, image_file_base))  #eg. '../../foo+you.py' -> '../../foo%2Byou.py'

=>

if '://' in image_root_url:
  scheme, root_url = image_root_url.split('://', 1)
  prefix = scheme + '://'  # split() drops the separator, so add it back
else:
  prefix = ''
  root_url = image_root_url
"image": prefix + pathname2url(os.path.join(root_url, image_file_base))  #eg. '../../foo+you.py' -> '../../foo%2Byou.py'
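
The idea above can be written as a self-contained function. This is a sketch under stated assumptions, not the fix that was merged: make_image_url is a hypothetical name, and posixpath.join plus urllib.parse.quote are swapped in for os.path.join and pathname2url so the result does not depend on the operating system.

```python
import posixpath
from urllib.parse import quote


def make_image_url(image_root_url: str, image_file_base: str) -> str:
    """Encode the path part of the URL while leaving any scheme untouched.

    Hypothetical helper for illustration only.
    """
    if "://" in image_root_url:
        # Split once and re-attach "://", which split() removes.
        scheme, root_url = image_root_url.split("://", 1)
        prefix = scheme + "://"
    else:
        prefix, root_url = "", image_root_url
    # Only the part after the scheme is percent-encoded.
    return prefix + quote(posixpath.join(root_url, image_file_base))


print(make_image_url("s3://bucket_name", "foo+you.jpg"))
print(make_image_url("../..", "foo+you.py"))
```

One caveat with this approach: a root URL that includes a port, such as http://localhost:8080, would still have the port's : encoded, because everything after the scheme goes through quote. Encoding only the file name avoids that.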