CVC-DAG / STEP

Code for the paper "STEP - Towards Structured Scene-Text Spotting"
Apache License 2.0

The dataset is missing. #1

Closed Wudicxy closed 6 months ago

Sergigb commented 6 months ago

Hello, which dataset are you referring to? The download links to the HierText-based training/validation and our proposed test set can be found in this section of the readme:

https://github.com/CVC-DAG/STEP?tab=readme-ov-file#datasets

The following link contains the JSON ground truth (GT) for the training and validation splits:

http://datasets.cvc.uab.cat/STEP/structured_ht.zip

While this one contains the test data:

http://datasets.cvc.uab.cat/STEP/structured_test.zip

Wudicxy commented 6 months ago

aws s3 --no-sign-request cp s3://open-images-dataset/ocr/train.tgz datasets/hiertext
aws s3 --no-sign-request cp s3://open-images-dataset/ocr/validation.tgz datasets/hiertext

The train and validation datasets are missing. I eagerly await your response.

Sergigb commented 6 months ago

What is the issue exactly? These two commands require the AWS CLI and will download the HierText train and validation data (although we only use the images). This is the download method provided by the authors in the original repository.

Wudicxy commented 6 months ago

When I use aws s3 --no-sign-request cp s3://open-images-dataset/ocr/train.tgz datasets/hiertext, it tells me ERR_TUNNEL_CONNECTION_FAILED (browser error: "The website with the URL https://open-images-dataset.s3.cl.amazonaws.com/ocr/train.tgz may be temporarily unavailable, or it may have been permanently moved to a new URL."). Could you please tell me if this website is available?

Sergigb commented 6 months ago

This command should be run in a command line with the AWS CLI installed; you seem to be trying to access it via a browser. However, you can download these files in the browser with these links:

https://open-images-dataset.s3.amazonaws.com/ocr/train.tgz

https://open-images-dataset.s3.amazonaws.com/ocr/validation.tgz

You have to place the uncompressed files under datasets/hiertext.
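For anyone without the AWS CLI, the download and extraction step could also be scripted. The following is only a rough sketch: the helper names `download` and `extract_tgz` are made up for illustration, and the folder layout inside each archive has not been verified here, so check that the extracted files end up where the repo expects them under datasets/hiertext.

```python
import tarfile
import urllib.request
from pathlib import Path

# The two HTTPS links quoted in the comment above.
ARCHIVES = [
    "https://open-images-dataset.s3.amazonaws.com/ocr/train.tgz",
    "https://open-images-dataset.s3.amazonaws.com/ocr/validation.tgz",
]


def download(url: str, dest_dir: Path) -> Path:
    """Download url into dest_dir; skip the transfer if the file already exists."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        urllib.request.urlretrieve(url, target)
    return target


def extract_tgz(archive: Path, dest_dir: Path) -> list[str]:
    """Unpack a .tgz archive into dest_dir and return the member names."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)
    return names


if __name__ == "__main__":
    # Assumed target folder, taken from the instructions in this thread.
    root = Path("datasets/hiertext")
    for url in ARCHIVES:
        extract_tgz(download(url, root), root)
```

The archives are large, so the script keeps an already downloaded .tgz and only repeats the extraction when it is run again.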

Wudicxy commented 6 months ago

Great, thanks for your answer.

Wudicxy commented 6 months ago

Hello, can I ask a question? I get this error: KeyError: "Dataset 'hiertext_validation' is not registered!". Could you tell me the solution? Thanks for your answer!

Sergigb commented 6 months ago

Hello again. Yes, I believe this was my mistake! The name of the validation split was wrongly registered in the file adet/data/builtin.py; line 48 should be:

"hiertext_validation": ("hiertext/validation", "hiertext/validation.jsonl")

instead of the current:

"hiertext_val": ("hiertext/val", "hiertext/validation.jsonl")

I changed this line in the last commit, so pulling the latest changes of the repo should fix the problem. Please tell me if that fixed your issue.
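To illustrate why the lookup fails, here is a minimal sketch of the failure mode using a plain dictionary. `OLD_SPLITS`, `FIXED_SPLITS`, and `get_split` are hypothetical stand-ins, not the actual registration code in adet/data/builtin.py; only the two dictionary entries are taken from this thread.

```python
# Hypothetical stand-in for a name -> (image_dir, annotation_file) table.
OLD_SPLITS = {
    "hiertext_val": ("hiertext/val", "hiertext/validation.jsonl"),
}
FIXED_SPLITS = {
    "hiertext_validation": ("hiertext/validation", "hiertext/validation.jsonl"),
}


def get_split(splits: dict, name: str) -> tuple:
    """Look up a split by name, mimicking the 'not registered' error on a miss."""
    if name not in splits:
        raise KeyError(f"Dataset '{name}' is not registered!")
    return splits[name]


if __name__ == "__main__":
    # The requested name is "hiertext_validation", which the old table lacks.
    try:
        get_split(OLD_SPLITS, "hiertext_validation")
    except KeyError as err:
        print("old table:", err)
    print("fixed table:", get_split(FIXED_SPLITS, "hiertext_validation"))
```

The point is simply that the name requested at run time must match the registered key exactly, so renaming the key (and pointing it at the right image folder) resolves the error.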

Wudicxy commented 6 months ago

Yes, line 48 is "hiertext_val": ("hiertext/validation", "hiertext/validation.jsonl"). The solution is right. Thank you very much for your answer.