Layout-Parser / layout-model-training

The scripts for training Detectron2-based Layout Models on popular layout analysis datasets

Potential issue with cocosplit #4

Closed bholmessyapse closed 2 years ago

bholmessyapse commented 2 years ago

Hi - just wanted to, first, say thank you for this excellent utility and walkthrough!

I'm not entirely sure if this is a bug, or if I'm simply doing the process wrong - either way, I wanted to document what happened and see where the problem is:

I have a folder of images, most of which I have annotated with labelme. I then used labelme2coco to consolidate all of these labelme JSON files into a single COCO file.

When I run

    python3 utils/cocosplit.py --annotation_path [pathToFile]/trainval.json --train ./train --test ./test --split-ratio 0.8 --having-annotations

I get

    ValueError: With n_samples=0, test_size=None and train_size=0.8, the resulting train set will be empty. Adjust any of the aforementioned parameters.

I suspect that n_samples is 0 here because every image listed in trainval.json has annotations. I guess the conversion didn't include the images without annotations, though I suspect you'd run into the same problem if you had annotated every file in the folder.
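To illustrate that suspicion, here is a minimal sketch of how a list of un-annotated images can end up empty. It assumes the usual COCO layout (an `images` list with `id` fields and an `annotations` list with `image_id` fields); the function name `partition_by_annotation` and the tiny sample dictionary are hypothetical and are not taken from cocosplit.py.

```python
def partition_by_annotation(coco):
    """Split COCO images into those with and without annotations.

    Hypothetical helper for illustration only; not cocosplit's own code.
    """
    # Collect the ids of every image that at least one annotation refers to.
    annotated_ids = {ann["image_id"] for ann in coco["annotations"]}
    img_w_ann = [img for img in coco["images"] if img["id"] in annotated_ids]
    img_wo_ann = [img for img in coco["images"] if img["id"] not in annotated_ids]
    return img_w_ann, img_wo_ann


# Made-up sample in which every image has an annotation.
coco = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [{"image_id": 1}, {"image_id": 2}],
}
img_w_ann, img_wo_ann = partition_by_annotation(coco)
print(len(img_w_ann), len(img_wo_ann))
```

When every image is annotated, the second list is empty, which would match the n_samples=0 in the error message if that list is what gets passed to the split.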

The particular call that fails in cocosplit is

    tr_wo_ann, ts_wo_ann = train_test_split(img_wo_ann, train_size=split_ratio, random_state=random_state)

where img_wo_ann is an empty list, since all files had annotations.

If I change the code to

    if img_wo_ann:
        tr_wo_ann, ts_wo_ann = train_test_split(img_wo_ann, train_size=split_ratio,
                                                random_state=random_state)

the script runs correctly, and I get "Saved 133 entries in [pathtofile]/Train and 34 in [pathtofile]/Test".

Is it incorrect to make this addition? Am I missing something? Thanks!
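One thing to watch with a bare `if` guard like the one above: when the list is empty, `tr_wo_ann` and `ts_wo_ann` are never assigned, so any later use of them would fail unless they are initialised elsewhere. A minimal sketch of a guard that always returns two lists follows. The helper `split_or_empty` and the stand-in splitter `head_tail_split` are hypothetical names used only to keep the example dependency-free; in the script the splitter would presumably be scikit-learn's `train_test_split`.

```python
def split_or_empty(items, split_fn, **kwargs):
    """Return two empty lists for empty input; otherwise delegate to split_fn.

    Hypothetical helper for illustration; not part of cocosplit.py.
    """
    if not items:
        # Nothing to split, so skip the splitter (which may raise on empty input).
        return [], []
    return split_fn(items, **kwargs)


def head_tail_split(items, train_size):
    """Stand-in splitter: the first train_size fraction goes to train, the rest to test."""
    cut = int(len(items) * train_size)
    return items[:cut], items[cut:]


# An empty input yields two empty lists instead of an error.
tr_wo_ann, ts_wo_ann = split_or_empty([], head_tail_split, train_size=0.8)
print(tr_wo_ann, ts_wo_ann)

# A non-empty input is passed through to the splitter unchanged.
train, test = split_or_empty(list(range(10)), head_tail_split, train_size=0.8)
print(len(train), len(test))
```

Under these assumptions, the call in the script would become something like `split_or_empty(img_wo_ann, train_test_split, train_size=split_ratio, random_state=random_state)`, leaving the rest of the code free to use both names whether or not any un-annotated images exist.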

bholmessyapse commented 2 years ago

The problem came from a copy of the library that somebody else had modified, so it's probably not a problem on your side!

yht4work commented 2 years ago

I have encountered the same problem with the PubLayNet val dataset. How did you resolve it, @bholmessyapse?
