boostcampaitech3 / level2-data-annotation_cv-level2-cv-16

[Boostcamp AI Tech 3rd cohort / CV-16] Text Detection Competition: Data Annotation (22.04.11 - 22.04.22)

[Data] CV check #14

Closed sodabeans closed 2 years ago

sodabeans commented 2 years ago

What

Why

How

SSANGYOON commented 2 years ago

import json

# Assumes `datas` (the loaded annotation dict) and `kfold` (a k-fold splitter
# such as sklearn.model_selection.KFold) are defined earlier.
keys = list(datas['images'].keys())

for i, (train_idx, test_idx) in enumerate(kfold.split(keys)):
    # Build a fresh train/valid dict for every fold.
    train = {'images': {}}
    valid = {'images': {}}

    for ti in train_idx:
        key = keys[ti]
        train['images'][key] = datas['images'][key]
    for ti in test_idx:
        key = keys[ti]
        valid['images'][key] = datas['images'][key]

    # Write one train/valid JSON pair per fold.
    with open(f'train_{i}.json', 'w', encoding='utf-8') as make_file:
        json.dump(train, make_file, indent="\t")
    with open(f'valid_{i}.json', 'w', encoding='utf-8') as make_file:
        json.dump(valid, make_file, indent="\t")

SSANGYOON commented 2 years ago

For now I've created 5 CV splits using k-fold. Would a train 80 / valid 20 ratio be appropriate? The dataset has about 6,100 items in total.
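As a quick sanity check on the ratio: with 5 folds, each fold holds out one fifth of the data, so the 80/20 split follows directly from the fold count. Below is a minimal stdlib-only sketch of that arithmetic. The helper `kfold_indices` is hypothetical (it is not the splitter used in the snippet above), the seed is arbitrary, and 6100 is only the approximate count mentioned in the comment.

```python
import random

def kfold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for a shuffled k-fold split.

    Hypothetical helper for illustration only; not a library API.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % n_splits) folds receive one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        valid_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, valid_idx
        start += size

n_images = 6100  # approximate figure quoted in the thread, assumed exact here
fold_sizes = [(len(tr), len(va)) for tr, va in kfold_indices(n_images)]
for i, (n_train, n_valid) in enumerate(fold_sizes):
    print(f"fold {i}: train={n_train} valid={n_valid} "
          f"({n_valid / n_images:.0%} held out)")
```

Under these assumptions every fold trains on 4880 images and validates on 1220. A different ratio would need a different fold count, for example 10 folds for 90/10.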