doralune closed 1 year ago
The README says that only the test set is provided, so the training set may never have been released in the first place. I created the training set (2,885 items) with the Python functions below and put the download link here. The same functions also reproduce the test set (716 items), so the filter is probably correct.
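from pathlib import Path
import json


def is_complicated_structure(structure_file: Path):
    """Return True if the table has at least one non-empty merged cell."""
    with open(structure_file, "r") as f:
        a_dict = json.load(f)
    for cell in a_dict["cells"]:
        if is_merged_cell(cell) and is_non_empty_cell(cell):
            return True
    return False


def is_merged_cell(cell):
    """A cell is merged if it spans more than one row or more than one column."""
    if cell["start_row"] != cell["end_row"]:
        return True
    if cell["start_col"] != cell["end_col"]:
        return True
    return False


def is_non_empty_cell(cell):
    """A cell is non-empty if its "content" field is not empty."""
    # Alternative check on the "tex" field, left disabled:
    # return len(cell["tex"]) > 0
    return len(cell["content"]) > 0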
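To show how the check above could be applied to a whole split, here is a minimal sketch. It assumes one JSON structure file per table in a single directory. The directory layout, the `*.json` pattern, the sample cell values, and the names `has_non_empty_merged_cell`, `select_complicated`, and `demo` are all made up for illustration and may not match the real dataset layout.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def has_non_empty_merged_cell(structure: dict) -> bool:
    """Same rule as the functions above, applied to an already-parsed dict."""
    return any(
        (cell["start_row"] != cell["end_row"] or cell["start_col"] != cell["end_col"])
        and len(cell["content"]) > 0
        for cell in structure["cells"]
    )


def select_complicated(structure_dir: Path) -> List[str]:
    """Return the file stems of all structure files that pass the check."""
    selected = []
    for path in sorted(structure_dir.glob("*.json")):  # assumed file pattern
        with open(path, "r") as f:
            if has_non_empty_merged_cell(json.load(f)):
                selected.append(path.stem)
    return selected


def demo() -> List[str]:
    """Run the filter on three made-up tables written to a temporary directory."""
    def cell(end_col, content):
        # Hypothetical cell: one row, columns 0..end_col.
        return {"start_row": 0, "end_row": 0,
                "start_col": 0, "end_col": end_col, "content": content}

    samples = {
        "simple": {"cells": [cell(0, ["a"])]},       # no merged cell
        "merged": {"cells": [cell(1, ["b"])]},       # merged and non-empty
        "empty_merged": {"cells": [cell(1, [])]},    # merged but empty
    }
    with tempfile.TemporaryDirectory() as tmp:
        for name, structure in samples.items():
            (Path(tmp) / f"{name}.json").write_text(json.dumps(structure))
        return select_complicated(Path(tmp))


if __name__ == "__main__":
    print(demo())
```

Only the table with a merged, non-empty cell is kept. A table whose only merged cell is empty is skipped, which matches the `is_merged_cell(cell) and is_non_empty_cell(cell)` condition.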
I cannot find the training set of complicated tables (2,885 items) in the current download link.