Closed Davidchenyuhao closed 2 years ago
Hey,
Regarding the col_mask and table_mask folders: for each image we have two ground-truth images for training. One contains the column masks and the other contains the table masks, so you put those images in the respective folders.
You can find an example of the masks here: https://ibb.co/fdCMqwV
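To make the folder layout concrete, here is a minimal sketch of how one image could be paired with its two masks. It assumes a single data root that contains the col_mask and table_mask folders and that each mask is saved under the same file name as its image. The function name, the root folder, and the naming rule are assumptions for illustration, not something the repository is confirmed to do.

```python
from pathlib import Path


def mask_paths_for(image_path, data_root):
    """Return the (column mask, table mask) paths expected for one image.

    Assumption: masks reuse the image's file name and live in the
    "col_mask" and "table_mask" folders directly under data_root.
    """
    root = Path(data_root)
    name = Path(image_path).name
    return root / "col_mask" / name, root / "table_mask" / name


# Hypothetical file and folder names, only to show the pairing.
col_path, table_path = mask_paths_for("images/page_01.png", "data")
print(col_path)
print(table_path)
```

With a layout like this, a training loader can look up both masks from the image name alone.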
"xml have no Composites": The Marmot dataset by itself is not 100% complete. There are a small number of images for which we don't have XML coordinates, and hence we can't generate masks. In that case we just use an empty mask.
There may be one more case, where you have the table masks in the Marmot dataset but don't have column masks in the Extended Marmot dataset. In such cases too, we just keep the table masks and take the column masks as blank. After our model is trained, we observe that it handles such cases with ease and learns to identify tables even when some of the masks are missing.
'''
Input: file_name -> image_id without extension, dimensions of image
Output: column_mask, list_of_bounding_boxes
Objective: Read xml file and generate column masks.
If xml file not found, return blank mask
'''
if file_name + ".xml" not in col_data_paths:
    return np.array(Image.new("L", (width, height))), [None]
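For illustration, here is a rough, self-contained sketch of the whole idea behind the snippet above: build a column mask from the annotation XML and fall back to a blank mask when there is nothing to draw. It makes several assumptions. The function name is hypothetical, the XML is assumed to store each column as a `bndbox` element with `xmin`, `ymin`, `xmax` and `ymax` children, and plain nested lists stand in for the NumPy array so the sketch needs no third-party packages. It is not the repository's actual implementation.

```python
import xml.etree.ElementTree as ET


def column_mask_from_xml(xml_text, width, height):
    """Build a column mask (rows of 0/255 values) and a list of boxes.

    xml_text is the annotation XML as a string, or None when no
    annotation file exists for the image.
    """
    # Start from an all-black mask with `height` rows and `width` columns.
    mask = [[0] * width for _ in range(height)]
    if xml_text is None:
        # Same fallback as the original snippet: blank mask, [None] boxes.
        return mask, [None]

    boxes = []
    # Assumption: every column is a <bndbox> with xmin/ymin/xmax/ymax.
    for box in ET.fromstring(xml_text).iter("bndbox"):
        xmin, ymin, xmax, ymax = (
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # Clamp the box to the image so the fill never leaves the mask.
        xmin, xmax = max(0, xmin), min(width, xmax)
        ymin, ymax = max(0, ymin), min(height, ymax)
        for y in range(ymin, ymax):
            mask[y][xmin:xmax] = [255] * (xmax - xmin)
        boxes.append((xmin, ymin, xmax, ymax))
    # An XML file with no boxes simply yields a blank mask and no boxes.
    return mask, boxes
```

An XML file that contains no boxes goes through the same path and comes out as a blank mask, so such files do not need to be deleted from the dataset.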
Hello, could you provide the dataset? I am trying to train the model, but something is still wrong with the dataset and I don't know how to solve it. For example, there is an XML file that has no Composites, so it fails. What should I do to solve this: delete the file or do something else? I also have no idea what I should put in the folders called "col_mask" and "table_mask". Thank you for your help. My email is davidchen99@126.com.