Open cqray1990 opened 3 years ago
In the competition, we tried to use every rule we could figure out to improve performance. Some useful rules were missed because we spent less than two months on this competition.
As we know, the content in thead always carries "<b>" and "</b>", whether or not the text in the image is bold, so we need to filter out the text-line images of the thead. If "<b>" and "</b>" appear in the remaining tbody content, the text in the image is bold. The pattern you mentioned above probably does not cause ambiguity, so we do not filter it. Of course, you can modify the post-processing rules to get a higher score.
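To make the rule above concrete, here is a rough sketch, not the actual pipeline code. The function name and the `in_thead` and `tokens` fields are assumptions made up for illustration; the only thing taken from the explanation is the rule itself: skip thead cells, then treat "<b>" in a tbody cell as a real bold label.

```python
def label_bold_cells(cells):
    """Keep tbody cells only and flag those whose tokens contain '<b>'.

    `cells` is a hypothetical list of dicts with the keys
    'in_thead' (bool) and 'tokens' (list of str).
    """
    labelled = []
    for cell in cells:
        if cell["in_thead"]:
            # thead cells always carry <b>...</b>, so the tag says nothing
            # about whether the text in the image is actually bold.
            continue
        tokens = cell["tokens"]
        labelled.append({"tokens": tokens, "is_bold": "<b>" in tokens})
    return labelled


cells = [
    {"in_thead": True, "tokens": ["<b>", "N", "a", "m", "e", "</b>"]},
    {"in_thead": False, "tokens": ["<b>", "4", "2", "</b>"]},
    {"in_thead": False, "tokens": ["f", "o", "o"]},
]
result = label_bold_cells(cells)
```

With this toy input, the thead cell is dropped and only the first tbody cell is flagged as bold.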
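Why aren't tags such as “” in the cell content filtered out? Looking at the code, only "<b>" and "</b>" are filtered out. Do the other tags not need to be handled?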
def remove_Bb(self, content):
    """
    This function will remove the '<b>' and '</b>' of the content.
    :param content: [list]. text content of each cell.
    :return: text content without '<b>' and '</b>'.
    """
    # list.remove() drops only the first occurrence and edits the list in place.
    if '<b>' in content:
        content.remove('<b>')
    if '</b>' in content:
        content.remove('</b>')
    return content
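If you do want to drop other tags as well, one possible way to modify the rule is to make the tag set a parameter. This is only a sketch under assumptions: `remove_tags` is a hypothetical helper, not part of the repository, and it assumes the cell content is a list of string tokens as in `remove_Bb`. Any extra tags passed in (for example "<i>" and "</i>") are illustrative choices, not something the original code handles.

```python
def remove_tags(content, tags=("<b>", "</b>")):
    """Return a copy of `content` without any token listed in `tags`.

    Hypothetical helper: unlike list.remove(), this drops every
    occurrence and leaves the input list unchanged.
    """
    unwanted = set(tags)
    return [token for token in content if token not in unwanted]


cell = ["<b>", "<i>", "a", "</i>", "</b>"]
only_bold_removed = remove_tags(cell)
bold_and_italic_removed = remove_tags(cell, tags=("<b>", "</b>", "<i>", "</i>"))
```

Keeping the default at "<b>" and "</b>" preserves the existing behaviour for callers that do not pass `tags`, so the other tags are only stripped when explicitly requested.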