zjuPeco closed this issue 5 years ago
We have released the feature extraction code in graph.py.
Here, "relative" means the distance relative to the size of the table/chunk. I also think the "relative" features are more important, but the model can also benefit from the "absolute" features.
We also normalize the features using statistics computed over the training examples:
def _norm(features, mean, std, eps=1e-6):
    # Standardize with training-set statistics; eps guards against a zero std.
    return (features - mean) / (std + eps)
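As a rough illustration of where mean and std could come from, here is a minimal, dependency-free sketch. The helper names fit_stats and norm_row and the toy training rows are hypothetical and are not taken from graph.py; the use of the population standard deviation is also an assumption.

```python
import math

def fit_stats(rows):
    """Per-feature mean and population std over the training rows (assumed choice)."""
    n = len(rows)
    dims = len(rows[0])
    mean = [sum(r[d] for r in rows) / n for d in range(dims)]
    std = [
        math.sqrt(sum((r[d] - mean[d]) ** 2 for r in rows) / n)
        for d in range(dims)
    ]
    return mean, std

def norm_row(row, mean, std, eps=1e-6):
    """Same formula as _norm, applied element-wise to one feature row."""
    return [(x - m) / (s + eps) for x, m, s in zip(row, mean, std)]

# Toy training set: column 0 is an "absolute" feature, column 1 a "relative" one.
train = [[10.0, 0.2], [20.0, 0.4], [30.0, 0.6]]
mean, std = fit_stats(train)
normalized = [norm_row(r, mean, std) for r in train]
```

The statistics are computed once on the training examples and then reused for any other split. The eps term keeps a constant feature (std of zero) from causing a division by zero.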
Thank you for your prompt reply!
In your paper, you briefly described your input features, but the description is a bit confusing to me.
Suppose we have two cells:
The coordinates of the table_image are
[x_table_min, x_table_max, y_table_min, y_table_max]
The corresponding height and width of the pdf_image are h_pdf and w_pdf, respectively. So what should the vertex and edge features be?
The following is my guess:
Vertex features:
1) the size of the cells:
2) absolute locations:
3) relative locations:
Edge features:
1) Euclidean distance:
2) x-axis distance (absolute and relative)
3) y-axis distance (absolute and relative)
4) overlap of the cell pairs along the x-axis and y-axis. Are these all absolute values, with no relative values this time? For example, the x_o below:
Can you tell me whether my "absolute" and "relative" definitions are the same as yours? Which one do you use?
One more question: are the absolute features really needed, given that we already have the relative features?
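To make the guess above concrete, here is a small sketch of one possible reading. It is not the implementation in graph.py. It assumes each cell uses the same (x_min, x_max, y_min, y_max) layout as the table coordinates, and that "relative" means dividing by the table width or height, as described in the reply. The function names and the example boxes are hypothetical.

```python
import math

def vertex_features(cell, table):
    """Size, absolute center, and center relative to the table box."""
    x0, x1, y0, y1 = cell
    tx0, tx1, ty0, ty1 = table
    tw, th = tx1 - tx0, ty1 - ty0
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    return {
        "size": (x1 - x0, y1 - y0),
        "abs_center": (cx, cy),
        "rel_center": ((cx - tx0) / tw, (cy - ty0) / th),
    }

def edge_features(a, b, table):
    """Center distances (absolute and relative) and axis-wise overlap of two cells."""
    tx0, tx1, ty0, ty1 = table
    tw, th = tx1 - tx0, ty1 - ty0
    ax, ay = (a[0] + a[1]) / 2, (a[2] + a[3]) / 2
    bx, by = (b[0] + b[1]) / 2, (b[2] + b[3]) / 2
    dx, dy = abs(bx - ax), abs(by - ay)
    # Overlap is the length of the shared interval, clipped at zero.
    overlap_x = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    overlap_y = max(0.0, min(a[3], b[3]) - max(a[2], b[2]))
    return {
        "euclidean": math.hypot(dx, dy),
        "abs_dist": (dx, dy),
        "rel_dist": (dx / tw, dy / th),
        "abs_overlap": (overlap_x, overlap_y),
        "rel_overlap": (overlap_x / tw, overlap_y / th),
    }

table = (0.0, 200.0, 0.0, 100.0)
cell_a = (10.0, 60.0, 10.0, 30.0)
cell_b = (40.0, 110.0, 50.0, 70.0)
v_a = vertex_features(cell_a, table)
e_ab = edge_features(cell_a, cell_b, table)
```

In this reading, every absolute quantity has a relative counterpart obtained by the same division, including the overlaps. Whether the released code computes a relative overlap is exactly the open question here.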