AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

some questions about embedding in the code and in the paper #64

Open kenneys-bot opened 9 months ago

kenneys-bot commented 9 months ago


    if inputs_embeds is None:
        inputs_embeds = self.word_embeddings(input_ids)
    token_type_embeddings = self.token_type_embeddings(token_type_ids)

    embeddings = inputs_embeds + token_type_embeddings
    if self.position_embedding_type == "absolute":
        position_embeddings = self.position_embeddings(position_ids)
        embeddings += position_embeddings

    if "line_bbox" in kwargs:
        embeddings += self._cal_spatial_position_embeddings(kwargs["line_bbox"])

    if "line_rank_id" in kwargs:
        embeddings += self.line_rank_embeddings(kwargs["line_rank_id"])

    if "line_rank_inner_id" in kwargs:
        embeddings += self.line_rank_inner_embeddings(kwargs["line_rank_inner_id"])

Regarding the Token Embeddings, 1D Seg. Rank Embeddings, and 1D Seg. BIE Embeddings in the figure: I could not understand their meanings, and there is no clear explanation in the paper, so I finally located the corresponding place in the code and debugged it. That raised a new question: what exactly are inputs_embeds and token_type_embeddings in the code? Is their sum the Token Embeddings in the diagram? Are the 1D Seg. Rank Embeddings the line_rank_embeddings, and the 1D Seg. BIE Embeddings the line_rank_inner_embeddings? Very much looking forward to a quick reply from the developers!

ccx1997 commented 8 months ago

Hi, the token type embeddings are the same for all tokens; they are retained as used in BERT/BROS. The 1D Seg. Rank Embeddings and 1D Seg. BIE Embeddings are exactly what you mentioned.
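
To make that mapping concrete, here is a minimal sketch of how the terms add up, following the names in the snippet above. All sizes (vocab_size, hidden_size, max_lines, bie_size) are illustrative assumptions, not the repo's actual config, and the 2D spatial term from _cal_spatial_position_embeddings(line_bbox) is stubbed out:

    import torch
    import torch.nn as nn

    class EmbeddingSketch(nn.Module):
        # Minimal sketch of the embedding sum discussed above; the sizes
        # used here are illustrative assumptions, not the repo's config.
        def __init__(self, vocab_size=30522, hidden_size=768,
                     max_position=512, max_lines=128, bie_size=3):
            super().__init__()
            self.word_embeddings = nn.Embedding(vocab_size, hidden_size)
            self.token_type_embeddings = nn.Embedding(2, hidden_size)
            self.position_embeddings = nn.Embedding(max_position, hidden_size)
            # 1D Seg. Rank: index of the text line (segment) a token belongs to.
            self.line_rank_embeddings = nn.Embedding(max_lines, hidden_size)
            # 1D Seg. BIE: token position within its line
            # (BIE presumably marks begin/inside/end of the line).
            self.line_rank_inner_embeddings = nn.Embedding(bie_size, hidden_size)

        def forward(self, input_ids, position_ids, line_rank_id, line_rank_inner_id):
            # token_type_ids are constant (all zeros), so this term adds the
            # same vector to every token, as retained from BERT/BROS.
            token_type_ids = torch.zeros_like(input_ids)
            # "Token Embeddings" in the figure = word + token type embeddings.
            embeddings = (self.word_embeddings(input_ids)
                          + self.token_type_embeddings(token_type_ids))
            embeddings = embeddings + self.position_embeddings(position_ids)
            embeddings = embeddings + self.line_rank_embeddings(line_rank_id)
            embeddings = embeddings + self.line_rank_inner_embeddings(line_rank_inner_id)
            # The 2D spatial term, embeddings += _cal_spatial_position_embeddings(
            # line_bbox), is omitted here for brevity.
            return embeddings

A quick shape check with dummy inputs:

    ids = torch.randint(0, 30522, (1, 16))
    pos = torch.arange(16).unsqueeze(0)
    ranks = torch.zeros(1, 16, dtype=torch.long)
    out = EmbeddingSketch()(ids, pos, ranks, ranks)
    print(out.shape)  # torch.Size([1, 16, 768])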