awesome-davian / awesome-reviews-kaist

Computer vision paper reviews written by KAIST AI students

[2021 fall] ICLR 2020 Pre-training of Generic Visual-Linguistic Representations (VL-BERT) (20213784) #124

Closed atjeong closed 2 years ago

bo-10000 commented 3 years ago

Regarding formatting

Regarding content

tungngreen commented 3 years ago
  1. Summary

    The article reviews the paper VL-BERT: Pre-training of Generic Visual-Linguistic Representations by Su et al. (ICLR 2020). The paper proposes a pre-training method that aligns visual and linguistic representations so the model can serve various downstream tasks.

  2. Detailed Reviews

    Strengths:

    • In general, the article is very well written. All parts are covered with sufficient technical detail and clear explanations.
    • The author of this article (from this point on, simply referred to as the author, and not to be confused with the authors of the original paper) has done a terrific job in the first two sections.
      • The Problem Definition section gently but efficiently introduces the problem that the paper is trying to solve. Core concepts are mentioned and well explained.
      • The Related Works section provides enough information related to the proposed method.
      • The idea is well summarized.
    • It's very good that the author spends sufficient time explaining BERT before going into the architecture of VL-BERT.
    • The architecture, pre-training, and fine-tuning steps are well explained, as is the rest of the paper.

    Reviewer's Suggestions:

    • In general, I think the article is sufficient to serve the purpose of reviewing. There are only a few details I hope the author can fix:
      • For the Visual Appearance Feature and Visual Geometry Embedding, it would be great if the author could include some visualizations of these concepts; a small numeric sketch like the one after this list could also help.
      • The formulas are not rendered by GitBook. I suggest the author use a typesetting website to render the formulas and either link to them or include the rendered images.
      • The explanations for the datasets are a bit short.
      • Some references to the topics mentioned in Sections 1 and 2 should be included so that readers can read up on them. It doesn't have to be many, just a few papers or review articles that the author thinks would help readers.
      • Some long sentences are somewhat confusing. E.g., "A typical example is a model such as BERT using the "Masked Language Model" method in the Vision area using a pre-trained model on ImageNet data as a backbone, or in the NLP field that has appeared a little more recently." I think the author could divide it into shorter sentences.
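
    To make the Visual Geometry Embedding suggestion above more concrete, here is a minimal Python sketch of the idea. It is my own illustration rather than the paper's released code: the function name, the output dimensionality, and the wave_len and scaling constants are assumptions; only the overall recipe (normalize the box corners by the image size, then lift them with sinusoids of different wavelengths) follows the paper's description.

    ```python
    import torch


    def visual_geometry_embedding(boxes, img_w, img_h, dim=2048, wave_len=1000):
        """Embed (x_lt, y_lt, x_rb, y_rb) boxes with sine/cosine functions.

        boxes: (N, 4) tensor of corner coordinates in pixels.
        Returns an (N, dim) embedding: the corners are first normalized by
        the image size, then lifted to `dim` dimensions with sinusoids of
        different wavelengths (Transformer-style positional encoding
        applied to box geometry). Constants here are illustrative.
        """
        # Normalize corner coordinates to [0, 1]
        norm = boxes / torch.tensor([img_w, img_h, img_w, img_h], dtype=torch.float32)

        feat_dim = dim // 8                                   # sin+cos pair per coordinate
        freqs = torch.arange(feat_dim, dtype=torch.float32)
        dim_mat = wave_len ** (freqs / feat_dim)              # (feat_dim,)

        pos = norm.unsqueeze(-1) * 100.0 / dim_mat            # (N, 4, feat_dim)
        emb = torch.cat([pos.sin(), pos.cos()], dim=-1)       # (N, 4, 2 * feat_dim)
        return emb.reshape(boxes.size(0), -1)                 # (N, dim)


    # Example: two region proposals in a 640x480 image
    boxes = torch.tensor([[10., 20., 200., 180.],
                          [50., 60., 400., 460.]])
    print(visual_geometry_embedding(boxes, 640, 480).shape)   # torch.Size([2, 2048])
    ```

    The Visual Appearance Feature, by contrast, comes from a detector's region features and depends on the backbone, so it is not sketched here.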
juyebshin commented 3 years ago

Hello, this is Juyeb Shin. First of all, I enjoyed reading your review. It gave me a good overall understanding of tasks that combine vision and linguistic models.

While reading the review, I put together a few things that would be good to add or revise.

Thank you.