ami-iit / element_human-action-intention-recognition


Prepare the Camera-ready paper for NeurIPS Workshop #68

Closed: kouroshD closed this issue 2 years ago

kouroshD commented 2 years ago

Camera-ready due Nov 19

The camera-ready version of the paper is due Nov. 19th AOE. There is an option on CMT to upload it. If you have any supplementary materials, you can upload them elsewhere (e.g. Google Drive, YouTube, arXiv, etc.) and put the link in the main paper PDF. Alternatively, you can also attach supplementary pages after the references of the main paper PDF. Please note that we only accept PDF for the camera-ready version. We will make the PDF you upload to CMT available to the public through our website http://www.robot-learning.ml/2021/. However, it will not be considered as proceedings of the conference.

CC @DanielePucci

kouroshD commented 2 years ago

Paper Review and Possible Actions:

Reviewer 1

1. [Summary] Please summarize the main claims/contributions of the paper in your own words. (Do not provide any review in this box) The paper "Simultaneous Human Action and Motion Prediction" investigates how to build a predictive model for human motion. Based on relevant literature, the authors discuss how human motion prediction can be naturally done by first identifying the action a human is performing (e.g., walking, turning), and then conditioning on this latent variable in the motion predictive model. The authors propose learning a Mixture of Experts (MoE) architecture (i.e., a gating model that predicts the action, and an expert motion predictive model conditioned on a given action) from expert-labelled human demonstration data.

2. [Novelty] How novel is the paper? Paper contributes some new ideas

3. [Soundness] Is the paper technically sound? I have not checked all details, but the paper appears to be technically sound

4. [Impact] If this preliminary research work is extended in the future, how impactful is the paper likely to be, considering methodological contributions and/or applications? It will impact a moderate number of researchers

5. [Clarity] Is the paper well-organized and clearly written? Fair: paper is somewhat clear, but important details are missing or confusing, which hurts readability

6. [Relevance] How relevant is the content of this paper to the workshop? Note that the theme of the workshop is "Self-Supervised and Lifelong Learning." Please see the Call for Papers for the scope: http://www.robot-learning.ml/2021/ Some aspects are relevant

7. [Reasons to Accept] Please list down the key strengths of the paper. I find the kernel idea of using a MoE architecture interesting for human motion prediction. The experimental results are clearly demonstrated with informative graphics and a video.

8. [Reasons to Reject] Please list down the key weaknesses of the paper. The description of rigour in the paper is misleading. A lot of the paper is spent on background material, rather than on explaining design decisions. No comparisons are made to state of the art approaches in the experiments.

9. [Detailed Comments] Please provide other detailed comments. Since this is a workshop paper, your constructive feedback will immensely help the authors to improve the paper before submitting to a future conference. Some good practices when reviewing a workshop paper include:

- Praising what is good about the paper before stating your criticism.

- Providing citations to back your statements. Reviewers are encouraged to suggest arXiv/other workshop papers to the authors but please do not punish the authors for not citing them.

- Providing a list of "actionable items" for authors to improve their paper before submitting to a future conference.

The authors purport to "rigorously describe the human motor control policy for motion generation". However, the discussion on page 2 just overviews basic background knowledge on the manipulator equations and uses this to justify the generic form of the predictive model in (4).

So-called "Corollary 1" and "Axiom 1" should be designated as "Remarks" or smoothly integrated into the surrounding text.

The space given to this section is ultimately wasted since the authors do not substantially leverage the specifics of (3) and (4) to improve learning. For example, the authors describe that (3) is an optimal control problem, but don't use this to justify, e.g., setting up model learning as an inverse RL problem for the cost function.

In general, the authors do not adequately explain their design decisions for the learning architecture.

The choice of a MoE architecture makes sense. However:

- How was the loss function (6) chosen (e.g., based on prior work, baseline comparisons)?

- What functions are the authors parameterizing (e.g., D1 and D2 in (5))?

- The authors state that a common approach in the literature is to treat human action and motion prediction as sequential steps. However, (5) looks as if the authors do exactly this in first learning a model D1 to predict the action, then feed this into a model D2 to predict motion. The authors should explain how their MoE architecture differentiates their contribution from prior work.

Finally, the authors should provide comparisons against the state of the art (e.g., against methods they mention that treat human action and motion prediction with a "waterfall" approach).
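The distinction Reviewer 1 asks about, between a sequential ("waterfall") pipeline and a Mixture of Experts, can be illustrated with a minimal sketch. This is not the paper's model: the paper uses deep neural networks, whereas the gate and experts below are plain linear maps, and all names (`moe_predict`, `waterfall_predict`, `gate_weights`, `expert_weights`) are hypothetical and chosen only for illustration. The sketch assumes one expert per action.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Apply a weight matrix (list of rows) to an input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def moe_predict(x, gate_weights, expert_weights):
    """Mixture of Experts: the gate outputs action probabilities, and the
    motion prediction is the probability-weighted sum of all expert outputs."""
    action_probs = softmax(linear(gate_weights, x))
    expert_outputs = [linear(w, x) for w in expert_weights]
    dim = len(expert_outputs[0])
    motion = [
        sum(p * out[d] for p, out in zip(action_probs, expert_outputs))
        for d in range(dim)
    ]
    return action_probs, motion


def waterfall_predict(x, gate_weights, expert_weights):
    """Sequential baseline: commit to the most likely action first,
    then use only that action's expert to predict motion."""
    action_probs = softmax(linear(gate_weights, x))
    best = max(range(len(action_probs)), key=action_probs.__getitem__)
    return best, linear(expert_weights[best], x)


# Toy setup (assumed values): 2 actions, 2-D input, 1-D motion output.
x = [1.0, 0.0]
gate_weights = [[0.0, 0.0], [0.0, 0.0]]          # gate is undecided
expert_weights = [[[1.0, 0.0]], [[3.0, 0.0]]]    # experts disagree

probs, blended = moe_predict(x, gate_weights, expert_weights)
action, committed = waterfall_predict(x, gate_weights, expert_weights)
```

In this toy setup the gate is undecided between the two actions, so the mixture blends the two experts' predictions, while the sequential version commits to a single action and discards the other expert's output. Whether the paper's formulation (5) behaves like the former or the latter is exactly the point the reviewer asks the authors to clarify.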

Reviewer 3

1. [Summary] Please summarize the main claims/contributions of the paper in your own words. (Do not provide any review in this box) This paper addresses the problem of simultaneous human action and motion prediction. The author(s) propose a Mixture of Experts (MoE) deep neural network (DNN) approach to solve the problem and test the proposed solution with experiments.

2. [Novelty] How novel is the paper? Paper contributes some new ideas

3. [Soundness] Is the paper technically sound? I have carefully checked all details and did not find any technical flaw

4. [Impact] If this preliminary research work is extended in future, how impactful is the paper likely to be, considering methodological contributions and/or applications? It will impact a moderate number of researchers

5. [Clarity] Is the paper well-organized and clearly written? Good: paper is well organized but language can be improved

6. [Relevance] How relevant is the content of this paper to the workshop? Note that the theme of the workshop is "Self-Supervised and Lifelong Learning." Please see the Call for Papers for the scope: http://www.robot-learning.ml/2021/ Some aspects are relevant

7. [Reasons to Accept] Please list down the key strengths of the paper. Interesting problem

8. [Reasons to Reject] Please list down the key weaknesses of the paper. Motivation

9. [Detailed Comments] Please provide other detailed comments. Since this is a workshop paper, your constructive feedback will immensely help the authors to improve the paper before submitting to a future conference. Some good practices when reviewing a workshop paper include:

- Praising what is good about the paper before stating your criticism.

- Providing citations to back your statements. Reviewers are encouraged to suggest arXiv/other workshop papers to the authors but please do not punish the authors for not citing them.

- Providing a list of "actionable items" for authors to improve their paper before submitting to a future conference.

This paper addresses the problem of simultaneous human action and motion prediction, which is an interesting problem. However, the motivation of this paper is not clear in the introduction.

The fonts in the plots in figure 1 are too small and the author(s) should reconsider the formatting or the figure layout.

The bibliography lacks consistency in style.

Finally, there are some typos and missing commas throughout the paper.

kouroshD commented 2 years ago

Grouped Comments:

General

Introduction, State-of-the-art

Backgrounds, Problem Definition

Proposed Solution

Experiments, Results, Discussion

Bibliography

kouroshD commented 2 years ago

Most of the points have been addressed and the paper has been submitted. The final paper and the files can be found at: https://istitutoitalianotecnologia.sharepoint.com/:f:/r/sites/DynamicInteractionControl/Documenti%20condivisi/SoftManBot/element_human-action-intention-recognition/issues/68-NeuripsFinalPaper?csf=1&web=1&e=YVdhw9

The video is available at: https://youtu.be/XK4vmD6pJ9Q

@DanielePucci also reviewed the final version of the paper and provided some comments, which have been incorporated. We decided to remove the appendices from the current workshop paper; we will add them to the journal paper later.