askforalfred / alfred

ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
MIT License

Accessing frames in batches and selecting the section of the natural-language instruction used as input to model.forward() #13

Closed Alrick11 closed 4 years ago

Alrick11 commented 4 years ago

I have one question and one point I'd like to clarify about the code.

Question 1. We get frames (images) only when we interact with the environment, since the action we choose determines the frame we receive, yet you seem to be loading the frames from the dataset. How can I know the frames beforehand without executing my model's output action in the environment?

Clarification: You mask your padded NL instructions using the DotAttn() function (wherein you multiply the language instruction with the previous hidden states), and then feed the result into model.forward(). Is this a correct interpretation of how you decide which part of the language instruction is used for the next iteration?

Thanks

MohitShridhar commented 4 years ago
  1. So the Seq2Seq models are trained offline with dataset images, but evaluated online with the simulator. The training objective is to imitate the actions taken by the expert. If you need real-time feedback from actions (e.g. a typical RL setting), then you need to run the simulator.

  2. This part handles language attention. DotAttn computes a weighted sum of the language encodings, which is then concatenated with the vision features and the previous action embedding (see the sketch below).
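
For anyone reading this later, here is a minimal sketch of that attention step. This is illustrative only, not the repo's exact vnn.py code; the tensor names, shapes, and the `vis_feat`/`action_emb` variables are assumptions:

```python
import torch
import torch.nn.functional as F

def dot_attention(lang_enc, h):
    """Dot-product attention over encoded language tokens.

    lang_enc: (batch, seq_len, d) language encodings
    h:        (batch, d)          previous decoder hidden state
    """
    scores = torch.bmm(lang_enc, h.unsqueeze(2))      # (batch, seq_len, 1) raw scores
    weights = F.softmax(scores, dim=1)                # attention weights over tokens
    weighted_lang = (weights * lang_enc).sum(dim=1)   # (batch, d) weighted sum
    return weighted_lang, weights

# Example with random tensors: the weighted language context is then
# concatenated with the visual features and the previous-action embedding
# before being fed to the decoder cell.
batch, seq_len, d = 2, 10, 128
lang_enc = torch.randn(batch, seq_len, d)
h = torch.randn(batch, d)
vis_feat = torch.randn(batch, d)      # assumed visual feature vector
action_emb = torch.randn(batch, d)    # assumed previous-action embedding

weighted_lang, _ = dot_attention(lang_enc, h)
decoder_input = torch.cat([vis_feat, weighted_lang, action_emb], dim=1)  # (batch, 3*d)
```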

Alrick11 commented 4 years ago

Thank you for the clarification!

Best


Alrick11 commented 4 years ago

Hi,

I know the issue is closed, but how did you generate the offline dataset images?

Best


MohitShridhar commented 4 years ago

We used a PDDL-based planner to generate expert demonstrations, and then executed them in the THOR simulator. Images were saved after taking each action in the demonstration.

Please refer to the paper (https://arxiv.org/pdf/1912.01734.pdf) for more details.
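
For context, here is a rough sketch of what saving frames while replaying a planned action sequence could look like with ai2thor. It is purely illustrative; the action list, paths, and setup are assumptions and not the repo's actual data-generation scripts (exact Controller initialization also varies across ai2thor versions):

```python
import os
from PIL import Image
from ai2thor.controller import Controller

# Hypothetical list of low-level actions produced by the planner.
planned_actions = ["MoveAhead", "RotateRight", "MoveAhead", "LookDown"]

controller = Controller()                      # launches the THOR simulator
os.makedirs("raw_images", exist_ok=True)

for t, action in enumerate(planned_actions):
    event = controller.step(action=action)     # execute one expert action
    # event.frame holds the egocentric RGB observation after the action
    Image.fromarray(event.frame).save(os.path.join("raw_images", f"{t:09d}.png"))

controller.stop()
```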

Alrick11 commented 4 years ago

Thank you!!
