Open mosicr opened 2 years ago
Hi, thanks for posting the question! I'd like to learn more, can you elaborate and maybe give a relatively concrete example in the context of RL?
Thanks Yujin, Perhaps best analogy would be what detectives or intelligence services do: collect information from many unrelated sources to try to make decision / create a policy ( arrest a person ; start a preemptive strike ). In terms of RL: head agent is trying to develop a policy on what action to take given a state on terrain. Many smaller agents are unleashed and develop mini-policies, which are then merged into one global policy. In NLP world, it would be: stock ticker information, newspapers articles, earnings calls are analyzed by mini-agents, developing mini-policies on what action to take in a given state ( TSLA goes down; mini-action is buy TSLA; headlines talk about Tesla will experience difficulties in EU market; mini-action is sell TSLA ). Global policy takes all these policies into account and develops overarching policy ( buy / sell / hold TSLA). Or, to create article summarization in NLP : have a large language models ( BERT etc ) develop mini-policies on what article pieces are relevant to be picked / included in a summary ( start/stop positions; so action would be include this text span in a final summary - yes / no - based on state - text context; this would include external text sources, dictionaries etc .) The main policy would then integrate all these mini-policies to produce article summary that doesn't only include text spans from the original article but also include external text sources.
Hi, thanks for the examples!
If I understand it correctly, all these can be summarized in a hierarchical decision making system. One naive example of such a system is an ensemble of models where each individual model is responsible for interpreting a subset of the data modalities/sources, and an information integrator at a higher level makes the final decision. Although AttentionNeuron, or Attention in a general sense, should be able to achieve the same, an ensemble of models (in my opinion) is more straightforward and efficient.
Thank you for your answer, makes sense.
Hi, Great work !! I wonder if similar approach would work on text data coming from disparate sources ( that never belonged to the same text ). It'd be reverse of what you did in the paper - putting together a puzzle / learning a policy from many disparate pieces .