f205-ml-cv-lab / weekly-report-for-all-members-


2019-0609~2019-0615 #1

Open ppcd401d2 opened 5 years ago

Naiselim commented 5 years ago

Weekly report at June 13, 2019

1 Paper reading https://arxiv.org/pdf/1903.04959.pdf (IJCAI 2019). This work applies DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces. Using the QMIX architecture, Deep MAPQN (theirs) extends P-DQN to the multi-agent setting, realizing centralized training with decentralized execution. However, every time the final Q value is computed, Deep MAPQN must compute the continuous parameters corresponding to every optional discrete action, even though only one group is actually optimal, which causes a large amount of redundant computation. Another structure, Deep MAHHQN (theirs), draws on the idea of hierarchical learning: the discrete part and the continuous parameters of the hybrid action are output by two levels of networks respectively, yielding the optimal hybrid action group. Experimental results on simulated RoboCup Soccer and the game Ghost Story show that their models are effective and significantly outperform the existing independent deep parameterized Q-learning method.
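To make the redundancy concrete, here is an illustrative sketch (PyTorch, toy shapes; my own reconstruction of the P-DQN-style computation, not the paper's code): a parameter network proposes one continuous parameter vector per discrete action, and the Q network scores every (discrete action, parameters) pair even though only the argmax pair is used.

```python
import torch
import torch.nn as nn

n_discrete, param_dim, state_dim = 3, 2, 8
param_net = nn.Linear(state_dim, n_discrete * param_dim)        # x_k for every discrete k
q_net = nn.Linear(state_dim + n_discrete * param_dim, n_discrete)

state = torch.randn(1, state_dim)
params = param_net(state)                             # all continuous parameters at once
q_values = q_net(torch.cat([state, params], dim=1))   # Q(s, k, x_k) for every k
best_k = q_values.argmax(dim=1)                       # only one (k, x_k) pair is executed
```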

2 Paper writing Goal: ACML 2019 second round. 31 days remain. Manuscripts must be written in English, be a maximum of 16 pages (including references, appendices, etc.), and follow the PMLR style. If required, supplementary material may be submitted as a separate file, but reviewers are not obliged to consider it. Existing work: there is currently no suitable solution to the problem that the 'For' statement in the algorithm environment does not wrap properly; I have found no information on the Internet and no one who can explain or solve it. My temporary workaround is to put a screen capture of my original IJCAI algorithm into the ACML template. Another problem is that the position and scale of the figures are out of harmony: except for Figure 1, all figures appear after the references. Applying the placement parameters of the first figure to the others may solve it; one possible LaTeX fix is sketched below. At present the paper is 15 pages long, and more experimental results can fill the remaining page. You can see my writing at https://www.overleaf.com/read/mbchspvgqkvg
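As one untested possibility (the packages and options are standard LaTeX, but the file name is hypothetical), keeping floats from drifting past the references can usually be done by preferring top-of-page placement and adding a float barrier before the bibliography:

```latex
% Preamble: \FloatBarrier from the placeins package keeps floats in place.
\usepackage{placeins}

\begin{figure}[t!]   % prefer top-of-page placement
  \centering
  \includegraphics[width=0.8\linewidth]{results-figure}  % hypothetical file name
  \caption{...}
  \label{fig:results}
\end{figure}

\FloatBarrier  % flush all pending figures before the references
```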

3 Coding Successfully ran the CartPole experiment with DQN. The result figure will be shown in the next weekly report if at all possible.
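For reference, a minimal DQN-on-CartPole sketch (illustrative only, not the actual experiment code; it assumes the classic gym API where `reset()` returns an observation and `step()` returns a 4-tuple, and it omits a separate target network for brevity):

```python
import random
from collections import deque

import gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)
gamma, eps, batch_size = 0.99, 0.1, 64

state = env.reset()
for step in range(10_000):
    # epsilon-greedy action selection
    if random.random() < eps:
        action = env.action_space.sample()
    else:
        with torch.no_grad():
            action = q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()
    next_state, reward, done, _ = env.step(action)
    buffer.append((state, action, reward, next_state, float(done)))
    state = env.reset() if done else next_state

    if len(buffer) >= batch_size:
        s, a, r, s2, d = zip(*random.sample(buffer, batch_size))
        s = torch.as_tensor(np.array(s), dtype=torch.float32)
        s2 = torch.as_tensor(np.array(s2), dtype=torch.float32)
        a = torch.as_tensor(a).unsqueeze(1)
        r = torch.as_tensor(r, dtype=torch.float32)
        d = torch.as_tensor(d, dtype=torch.float32)
        # one-step TD target; a real run would use a frozen target network here
        with torch.no_grad():
            target = r + gamma * (1 - d) * q_net(s2).max(dim=1).values
        loss = nn.functional.mse_loss(q_net(s).gather(1, a).squeeze(1), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```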

More to say: The weekly report is a good opportunity to practice my English writing, since I can try to describe and summarize my work and that of others. However, gathering all weekly reports under the same link will make the time series chaotic.

Leviplus commented 5 years ago

Weekly report at June 14, 2019

1 Paper reading

Paper (1): Coarse-to-fine decoding for neural semantic parsing. Direction: Sentence-level Semantics, Word-level Semantics in Natural Language Processing. Link: https://arxiv.org/pdf/1805.04793.pdf (ACL 2018)

Keynote: Semantic parsing aims at mapping natural language utterances into structured meaning representations. This paper presents a coarse-to-fine decoding framework for neural semantic parsing: a structure-aware neural architecture that decomposes the parsing process into two stages. Given an input utterance, a rough sketch of its meaning is first generated, in which low-level information (such as variable names and arguments) is glossed over. The missing details are then filled into the sketch by taking into account both the natural language input and the sketch itself (a toy sketch of the pipeline follows the contribution list below). The proposed framework can be easily adapted to different domains and meaning representations. Experimental results show that the method consistently improves performance on four datasets, achieving competitive results despite using relatively simple decoders.

Main contributions (and my comments):

  1. The decomposition disentangles high-level from low-level semantic information, which enables the decoders to model meaning at different levels of granularity. (Other methods tend to treat the semantic information of a sentence only roughly as a whole, and cannot achieve semantic separation at different levels.)
  2. The model can explicitly share knowledge of coarse structures across examples that have the same sketch (i.e., the same basic meaning), even though their actual meaning representations differ (e.g., in their details). (Different sentences often share the same syntactic structure and differ only in specific content and detailed information. The semantic parsing method proposed in this paper conforms to this structural characteristic of natural language.)
  3. After generating the sketch, the decoder knows what the basic meaning of the utterance looks like, and the model can use it as global context to improve the prediction of the final details. (The two stages of semantic parsing in this paper are not independent of each other but closely related: the sketch obtained by coarse parsing can guide the expression details and parameters generated by fine parsing, thus improving parsing accuracy.)
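As referenced above, a schematic of the two-stage pipeline (pure pseudocode with hypothetical function names, not the authors' implementation):

```python
# Coarse-to-fine decoding in outline: sketch first, details second.
def coarse_to_fine_parse(utterance, encoder, sketch_decoder, fine_decoder):
    enc = encoder(utterance)             # encode the natural language input
    sketch = sketch_decoder(enc)         # stage 1: coarse meaning, details glossed over
    meaning = fine_decoder(enc, sketch)  # stage 2: fill in variables and arguments
    return meaning                       # full structured meaning representation
```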

Paper (2): Improving event coreference resolution by modeling correlations between event coreference chains and document topic structures. Direction: Document Analysis in Natural Language Processing. Link: https://www.aclweb.org/anthology/P18-1045 (ACL 2018)

Keynote: This paper models correlations between event coreference chains and document topical structures through an Integer Linear Programming (ILP) formulation. Correlations between the main event chains of a document are first modeled through topic transition sentences, inter-coreference-chain correlations, event mention distributional characteristics, and sub-event structure; these are then combined with scores from a local coreference relation classifier to jointly resolve multiple event chains in a document (a toy ILP sketch follows the contribution list below). Experiments on the KBP 2016 and 2017 datasets suggest that each of these structures contributes to improving event coreference resolution performance.

Main contributions (and my comments):

  1. Model correlations between event coreference chains and document topic structures by designing constraints in the ILP, modifying the objective function, and encouraging the association of more coreferent event mentions with a chain that has a large stretch. (Topic transition sentences often overlap in content, for reminding purposes, so coreference links between event mentions that appear in topic transition sentences should be encouraged.)
  2. Model correlations across semantically associated event chains. (Semantically associated events often co-occur in the same sentence, so creating coreference links between event mentions in sentences that contain other already-known coreferent event mentions should be encouraged to boost performance.)
  3. Model document-level distributional patterns of coreferent event mentions. (In most cases, the majority of event coreference chains are initiated in the early sections of a document; event mentions in later paragraphs usually exist as coreferent mentions of an established chain or as singletons, and are less likely to initiate a new chain. This distributional pattern suggests that the model should be encouraged to generate more event coreference links in the early sections of a document.)
  4. Restrain subevents from being included in coreference chains by adding a new term to the global optimization objective. (Subevents are a major source of false coreference links due to their high surface similarity with their parent events. Since subevents referring to specific actions are seldom referred back to in a document and are often singleton events, such specific-action events should be identified, and coreference links between a specific-action event and other event mentions should be discouraged.)
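As referenced above, a toy ILP sketch of this kind of joint objective (illustrative scores and variable names, written with the PuLP library; not the paper's actual formulation): binary variables decide which mention pairs corefer, the objective combines local classifier scores with a bonus for links between topic-transition sentences, and transitivity constraints keep the chains consistent.

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum

n = 4  # toy document with 4 event mentions
score = {(0, 1): 0.8, (0, 2): -0.3, (0, 3): 0.1,
         (1, 2): 0.2, (1, 3): 0.6, (2, 3): -0.5}  # local classifier scores
transition_bonus = {(1, 3): 0.4}  # both mentions sit in topic-transition sentences

prob = LpProblem("event_coref", LpMaximize)
x = {p: LpVariable(f"x_{p[0]}_{p[1]}", cat="Binary") for p in score}

# Objective: local scores plus the topic-transition encouragement.
prob += lpSum((score[p] + transition_bonus.get(p, 0.0)) * x[p] for p in score)

# Transitivity: if i~j and j~k then i~k, so coreference forms valid chains.
for i in range(n):
    for j in range(i + 1, n):
        for k in range(j + 1, n):
            prob += x[(i, j)] + x[(j, k)] - x[(i, k)] <= 1
            prob += x[(i, j)] + x[(i, k)] - x[(j, k)] <= 1
            prob += x[(i, k)] + x[(j, k)] - x[(i, j)] <= 1

prob.solve()
print({p: int(x[p].value()) for p in score})  # selected coreference links
```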

2 Something to say I will study the direction of combining Graph Neural Networks with Natural Language Processing for the next two months. After a period of struggling, I gradually realized that it is difficult to approach Graph Neural Networks starting from Natural Language Processing; starting from Graph Neural Networks and moving toward NLP is the better shortcut. The specific approach is to select a basic Graph Neural Network, preferably an auto-encoder-based Graph Convolutional Network or a Graph Attention Network, and adapt it so that it can be used for specific NLP tasks. Then, on the NLP side, a graph structure with stronger expressive power is constructed from raw language, which can be better embedded by the Graph Neural Network.

A graph-based structure mainly consists of two parts: a structure information matrix (adjacency matrix or degree matrix) and a node feature matrix. The structure information matrix is relatively fixed; a possible direction for improvement is to introduce an attention mechanism or to construct an additional substructure graph matrix, so that the weights of the edges incident to important nodes reflect their importance. The design of the feature matrix is relatively flexible, and node representations rich in semantic information can be obtained through a variety of networks. One possible attempt is to feed it into a Graph Spatial-temporal Network and update the feature embedding of nodes along the time axis, so that the feature vectors of nodes contain not only their own semantic information but also their connection information with adjacent nodes.
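To make the two ingredients concrete, a minimal GCN layer sketch (NumPy, toy sizes, symmetric normalization as in Kipf & Welling; purely illustrative):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph convolution: H = ReLU(D^-1/2 (A+I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)      # structure matrix: 3-node path graph
X = np.random.randn(3, 8)                   # node feature matrix
W = np.random.randn(8, 4)                   # learnable weights
H = gcn_layer(A, X, W)                      # new node embeddings, shape (3, 4)
```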

3 Plan for next week (1) Try to comprehensively understand the main ideas of the major Graph Neural Networks (Graph Convolutional Networks, Graph Attention Networks, Graph Auto-encoders, Graph Generative Networks, and Graph Spatial-temporal Networks), mainly following this article: A Comprehensive Survey on Graph Neural Networks (Link: https://arxiv.org/pdf/1901.00596.pdf). (2) Determine the type of Graph Neural Network I want to use, find a representative network, and set up the coding environment.

ppcd401d2 commented 5 years ago

Thanks! Very good!

Naiselim commented 5 years ago

Weekly report at June 18, 2019

1 Paper reading Self-Paced Prioritized Curriculum Learning with Coverage Penalty in Deep Reinforcement Learning; A Novel DDPG Method with Prioritized Experience Replay. These two articles are the main references for my related work, and the main goal of this reading was to analyze their experiments. The first paper verifies its model through two groups of experiments: (1) performance of DCRL compared with DQN and PER in terms of average Q value, on twelve Atari games (Space Invaders, Carnival, Breakout, Boxing, Pong, Kung-Fu Master, Skiing, River Raid, Enduro, Alien, Montezuma's Revenge, and Ms. Pac-Man); (2) performance of DCRL compared with double DQN and the dueling network in terms of average Q value, on four of those games (Breakout, Pong, Skiing, Alien). The abundance of experimental data makes the first article worth noting, and its figures are simple and require no further processing. By contrast, the experiments in the second paper are not convincing: they only run the MuJoCo InvertedPendulum task, which is the simplest of the tasks I used in my IJCAI work, and its only variety lies in reporting accumulated reward under different replay buffer sizes, minibatch sizes, and network update rates.

2 Paper writing Goal: ACML 2019 second round. 26 days remain. Zheng said it is allowed to use screenshots to express the algorithm, provided that character clarity and size are within the normal range. The figures still exceed the normal size, and each one occupies a full page. My planned response is to modify the figure layout and merge multiple figures into the same row (a possible LaTeX sketch is below). This week I need to re-examine every paragraph of the original article. The two papers I read last week suggest that my work can discuss the two-way advance of DDPG and UCR: DDPG enables the model to take continuous actions, while UCR improves stability and capability. You can see my writing at https://www.overleaf.com/read/mbchspvgqkvg
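One possible way to merge two figures into one row (the subcaption package is standard; the file names are hypothetical and this is untested against the actual manuscript):

```latex
% Preamble
\usepackage{subcaption}

\begin{figure}[t]
  \centering
  \begin{subfigure}{0.48\linewidth}
    \includegraphics[width=\linewidth]{reward-curve-a}  % hypothetical file
    \caption{...}
  \end{subfigure}\hfill
  \begin{subfigure}{0.48\linewidth}
    \includegraphics[width=\linewidth]{reward-curve-b}  % hypothetical file
    \caption{...}
  \end{subfigure}
  \caption{Two results side by side, halving the vertical space used.}
\end{figure}
```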

3 Coding Zheng's workspace remains on my server host. Reading his code may be useful for my own work.

4 Next plan Each day this week I should produce one figure adapted to the template, revise one section, and run one new experiment. The issue for our discussion will be posted later.

Stan-lee-1229 commented 5 years ago

Weekly report at June 28, 2019

The oral presentations from ICML: https://www.bilibili.com/video/av55557400 , which cover many academic areas such as DRL, GANs, deep learning architectures, etc. I think this can help our labmates. I also collected papers about DRL at ICML on BaiduNetDisk: Link: https://pan.baidu.com/s/1DnldVCeriHPxHSt3boUGJA Extraction code: bktv

Recently, I watched the DRL course taught by Hung-yi Lee and made some notes, mainly on: the difference between the on-policy and off-policy concepts (PPO, TRPO, PPO2); using MC-based and TD-based approaches to estimate the state value function; target networks and exploration methods (epsilon-greedy and Boltzmann exploration); and tips for Q-learning, including:

  1. Double DQN, for solving the problem that the Q value is over-estimated (a toy sketch of its target follows this list).
  2. Dueling DQN, which changes the traditional architecture only by adding a scalar V(s) so that Q(s,a) can be tuned up conveniently.
  3. Prioritized replay, so that data with larger TD error in previous training has a higher probability of being sampled.
  4. Multi-step returns, which strike a balance between MC and TD.
  5. The Noisy Net trick, which moves the noise from actions onto parameters, so that given the same or similar state the agent takes the same action; that is, it explores in a consistent way.
  6. Distributional Q-functions. Q(s,a) is by definition the expectation of the accumulated reward, so different reward distributions can have the same value; taking only Q(s,a) into account loses some characteristics of the reward distribution.
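As referenced in tip 1 above, a hedged sketch (PyTorch, toy shapes) of the Double DQN target: the online network selects the next action and the target network evaluates it, which mitigates over-estimation.

```python
import torch

def double_dqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    """Compute r + gamma * Q_target(s', argmax_a Q_online(s', a)) for a batch."""
    with torch.no_grad():
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)   # select
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)  # evaluate
        return reward + gamma * (1.0 - done) * next_q
```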

Read a paper: _ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search_. They propose an actor ensemble algorithm, named ACE, for continuous control with a deterministic policy in reinforcement learning. In ACE, an actor ensemble (multiple actors) is used to search for the global maximum of the critic. Besides the ensemble perspective, they also formulate ACE in the option framework by extending the option-critic architecture with deterministic intra-option policies, revealing a relationship between ensembles and options. They demonstrate a significant performance boost of ACE over DDPG and its variants in MuJoCo.
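A toy sketch of the ensemble idea (illustrative only, not the authors' code): each deterministic actor proposes an action, the critic scores the proposals, and the best-scoring one is executed, which helps the search for the critic's global maximum.

```python
import torch

def ensemble_act(state, actors, critic):
    proposals = [actor(state) for actor in actors]                 # one action per actor
    q_values = torch.stack([critic(state, a) for a in proposals])  # critic scores all of them
    return proposals[q_values.argmax().item()]                     # best proposal wins
```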