New submissions for Wed, 12 May 21

Keyword: super resolution

There is no result

Keyword: gan

Towards Discovery and Attribution of Open-world GAN Generated Images

Authors: Sharath Girish, Saksham Suri, Saketh Rambhatla, Abhinav Shrivastava
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2105.04580
Pdf link: https://arxiv.org/pdf/2105.04580
Abstract With the recent progress in Generative Adversarial Networks (GANs), it is imperative for media and visual forensics to develop detectors which can identify and attribute images to the model generating them. Existing works have been shown to attribute images to their corresponding GAN sources with high accuracy. However, these works are limited to a closed set scenario, failing to generalize to GANs unseen during train time, and are therefore not scalable with a steady influx of new GANs. We present an iterative algorithm for discovering images generated from previously unseen GANs by exploiting the fact that all GANs leave distinct fingerprints on their generated images. Our algorithm consists of multiple components including network training, out-of-distribution detection, clustering, merge and refine steps. Through extensive experiments, we show that our algorithm discovers unseen GANs with high accuracy and also generalizes to GANs trained on unseen real datasets. We additionally apply our algorithm to attribution and discovery of GANs in an online fashion as well as to the more standard task of real/fake detection. Our experiments demonstrate the effectiveness of our approach in discovering new GANs and show that it can be used in an open-world setup.
SUrgical PRediction GAN for Events Anticipation
Authors: Yutong Ban, Guy Rosman, Thomas Ward, Daniel Hashimoto, Taisei Kondo, Hidekazu Iwaki, Ozanan Meireles, Daniela Rus
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2105.04642
Pdf link: https://arxiv.org/pdf/2105.04642
Abstract Comprehension of surgical workflow is the foundation upon which computers build the understanding of surgery. In this work, we moved beyond just the identification of surgical phases to predict future surgical phases and the transitions between them. We used a novel GAN formulation that sampled the future surgical phases trajectory conditioned on past laparoscopic video frames, and compared it to state-of-the-art approaches for surgical video analysis and alternative prediction methods. We demonstrated its effectiveness in inferring and predicting the progress of laparoscopic cholecystectomy videos. We quantified the horizon-accuracy trade-off and explored average performance as well as the performance on the more difficult, and clinically important, transitions between phases. Lastly, we surveyed surgeons to evaluate the plausibility of these predicted trajectories.
GroupLink: An End-to-end Multitask Method for Word Grouping and Relation Extraction in Form Understanding
Authors: Zilong Wang, Mingjie Zhan, Houxing Ren, Zhaohui Hou, Yuwei Wu, Xingyan Zhang, Ding Liang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2105.04650
Pdf link: https://arxiv.org/pdf/2105.04650
Abstract Forms are a common type of document in real life and carry rich information through textual contents and the organizational structure. To realize automatic processing of forms, word grouping and relation extraction are two fundamental and crucial steps after preliminary processing of optical character reader (OCR). Word grouping is to aggregate words that belong to the same semantic entity, and relation extraction is to predict the links between semantic entities. Existing works treat them as two individual tasks, but these two tasks are correlated and can reinforce each other. The grouping process will refine the integrated representation of the corresponding entity, and the linking process will give feedback to the grouping performance. For this purpose, we acquire multimodal features from both textual data and layout information and build an end-to-end model through multitask training to combine word grouping and relation extraction to enhance performance on each task. We validate our proposed method on a real-world, fully-annotated, noisy-scanned benchmark, FUNSD, and extensive experiments demonstrate the effectiveness of our method.
A Value-driven Approach for Software Process Improvement -- A Solution Proposal
Authors: Ramtin Jabbari, Nauman bin Ali, Kai Petersen
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2105.04767
Pdf link: https://arxiv.org/pdf/2105.04767
Abstract Software process improvement (SPI) is a means to an end, not an end in itself (e.g., a goal is to achieve shorter time to market and not just compliance to a process standard). Therefore, SPI initiatives ought to be streamlined to meet the desired values for an organization. Through a literature review, seven secondary studies aggregating maturity models and assessment frameworks were identified. Furthermore, we identified six proposals for building a new maturity model. We analyzed the existing maturity models for (a) their purpose, structure, guidelines, and (b) the degree to which they explicitly consider values and benefits. Based on this analysis and utilizing the guidelines from the proposals to build maturity models, we have introduced an approach for developing a value-driven approach for SPI. The proposal leveraged the benefits-dependency networks. We argue that our approach enables the following key benefits: (a) as a value-driven approach, it streamlines value-delivery and helps to avoid unnecessary process interventions, (b) as a knowledge-repository, it helps to codify lessons learned i.e. whether adopted practices lead to value realization, and (c) as an internal process maturity assessment tool, it tracks the progress of process realization, which is necessary to monitor progress towards the intended values.
Scalable Personalised Item Ranking through Parametric Density Estimation
Authors: Riku Togashi, Masahiro Kato, Mayu Otani, Tetsuya Sakai, Shin'ichi Satoh
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2105.04769
Pdf link: https://arxiv.org/pdf/2105.04769
Abstract Learning from implicit feedback is challenging because of the difficult nature of the one-class problem: we can observe only positive examples. Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem. However, such methods have two main drawbacks, particularly in large-scale applications: (1) the pairwise approach is severely inefficient due to the quadratic computational cost; and (2) even recent model-based samplers (e.g. IRGAN) cannot achieve practical efficiency due to the training of an extra model. In this paper, we propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart while performing similarly to the pairwise counterpart in terms of ranking effectiveness. Our approach estimates the probability densities of positive items for each user within a rich class of distributions, viz. the exponential family. In our formulation, we derive a loss function and the appropriate negative sampling distribution based on maximum likelihood estimation. We also develop a practical technique for risk approximation and a regularisation scheme. We then discuss how our single-model approach is equivalent to an IRGAN variant under a certain condition. Through experiments on real-world datasets, our approach outperforms the pointwise and pairwise counterparts in terms of effectiveness and efficiency.
An Innovative Security Strategy using Reactive Web Application Honeypot
Authors: Rajat Gupta, Madhu Viswanatham V., Manikandan K
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2105.04773
Pdf link: https://arxiv.org/pdf/2105.04773
Abstract Nowadays, web applications have become most prevalent in the industry, and the critical data of most organizations is stored using web apps. Hence, web applications are a much bigger target for diverse cyber-attacks, which range from database injections (SQL injection, PHP object injection, template injection, XML external entity injection) to unsanitized input attacks such as Cross-Site Scripting (XSS), and many more. As mitigation for them, among many proposed solutions, web application honeypots are a sophisticated and powerful protection mechanism. In this paper, we propose a low interaction, adaptive, and dynamic web application honeypot that imitates the vulnerabilities through HTTP events. The honeypot is built with SNARE and TANNER; SNARE creates the attack surface and sends the requests to TANNER, which evaluates them and decides how SNARE should respond to the requests. TANNER is an analysis and classification tool that analyzes and evaluates HTTP requests served by SNARE and composes the response; it is powered by emulators, which are engines used for the emulation of vulnerabilities.
Characterizing GAN Convergence Through Proximal Duality Gap
Authors: Sahil Sidheekh, Aroof Aimen, Narayanan C. Krishnan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2105.04801
Pdf link: https://arxiv.org/pdf/2105.04801
Abstract Despite the accomplishments of Generative Adversarial Networks (GANs) in modeling data distributions, training them remains a challenging task. A contributing factor to this difficulty is the non-intuitive nature of the GAN loss curves, which necessitates a subjective evaluation of the generated output to infer training progress. Recently, motivated by game theory, duality gap has been proposed as a domain agnostic measure to monitor GAN training. However, it is restricted to the setting when the GAN converges to a Nash equilibrium. But GANs need not always converge to a Nash equilibrium to model the data distribution. In this work, we extend the notion of duality gap to proximal duality gap that is applicable to the general context of training GANs where Nash equilibria may not exist. We show theoretically that the proximal duality gap is capable of monitoring the convergence of GANs to a wider spectrum of equilibria that subsumes Nash equilibria. We also theoretically establish the relationship between the proximal duality gap and the divergence between the real and generated data distributions for different GAN formulations. Our results provide new insights into the nature of GAN convergence. Finally, we validate experimentally the usefulness of proximal duality gap for monitoring and influencing GAN training.
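For readers unfamiliar with the measure, the following is a minimal sketch of the classical duality gap that the abstract takes as its starting point, not the proximal variant the paper proposes. It assumes a toy zero-sum game with payoff `f(u, v) = u * v` in which `u` minimizes and `v` maximizes, and it approximates the inner max and min by a grid search; the function name `duality_gap`, the payoff, and the grid are all illustrative assumptions rather than anything taken from the paper.

```python
def duality_gap(f, u, v, us, vs):
    """Classical duality gap of the point (u, v) for a min-max game.

    u is the minimizing player, v the maximizing player.
    The inner max/min are approximated by searching the candidate
    lists us and vs (an assumption made for this toy example).
    """
    worst_case_for_u = max(f(u, v2) for v2 in vs)   # best response of the maximizer
    best_case_against_v = min(f(u2, v) for u2 in us)  # best response of the minimizer
    return worst_case_for_u - best_case_against_v


def f(u, v):
    # Hypothetical bilinear payoff; its Nash equilibrium is (0, 0).
    return u * v


# Candidate strategies: 21 evenly spaced points in [-1, 1].
grid = [i / 10 for i in range(-10, 11)]

gap_at_equilibrium = duality_gap(f, 0.0, 0.0, grid, grid)
gap_off_equilibrium = duality_gap(f, 0.5, -0.5, grid, grid)
```

In this toy game the gap is zero at the equilibrium and positive elsewhere, which is the property that makes it usable as a training monitor when a Nash equilibrium exists.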
Uncover Common Facial Expressions in Terracotta Warriors: A Deep Learning Approach
Authors: Wenhong Tian, Yuanlun Xie, Tingsong Ma, Hengxin Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2105.04826
Pdf link: https://arxiv.org/pdf/2105.04826
Abstract Can advanced deep learning technologies be applied to analyze some ancient humanistic arts? Can deep learning technologies be directly applied to special scenes such as facial expression analysis of Terracotta Warriors? The big challenge is that the facial features of the Terracotta Warriors are very different from those of today's people. We found that models trained on other classic facial expression datasets perform very poorly when used directly to analyze the facial expressions of the Terracotta Warriors. At the same time, the lack of public high-quality facial expression data of the Terracotta Warriors also limits the use of deep learning technologies. Therefore, we first use Generative Adversarial Networks (GANs) to generate enough high-quality facial expression data for subsequent training and recognition. We also verify the effectiveness of this approach. For the first time, this paper uses deep learning technologies to find common facial expressions of general and postured Terracotta Warriors. These results will provide an updated technical means for research on the art of the Terracotta Warriors and shed light on the research of other ancient arts.
Improving Adversarial Transferability with Gradient Refining
Authors: Guoqiu Wang, Huanqian Yan, Ying Guo, Xingxing Wei
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2105.04834
Pdf link: https://arxiv.org/pdf/2105.04834
Abstract Deep neural networks are vulnerable to adversarial examples, which are crafted by adding human-imperceptible perturbations to original images. Most existing adversarial attack methods achieve nearly 100% attack success rates under the white-box setting, but only achieve relatively low attack success rates under the black-box setting. To improve the transferability of adversarial examples for the black-box setting, several methods have been proposed, e.g., input diversity, translation-invariant attack, and momentum-based attack. In this paper, we propose a method named Gradient Refining, which can further improve the adversarial transferability by correcting useless gradients introduced by input diversity through multiple transformations. Our method is generally applicable to many gradient-based attack methods combined with input diversity. Extensive experiments are conducted on the ImageNet dataset and our method can achieve an average transfer success rate of 82.07% for three different models under single-model setting, which outperforms the other state-of-the-art methods by a large margin of 6.0% on average. We have also applied the proposed method to the competition CVPR 2021 Unrestricted Adversarial Attacks on ImageNet organized by Alibaba and won second place in attack success rates among 1558 teams.
One Shot Face Swapping on Megapixels
Authors: Yuhao Zhu, Qi Li, Jian Wang, Chengzhong Xu, Zhenan Sun
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2105.04932
Pdf link: https://arxiv.org/pdf/2105.04932
Abstract Face swapping has both positive applications such as entertainment, human-computer interaction, etc., and negative applications such as DeepFake threats to politics, economics, etc. Nevertheless, it is necessary to understand the scheme of advanced methods for high-quality face swapping and generate enough and representative face swapping images to train DeepFake detection algorithms. This paper proposes the first Megapixel level method for one shot Face Swapping (or MegaFS for short). Firstly, MegaFS organizes face representation hierarchically by the proposed Hierarchical Representation Face Encoder (HieRFE) in an extended latent space to maintain more facial details, rather than compressed representation in previous face swapping methods. Secondly, a carefully designed Face Transfer Module (FTM) is proposed to transfer the identity from a source image to the target by a non-linear trajectory without explicit feature disentanglement. Finally, the swapped faces can be synthesized by StyleGAN2 with the benefits of its training stability and powerful generative capability. Each part of MegaFS can be trained separately so the requirement of our model for GPU memory can be satisfied for megapixel face swapping. In summary, complete face representation, stable training, and limited memory usage are the three novel contributions to the success of our method. Extensive experiments demonstrate the superiority of MegaFS and the first megapixel level face swapping database is released for research on DeepFake detection and face image editing in the public domain. The dataset is at this link.
Let There be Light: Improved Traffic Surveillance via Detail Preserving Night-to-Day Transfer
Authors: Lan Fu, Hongkai Yu, Felix Juefei-Xu, Jinlong Li, Qing Guo, Song Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2105.05011
Pdf link: https://arxiv.org/pdf/2105.05011
Abstract In recent years, image and video surveillance have contributed considerable progress to Intelligent Transportation Systems (ITS) with the help of deep Convolutional Neural Networks (CNNs). As one of the state-of-the-art perception approaches, detecting the objects of interest in each frame of video surveillance is widely desired by ITS. Currently, object detection shows remarkable efficiency and reliability in standard scenarios such as daytime scenes with favorable illumination conditions. However, in the face of adverse conditions such as nighttime, object detection loses accuracy significantly. One of the main causes of the problem is the lack of sufficient annotated detection datasets of nighttime scenes. In this paper, we propose a framework to alleviate the accuracy decline when object detection is taken to adverse conditions by using an image translation method. We propose to utilize a style translation based StyleMix method to acquire pairs of daytime and nighttime images as training data for the subsequent nighttime to daytime image translation. To alleviate the detail corruptions caused by Generative Adversarial Networks (GANs), we propose to utilize a Kernel Prediction Network (KPN) based method to refine the nighttime to daytime image translation. The KPN network is trained together with the object detection task to adapt the trained daytime model to nighttime vehicle detection directly. Experiments on vehicle detection verified the accuracy and effectiveness of the proposed approach.
Towards transparency in NLP shared tasks
Authors: Carla Parra Escartín, Teresa Lynn, Joss Moorkens, Jane Dunne
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2105.05020
Pdf link: https://arxiv.org/pdf/2105.05020
Abstract This article reports on a survey carried out across the Natural Language Processing (NLP) community. The survey aimed to capture the opinions of the research community on issues surrounding shared tasks, with respect to both participation and organisation. Amongst the 175 responses received, both positive and negative observations were made. We carried out and report on an extensive analysis of these responses, which leads us to propose a Shared Task Organisation Checklist that could support future participants and organisers. The proposed Checklist is flexible enough to accommodate the wide diversity of shared tasks in our field and its goal is not to be prescriptive, but rather to serve as a tool that encourages shared task organisers to foreground ethical behaviour, beginning with the common issues that the 175 respondents deemed important. Its usage would not only serve as an instrument to reflect on important aspects of shared tasks, but would also promote increased transparency around them.
ChaLearn LAP Large Scale Signer Independent Isolated Sign Language Recognition Challenge: Design, Results and Future Research
Authors: Ozge Mercanoglu Sincan, Julio C. S. Jacques Junior, Sergio Escalera, Hacer Yalim Keles
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2105.05066
Pdf link: https://arxiv.org/pdf/2105.05066
Abstract The performances of Sign Language Recognition (SLR) systems have improved considerably in recent years. However, several open challenges still need to be solved to allow SLR to be useful in practice. The research in the field is in its infancy in regards to the robustness of the models to a large diversity of signs and signers, and to fairness of the models to performers from different demographics. This work summarises the ChaLearn LAP Large Scale Signer Independent Isolated SLR Challenge, organised at CVPR 2021 with the goal of overcoming some of the aforementioned challenges. We analyse and discuss the challenge design, top winning solutions and suggestions for future research. The challenge attracted 132 participants in the RGB track and 59 in the RGB+Depth track, receiving more than 1.5K submissions in total. Participants were evaluated using a new large-scale multi-modal Turkish Sign Language (AUTSL) dataset, consisting of 226 sign labels and 36,302 isolated sign video samples performed by 43 different signers. Winning teams achieved more than 96% recognition rate, and their approaches benefited from pose/hand/face estimation, transfer learning, external data, fusion/ensemble of modalities and different strategies to model spatio-temporal information. However, methods still fail to distinguish among very similar signs, in particular those sharing similar hand trajectories.
Mandating Code Disclosure is Unnecessary -- Strict Model Verification Does Not Require Accessing Original Computer Code
Authors: Sasanka Sekhar Chanda
Subjects: Software Engineering (cs.SE); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2105.05170
Pdf link: https://arxiv.org/pdf/2105.05170
Abstract Mandating public availability of computer code underlying computational simulation modeling research ends up doing a disservice to the cause of model verification when inconsistencies between the specifications in the publication text and specifications in the computer code go unchallenged. Conversely, a model is verified when an independent researcher undertakes the set of mental processing tasks necessary to convert natural language specifications in a publication text into computer code instructions that produce numerical or graphical outputs identical to the outputs found in the original publication. The effort towards obtaining convergence with the numerical or graphical outputs directs intensive consideration of the publication text. The original computer code has little role to play in determining the verification status - verified/ failed verification. An insight is obtained that skillful deployment of human intelligence is feasible when effort-directing feedback processes are in place to appropriately go around the human frailty of giving up in the absence of actionable feedback. This principle can be put to use to develop better organizational configurations in business, government and society.
Including Signed Languages in Natural Language Processing
Authors: Kayo Yin, Amit Moryossef, Julie Hochgesang, Yoav Goldberg, Malihe Alikhani
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2105.05222
Pdf link: https://arxiv.org/pdf/2105.05222
Abstract Signed languages are the primary means of communication for many deaf and hard of hearing individuals. Since signed languages exhibit all the fundamental linguistic properties of natural language, we believe that tools and theories of Natural Language Processing (NLP) are crucial towards their modeling. However, existing research in Sign Language Processing (SLP) seldom attempts to explore and leverage the linguistic organization of signed languages. This position paper calls on the NLP community to include signed languages as a research area with high social and scientific impact. We first discuss the linguistic properties of signed languages to consider during their modeling. Then, we review the limitations of current SLP models and identify the open challenges to extend NLP to signed languages. Finally, we urge (1) the adoption of an efficient tokenization method; (2) the development of linguistically-informed models; (3) the collection of real-world signed language data; (4) the inclusion of local signed language communities as an active and leading voice in the direction of research.
Diffusion Models Beat GANs on Image Synthesis
Authors: Prafulla Dhariwal, Alex Nichol
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2105.05233
Pdf link: https://arxiv.org/pdf/2105.05233
Abstract We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for sample quality using gradients from a classifier. We achieve an FID of 2.97 on ImageNet $128 \times 128$, 4.59 on ImageNet $256 \times 256$, and $7.72$ on ImageNet $512 \times 512$, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.85 on ImageNet $512 \times 512$. We release our code at https://github.com/openai/guided-diffusion
Keyword: flow

SUrgical PRediction GAN for Events Anticipation
Authors: Yutong Ban, Guy Rosman, Thomas Ward, Daniel Hashimoto, Taisei Kondo, Hidekazu Iwaki, Ozanan Meireles, Daniela Rus
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2105.04642
Pdf link: https://arxiv.org/pdf/2105.04642
Abstract Comprehension of surgical workflow is the foundation upon which computers build the understanding of surgery. In this work, we moved beyond just the identification of surgical phases to predict future surgical phases and the transitions between them. We used a novel GAN formulation that sampled the future surgical phases trajectory conditioned on past laparoscopic video frames, and compared it to state-of-the-art approaches for surgical video analysis and alternative prediction methods. We demonstrated its effectiveness in inferring and predicting the progress of laparoscopic cholecystectomy videos. We quantified the horizon-accuracy trade-off and explored average performance as well as the performance on the more difficult, and clinically important, transitions between phases. Lastly, we surveyed surgeons to evaluate the plausibility of these predicted trajectories.
Distributed In-memory Data Management for Workflow Executions
Authors: Renan Souza, Vítor Silva, Alexandre A. B. Lima, Daniel de Oliveira, Patrick Valduriez, Marta Mattoso
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2105.04720
Pdf link: https://arxiv.org/pdf/2105.04720
Abstract Complex scientific experiments from various domains are typically modeled as workflows and executed on large-scale machines using a Parallel Workflow Management System (WMS). Since such executions usually last for hours or days, some WMSs provide user steering support, i.e., they allow users to run data analyses and, depending on the results, adapt the workflows at runtime. A challenge in the parallel execution control design is to manage workflow data for efficient executions while enabling user steering support. Data access for high scalability is typically transaction-oriented, while for data analysis, it is online analytical-oriented so that managing such hybrid workloads makes the challenge even harder. In this work, we present SchalaDB, an architecture with a set of design principles and techniques based on distributed in-memory data management for efficient workflow execution control and user steering. We propose a distributed data design for scalable workflow task scheduling and high availability driven by a parallel and distributed in-memory DBMS. To evaluate our proposal, we develop d-Chiron, a WMS designed according to SchalaDB's principles. We carry out an extensive experimental evaluation on an HPC cluster with up to 960 computing cores. Among other analyses, we show that even when running data analyses for user steering, SchalaDB's overhead is negligible for workloads composed of hundreds of concurrent tasks on shared data. Our results encourage workflow engine developers to follow a parallel and distributed data-oriented approach not only for scheduling and monitoring but also for user steering.
Graph Theory for Metro Traffic Modelling
Authors: Bruno Scalzo Dees, Yao Lei Xu, Anthony G. Constantinides, Danilo P. Mandic
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2105.04991
Pdf link: https://arxiv.org/pdf/2105.04991
Abstract A unifying graph theoretic framework for the modelling of metro transportation networks is proposed. This is achieved by first introducing a basic graph framework for the modelling of the London underground system from a diffusion law point of view. This forms a basis for the analysis of both station importance and their vulnerability, whereby the concept of graph vertex centrality plays a key role. We next explore k-edge augmentation of a graph topology, and illustrate its usefulness both for improving the network robustness and as a planning tool. Upon establishing the graph theoretic attributes of the underlying graph topology, we proceed to introduce models for processing data on such a metro graph. Commuter movement is shown to obey Fick's law of diffusion, where the graph Laplacian provides an analytical model for the diffusion process of commuter population dynamics. Finally, we also explore the application of modern deep learning models, such as graph neural networks and hyper-graph neural networks, as general purpose models for the modelling and forecasting of underground data, especially in the context of the morning and evening rush hours. Comprehensive simulations including the passenger in- and out-flows during the morning rush hour in London demonstrate the advantages of the graph models in metro planning and traffic management, a formal mathematical approach with wide economic implications.
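The diffusion view mentioned in the abstract can be illustrated with a minimal sketch, assuming the standard graph heat equation dx/dt = -k L x with combinatorial Laplacian L = D - A. The four-station line, the rate `k`, the step size, and the function names below are hypothetical choices for illustration and are not taken from the paper.

```python
def laplacian(n, edges):
    """Combinatorial graph Laplacian L = D - A for an undirected graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def diffuse(x, L, rate, dt, steps):
    """Explicit Euler integration of dx/dt = -rate * L x."""
    x = list(x)
    n = len(x)
    for _ in range(steps):
        # Net outflow at each vertex under Fick-style diffusion.
        flow = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] - rate * dt * flow[i] for i in range(n)]
    return x


# Hypothetical metro line with four stations: 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]
L = laplacian(4, edges)

# All commuters start at station 0 and spread along the line.
x0 = [100.0, 0.0, 0.0, 0.0]
x = diffuse(x0, L, rate=1.0, dt=0.1, steps=500)
```

Because every column of the Laplacian sums to zero, the total population is conserved at each step, and on a connected graph the state approaches the uniform distribution. The step size is kept small so that the explicit Euler scheme stays stable.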
NF-iSAM: Incremental Smoothing and Mapping via Normalizing Flows
Authors: Qiangqiang Huang, Can Pu, Dehann Fourie, Kasra Khosoussi, Jonathan P. How, John J. Leonard
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2105.05045
Pdf link: https://arxiv.org/pdf/2105.05045
Abstract This paper presents a novel non-Gaussian inference algorithm, Normalizing Flow iSAM (NF-iSAM), for solving SLAM problems with non-Gaussian factors and/or non-linear measurement models. NF-iSAM exploits the expressive power of neural networks, and trains normalizing flows to draw samples from the joint posterior of non-Gaussian factor graphs. By leveraging the Bayes tree, NF-iSAM is able to exploit the sparsity structure of SLAM, thus enabling efficient incremental updates similar to iSAM2, albeit in the more challenging non-Gaussian setting. We demonstrate the performance of NF-iSAM and compare it against the state-of-the-art algorithms such as iSAM2 (Gaussian) and mm-iSAM (non-Gaussian) in synthetic and real range-only SLAM datasets.
Keyword: inpainting

There is no result

Keyword: transformer

Language Acquisition is Embodied, Interactive, Emotive: a Research Proposal
Authors: Casey Kennington
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2105.04633
Pdf link: https://arxiv.org/pdf/2105.04633
Abstract Humans' experience of the world is profoundly multimodal from the beginning, so why do existing state-of-the-art language models only use text as a modality to learn and represent semantic meaning? In this paper we review the literature on the role of embodiment and emotion in the interactive setting of spoken dialogue as necessary prerequisites for language learning for human children, including how words in child vocabularies are largely concrete, then shift to become more abstract as the children get older. We sketch a model of semantics that leverages current transformer-based models and a word-level grounded model, then explain the robot-dialogue system that will make use of our semantic model, the setting for the system to learn language, and existing benchmarks for evaluation.
R2D2: Relational Text Decoding with Transformers
Authors: Aryan Arbabi, Mingqiu Wang, Laurent El Shafey, Nan Du, Izhak Shafran
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2105.04645
Pdf link: https://arxiv.org/pdf/2105.04645
Abstract We propose a novel framework for modeling the interaction between graphical structures and the natural language text associated with their nodes and edges. Existing approaches typically fall into two categories. One group ignores the relational structure by converting the graph into linear sequences and then utilizing the highly successful Seq2Seq models. The other ignores the sequential nature of the text by representing it as fixed-dimensional vectors and applying graph neural networks. Both simplifications lead to information loss. Our proposed method utilizes both the graphical structure and the sequential nature of the texts. The input to our model is a set of text segments associated with the nodes and edges of the graph, which are then processed with a transformer encoder-decoder model, equipped with a self-attention mechanism that is aware of the graphical relations between the nodes containing the segments. This also allows us to use BERT-like models that are already trained on large amounts of text. While the proposed model has wide applications, we demonstrate its capabilities on data-to-text generation tasks. Our approach compares favorably against state-of-the-art methods in four tasks without tailoring the model architecture. We also provide an early demonstration in a novel practical application: generating clinical notes from the medical entities mentioned during clinical visits.
Assessing the Syntactic Capabilities of Transformer-based Multilingual Language Models
Authors: Laura Pérez-Mayos, Alba Táboas García, Simon Mille, Leo Wanner
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2105.04688
Pdf link: https://arxiv.org/pdf/2105.04688
Abstract Multilingual Transformer-based language models, usually pretrained on more than 100 languages, have been shown to achieve outstanding results in a wide range of cross-lingual transfer tasks. However, it remains unknown whether the optimization for different languages conditions the capacity of the models to generalize over syntactic structures, and how languages with syntactic phenomena of different complexity are affected. In this work, we explore the syntactic generalization capabilities of the monolingual and multilingual versions of BERT and RoBERTa. More specifically, we evaluate the syntactic generalization potential of the models on English and Spanish tests, comparing the syntactic abilities of monolingual and multilingual models on the same language (English), and of multilingual models on two different languages (English and Spanish). For English, we use the available SyntaxGym test suite; for Spanish, we introduce SyntaxGymES, a novel ensemble of targeted syntactic tests in Spanish, designed to evaluate the syntactic generalization capabilities of language models through the SyntaxGym online platform.
EL-Attention: Memory Efficient Lossless Attention for Generation
Authors: Yu Yan, Jiusheng Chen, Weizhen Qi, Nikhil Bhendawade, Yeyun Gong, Nan Duan, Ruofei Zhang
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2105.04779
Pdf link: https://arxiv.org/pdf/2105.04779
Abstract Transformer models with multi-head attention require caching intermediate results for efficient inference in generation tasks. However, the cache brings new memory-related costs and prevents leveraging larger batch sizes for faster speed. We propose memory-efficient lossless attention (called EL-attention) to address this issue. It avoids heavy operations for building multi-head keys and values, with no requirement of using a cache. EL-attention constructs an ensemble of attention results by expanding the query while keeping the key and value shared. It produces the same result as multi-head attention with less GPU memory and faster inference speed. We conduct extensive experiments on Transformer, BART, and GPT-2 for summarization and question generation tasks. The results show EL-attention speeds up existing models by 1.6x to 5.3x without accuracy loss.
Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation
Authors: Shun-Po Chuang, Yung-Sung Chuang, Chih-Chiang Chang, Hung-yi Lee
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2105.04840
Pdf link: https://arxiv.org/pdf/2105.04840
Abstract We study the possibilities of building a non-autoregressive speech-to-text translation model using connectionist temporal classification (CTC), and use CTC-based automatic speech recognition as an auxiliary task to improve the performance. CTC's success on translation is counter-intuitive due to its monotonicity assumption, so we analyze its reordering capability. Kendall's tau distance is introduced as the quantitative metric, and gradient-based visualization provides an intuitive way to take a closer look into the model. Our analysis shows that transformer encoders have the ability to change the word order and points out future research directions worth exploring further in non-autoregressive speech translation.
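The Kendall's tau distance mentioned in the abstract above can be illustrated with a minimal sketch. This is the textbook definition (the number of item pairs whose relative order differs between two orderings), not the authors' implementation; the function names and the use of integer positions as items are assumptions made for illustration, and items are assumed to be distinct.

```python
from itertools import combinations


def kendall_tau_distance(order_a, order_b):
    """Count item pairs whose relative order differs between two orderings.

    Assumes both sequences contain the same distinct items.
    """
    if len(set(order_a)) != len(order_a) or sorted(order_a) != sorted(order_b):
        raise ValueError("both sequences must contain the same distinct items")
    # Position of every item in the second ordering.
    pos_b = {item: i for i, item in enumerate(order_b)}
    discordant = 0
    for x, y in combinations(order_a, 2):
        # x precedes y in order_a; the pair is discordant if y precedes x in order_b.
        if pos_b[x] > pos_b[y]:
            discordant += 1
    return discordant


def normalized_kendall_tau_distance(order_a, order_b):
    """Scale the distance to [0, 1] by the total number of pairs."""
    n = len(order_a)
    pairs = n * (n - 1) // 2
    return kendall_tau_distance(order_a, order_b) / pairs if pairs else 0.0
```

A distance of 0 means the two orderings agree on every pair (fully monotonic alignment), while a normalized distance of 1 means the order is completely reversed.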
Benchmarking down-scaled (not so large) pre-trained language models
Authors: M. Aßenmacher, P. Schulze, C. Heumann
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2105.04876
Pdf link: https://arxiv.org/pdf/2105.04876
Abstract Large Transformer-based language models are pre-trained on corpora of varying sizes, for a different number of steps and with different batch sizes. At the same time, more fundamental components, such as the pre-training objective or architectural hyperparameters, are modified. In total, it is therefore difficult to ascribe changes in performance to specific factors. Since searching the hyperparameter space over the full systems is too costly, we pre-train down-scaled versions of several popular Transformer-based architectures on a common pre-training corpus and benchmark them on a subset of the GLUE tasks (Wang et al., 2018). Specifically, we systematically compare three pre-training objectives for different shape parameters and model sizes, while also varying the number of pre-training steps and the batch size. In our experiments MLM + NSP (BERT-style) consistently outperforms MLM (RoBERTa-style) as well as the standard LM objective. Furthermore, we find that additional compute should be mainly allocated to an increased model size, while training for more steps is inefficient. Based on these observations, as a final step we attempt to scale up several systems using compound scaling (Tan and Le, 2019) adapted to Transformer-based language models.
Hierarchical RNNs-Based Transformers MADDPG for Mixed Cooperative-Competitive Environments
Authors: Xiaolong Wei, LiFang Yang, Xianglin Huang, Gang Cao, Tao Zhulin, Zhengyang Du, Jing An
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2105.04888
Pdf link: https://arxiv.org/pdf/2105.04888
Abstract Attention mechanisms are now widely applied in deep learning models. Models based on attention mechanisms can not only record the relationships between feature positions, but also measure the importance of different features based on their weights. By establishing dynamically weighted parameters for choosing relevant and irrelevant features, the key information can be strengthened and the irrelevant information weakened. Therefore, the efficiency of deep learning algorithms can be significantly improved. Although transformers have performed very well in many fields, including reinforcement learning, there are still many problems and applications in this area that can be addressed with transformers. MARL (Multi-Agent Reinforcement Learning) can be viewed as a set of independent agents trying to adapt and learn in order to reach a goal. To emphasize the relationship between each MDP decision in a certain time period, we applied a hierarchical coding method and validated its effectiveness. This paper proposes a hierarchical transformer MADDPG based on RNNs, which we call Hierarchical RNNs-Based Transformers MADDPG (HRTMADDPG). It consists of a lower-level encoder based on RNNs that encodes multiple step sizes in each time sequence, and an upper sequence-level encoder based on a transformer that learns the correlations between multiple sequences, so that we can capture the causal relationship between sub-time sequences and make HRTMADDPG more efficient.
Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media
Authors: Ananya Srivastava, Mohammed Hasan, Bhargav Yagnik, Rahee Walambe, Ketan Kotecha
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2105.04913
Pdf link: https://arxiv.org/pdf/2105.04913
Abstract Social networking platforms provide a conduit to disseminate our ideas, views and thoughts and proliferate information. This has led to the amalgamation of English with natively spoken languages. The prevalence of Hindi-English code-mixed data (Hinglish) is on the rise among most of the urban population all over the world. Hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in these code-mixed languages. Thus, the worldwide hate speech detection rate of around 44% drops even further when content in Indian colloquial languages and slang is considered. In this paper, we propose a methodology for efficient detection of unstructured code-mixed Hinglish language. Fine-tuning-based approaches for Hindi-English code-mixed language are employed by utilizing contextual embeddings such as ELMo (Embeddings from Language Models), FLAIR, and transformer-based BERT (Bidirectional Encoder Representations from Transformers). Our proposed approach is compared against pre-existing methods, and results are compared for various datasets. Our model outperforms the other methods and frameworks.
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Authors: Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, Jose Camacho-Collados
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2105.04949
Pdf link: https://arxiv.org/pdf/2105.04949
Abstract Analogies play a central role in human commonsense reasoning. The ability to recognize analogies such as eye is to seeing what ear is to hearing, sometimes referred to as analogical proportions, shapes how we structure knowledge and understand language. Surprisingly, however, the task of identifying such analogies has not yet received much attention in the language model era. In this paper, we analyze the capabilities of transformer-based language models on this unsupervised task, using benchmarks obtained from educational settings, as well as more commonly used datasets. We find that off-the-shelf language models can identify analogies to a certain extent, but struggle with abstract and complex relations, and results are highly sensitive to model architecture and hyperparameters. Overall the best results were obtained with GPT-2 and RoBERTa, while configurations using BERT were not able to outperform word embedding models. Our results raise important questions for future work about how, and to what extent, pre-trained language models capture knowledge about abstract semantic relations. Source code and data to reproduce our experimental results are available in the following repository: https://github.com/asahi417/analogy-language-model
Keyword: attention

Automatic Classification of Human Translation and Machine Translation: A Study from the Perspective of Lexical Diversity
Authors: Yingxue Fu, Mark-Jan Nederhof
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2105.04616
Pdf link: https://arxiv.org/pdf/2105.04616
Abstract By using a trigram model and fine-tuning a pretrained BERT model for sequence classification, we show that machine translation and human translation can be classified with an accuracy above chance level, which suggests that machine translation and human translation are different in a systematic way. The classification accuracy of machine translation is much higher than of human translation. We show that this may be explained by the difference in lexical diversity between machine translation and human translation. If machine translation has independent patterns from human translation, automatic metrics which measure the deviation of machine translation from human translation may conflate difference with quality. Our experiment with two different types of automatic metrics shows correlation with the result of the classification task. Therefore, we suggest the difference in lexical diversity between machine translation and human translation be given more attention in machine translation evaluation.
R2D2: Relational Text Decoding with Transformers
Authors: Aryan Arbabi, Mingqiu Wang, Laurent El Shafey, Nan Du, Izhak Shafran
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2105.04645
Pdf link: https://arxiv.org/pdf/2105.04645
Abstract We propose a novel framework for modeling the interaction between graphical structures and the natural language text associated with their nodes and edges. Existing approaches typically fall into two categories. One group ignores the relational structure by converting the graph into linear sequences and then utilizing the highly successful Seq2Seq models. The other ignores the sequential nature of the text by representing it as fixed-dimensional vectors and applying graph neural networks. Both simplifications lead to information loss. Our proposed method utilizes both the graphical structure and the sequential nature of the texts. The input to our model is a set of text segments associated with the nodes and edges of the graph, which are then processed with a transformer encoder-decoder model, equipped with a self-attention mechanism that is aware of the graphical relations between the nodes containing the segments. This also allows us to use BERT-like models that are already trained on large amounts of text. While the proposed model has wide applications, we demonstrate its capabilities on data-to-text generation tasks. Our approach compares favorably against state-of-the-art methods in four tasks without tailoring the model architecture. We also provide an early demonstration in a novel practical application: generating clinical notes from the medical entities mentioned during clinical visits.
The Influence of Memory in Multi-Agent Consensus
Authors: David Kohan Marzagão, Luciana Basualdo Bonatto, Tiago Madeira, Marcelo Matheus Gauy, Peter McBurney
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2105.04666
Pdf link: https://arxiv.org/pdf/2105.04666
Abstract Multi-agent consensus problems can often be seen as a sequence of autonomous and independent local choices between a finite set of decision options, with each local choice undertaken simultaneously, and with a shared goal of achieving a global consensus state. Being able to estimate probabilities for the different outcomes and to predict how long it takes for a consensus to be formed, if ever, are core issues for such protocols. Little attention has been given to protocols in which agents can remember past or outdated states. In this paper, we propose a framework to study what we call a memory consensus protocol. We show that the employment of memory allows such processes to always converge and, in some scenarios such as cycles, to converge faster. We provide a theoretical analysis of the probability of each option eventually winning such processes based on the initial opinions expressed by agents. Further, we perform experiments to investigate network topologies in which agents benefit from memory on the expected time needed for consensus.
Incremental Graph Computation: Anchored Vertex Tracking in Dynamic Social Networks
Authors: Taotao Cai, Shuqiao Yang, Jianxin Li, Quan Z. Sheng, Jian Yang, Xin Wang, Wei Emma Zhang, Longxiang Gao
Subjects: Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2105.04742
Pdf link: https://arxiv.org/pdf/2105.04742
Abstract User engagement has recently received significant attention in understanding the decay and expansion of communities in many online social networking platforms. Many user engagement studies have been done to find a set of critical (anchored) users in a static social network. However, social networks are highly dynamic and their structure is continuously evolving. In this paper, we target a new research problem called Anchored Vertex Tracking (AVT) that aims to track the anchored users at each timestamp of evolving networks. To solve the AVT problem, we develop a greedy algorithm. Furthermore, we design an incremental algorithm to efficiently solve the AVT problem. Finally, we conduct extensive experiments to demonstrate the performance of our proposed algorithms.
HAPS-ITS: Enabling Future ITS Services in Trans-Continental Highways
Authors: Wael Jaafar, Halim Yanikomeroglu
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2105.04756
Pdf link: https://arxiv.org/pdf/2105.04756
Abstract As the world we live in becomes smaller and more interconnected, with people and goods traveling for thousands of kilometers to reach their destinations, the reliability and efficiency of transportation systems have become critical. Indeed, trans-continental highways need particular attention due to their important role in sustaining globalization. In this context, intelligent transportation systems (ITS) can actively enhance the safety, mobility, productivity, and comfort of trans-continental highways. However, ITS efficiency depends greatly on the roads where they are deployed, on the availability of power and connectivity, and on the integration of future connected and autonomous vehicles. To this end, high altitude platform station (HAPS) systems, due to their mobility, sustainability, payload capacity, and communication/caching/computing capabilities, are seen as a key enabler of future ITS services for trans-continental highways; this paradigm is referred to as HAPS-ITS. The latter is envisioned as an active component of ITS systems to support a plethora of transportation applications, such as traffic monitoring, accident reporting, and platooning. This paper discusses how HAPS systems can enable advanced ITS services for trans-continental highways, presenting the main requirements of HAPS-ITS and a detailed case study of the Trans-Sahara highway.
EL-Attention: Memory Efficient Lossless Attention for Generation
Authors: Yu Yan, Jiusheng Chen, Weizhen Qi, Nikhil Bhendawade, Yeyun Gong, Nan Duan, Ruofei Zhang
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2105.04779
Pdf link: https://arxiv.org/pdf/2105.04779
Abstract Transformer models with multi-head attention require caching intermediate results for efficient inference in generation tasks. However, the cache brings new memory-related costs and prevents leveraging larger batch sizes for faster speed. We propose memory-efficient lossless attention (called EL-attention) to address this issue. It avoids heavy operations for building multi-head keys and values, with no requirement of using a cache. EL-attention constructs an ensemble of attention results by expanding the query while keeping the key and value shared. It produces the same result as multi-head attention with less GPU memory and faster inference speed. We conduct extensive experiments on Transformer, BART, and GPT-2 for summarization and question generation tasks. The results show EL-attention speeds up existing models by 1.6x to 5.3x without accuracy loss.
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
Authors: Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2105.04836
Pdf link: https://arxiv.org/pdf/2105.04836
Abstract The problem of grounding VQA tasks has recently seen increased attention in the research community, with most attempts focusing on solving this task by using pretrained object detectors. However, pretrained object detectors require bounding box annotations for detecting relevant objects in the vocabulary, which may not always be feasible for real-life large-scale applications. In this paper, we focus on a more relaxed setting: the grounding of relevant visual entities in a weakly supervised manner by training on the VQA task alone. To address this problem, we propose a visual capsule module with a query-based selection mechanism of capsule features, which allows the model to focus on relevant regions based on the textual cues about visual information in the question. We show that integrating the proposed capsule module in existing VQA systems significantly improves their performance on the weakly supervised grounding task. Overall, we demonstrate the effectiveness of our approach on two state-of-the-art VQA systems, stacked NMN and MAC, on the CLEVR-Answers benchmark, our new evaluation set based on CLEVR scenes with ground truth bounding boxes for objects that are relevant for the correct answer, as well as on GQA, a real world VQA dataset with compositional questions. We show that the systems with the proposed capsule module consistently outperform the respective baseline systems in terms of answer grounding, while achieving comparable performance on the VQA task.
EDPN: Enhanced Deep Pyramid Network for Blurry Image Restoration
Authors: Ruikang Xu, Zeyu Xiao, Jie Huang, Yueyi Zhang, Zhiwei Xiong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2105.04872
Pdf link: https://arxiv.org/pdf/2105.04872
Abstract Image deblurring has seen a great improvement with the development of deep neural networks. In practice, however, blurry images often suffer from additional degradations such as downscaling and compression. To address these challenges, we propose an Enhanced Deep Pyramid Network (EDPN) for blurry image restoration from multiple degradations, by fully exploiting the self- and cross-scale similarities in the degraded image. Specifically, we design two pyramid-based modules, i.e., the pyramid progressive transfer (PPT) module and the pyramid self-attention (PSA) module, as the main components of the proposed network. By taking several replicated blurry images as inputs, the PPT module transfers both self- and cross-scale similarity information from the same degraded image in a progressive manner. Then, the PSA module fuses the above transferred features for subsequent restoration using self- and spatial-attention mechanisms. Experimental results demonstrate that our method significantly outperforms existing solutions for blurry image super-resolution and blurry image deblocking. In the NTIRE 2021 Image Deblurring Challenge, EDPN achieves the best PSNR/SSIM/LPIPS scores in Track 1 (Low Resolution) and the best SSIM/LPIPS scores in Track 2 (JPEG Artifacts).
Consistent Multiple Graph Embedding for Multi-View Clustering
Authors: Yiming Wang, Dongxia Chang, Zhiqiang Fu, Yao Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2105.04880
Pdf link: https://arxiv.org/pdf/2105.04880
Abstract Graph-based multi-view clustering, which aims to obtain a partition of data across multiple views, has received considerable attention in recent years. Although great efforts have been made for graph-based multi-view clustering, it remains a challenge to fuse characteristics from various views to learn a common representation for clustering. In this paper, we propose a novel Consistent Multiple Graph Embedding Clustering framework (CMGEC). Specifically, a multiple graph auto-encoder (M-GAE) is designed to flexibly encode the complementary information of multi-view data using a multi-graph attention fusion encoder. To guide the learned common representation to maintain the similarity of the neighboring characteristics in each view, a Multi-view Mutual Information Maximization module (MMIM) is introduced. Furthermore, a graph fusion network (GFN) is devised to explore the relationship among graphs from different views and provide the common consensus graph needed in M-GAE. By jointly training these models, a common latent representation can be obtained that encodes more complementary information from multiple views and depicts data more comprehensively. Experiments on three types of multi-view datasets demonstrate that CMGEC outperforms the state-of-the-art clustering methods.
Operation Embeddings for Neural Architecture Search
Authors: Michail Chatzianastasis, George Dasoulas, Georgios Siolas, Michalis Vazirgiannis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2105.04885
Pdf link: https://arxiv.org/pdf/2105.04885
Abstract Neural Architecture Search (NAS) has recently gained increased attention, as a class of approaches that automatically searches in an input space of network architectures. A crucial part of the NAS pipeline is the encoding of the architecture that consists of the applied computational blocks, namely the operations and the links between them. Most of the existing approaches either fail to capture the structural properties of the architectures or use a hand-engineered vector to encode the operator information. In this paper, we propose the replacement of fixed operator encoding with learnable representations in the optimization process. This approach, which effectively captures the relations of different operations, leads to smoother and more accurate representations of the architectures and consequently to improved performance of the end task. Our extensive evaluation in ENAS benchmark demonstrates the effectiveness of the proposed operation embeddings to the generation of highly accurate models, achieving state-of-the-art performance. Finally, our method produces top-performing architectures that share similar operation and graph patterns, highlighting a strong correlation between architecture's structural properties and performance.
Hierarchical RNNs-Based Transformers MADDPG for Mixed Cooperative-Competitive Environments
Authors: Xiaolong Wei, LiFang Yang, Xianglin Huang, Gang Cao, Tao Zhulin, Zhengyang Du, Jing An
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2105.04888
Pdf link: https://arxiv.org/pdf/2105.04888
Abstract At present, attention mechanism has been widely applied to the fields of deep learning models. Structural models that based on attention mechanism can not only record the relationships between features position, but also can measure the importance of different features based on their weights. By establishing dynamically weighted parameters for choosing relevant and irrelevant features, the key information can be strengthened, and the irrelevant information can be weakened. Therefore, the efficiency of deep learning algorithms can be significantly elevated and improved. Although transformers have been performed very well in many fields including reinforcement learning, there are still many problems and applications can be solved and made with transformers within this area. MARL (known as Multi-Agent Reinforcement Learning) can be recognized as a set of independent agents trying to adapt and learn through their way to reach the goal. In order to emphasize the relationship between each MDP decision in a certain time period, we applied the hierarchical coding method and validated the effectiveness of this method. This paper proposed a hierarchical transformers MADDPG based on RNN which we call it Hierarchical RNNs-Based Transformers MADDPG(HRTMADDPG). It consists of a lower level encoder based on RNNs that encodes multiple step sizes in each time sequence, and it also consists of an upper sequence level encoder based on transformer for learning the correlations between multiple sequences so that we can capture the causal relationship between sub-time sequences and make HRTMADDPG more efficient.
A Euclidean Distance Matrix Model for Convex Clustering
Authors: Zhaowei Wang, Xiaowen Liu, Qingna Li
Subjects: Discrete Mathematics (cs.DM)
Arxiv link: https://arxiv.org/abs/2105.04947
Pdf link: https://arxiv.org/pdf/2105.04947
Abstract Clustering has been one of the most basic and essential problems in unsupervised learning due to various applications in many critical fields. The recently proposed sum-of-norms (SON) model by Pelckmans et al. (2005), Lindsten et al. (2011) and Hocking et al. (2011) has received a lot of attention. The advantage of the SON model is the theoretical guarantee in terms of perfect recovery, established by Sun et al. (2018). It also provides great opportunities for designing efficient algorithms for solving the SON model. The semismooth Newton based augmented Lagrangian method by Sun et al. (2018) has demonstrated its superior performance over the alternating direction method of multipliers (ADMM) and the alternating minimization algorithm (AMA). In this paper, we propose a Euclidean distance matrix model based on the SON model. An efficient majorization penalty algorithm is proposed to solve the resulting model. Extensive numerical experiments are conducted to demonstrate the efficiency of the proposed model and the majorization penalty algorithm.
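For orientation, the SON convex clustering model referred to in the abstract above is usually written in the following form in the cited literature. This is the commonly stated unweighted version, not the Euclidean distance matrix model proposed in the paper; the symbols (data points $a_i$, centroids $x_i$, penalty parameter $\gamma$, norm index $p$) are notation assumed here for illustration.

```latex
\min_{x_1,\dots,x_n \in \mathbb{R}^d} \;
  \frac{1}{2} \sum_{i=1}^{n} \lVert x_i - a_i \rVert_2^2
  \;+\; \gamma \sum_{1 \le i < j \le n} \lVert x_i - x_j \rVert_p ,
  \qquad \gamma > 0,\; p \ge 1 .
```

Points $a_i$ and $a_j$ are assigned to the same cluster when their optimal centroids coincide, $x_i^{*} = x_j^{*}$; a larger $\gamma$ fuses more centroids and so yields fewer clusters.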
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Authors: Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert, Jose Camacho-Collados
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2105.04949
Pdf link: https://arxiv.org/pdf/2105.04949
Abstract Analogies play a central role in human commonsense reasoning. The ability to recognize analogies such as eye is to seeing what ear is to hearing, sometimes referred to as analogical proportions, shapes how we structure knowledge and understand language. Surprisingly, however, the task of identifying such analogies has not yet received much attention in the language model era. In this paper, we analyze the capabilities of transformer-based language models on this unsupervised task, using benchmarks obtained from educational settings, as well as more commonly used datasets. We find that off-the-shelf language models can identify analogies to a certain extent, but struggle with abstract and complex relations, and results are highly sensitive to model architecture and hyperparameters. Overall the best results were obtained with GPT-2 and RoBERTa, while configurations using BERT were not able to outperform word embedding models. Our results raise important questions for future work about how, and to what extent, pre-trained language models capture knowledge about abstract semantic relations. Source code and data to reproduce our experimental results are available in the following repository: https://github.com/asahi417/analogy-language-model
Exploring a Handwriting Programming Language for Educational Robots
Authors: Laila El-Hamamsy, Vaios Papaspyros, Taavet Kangur, Laura Mathex, Christian Giang, Melissa Skweres, Barbara Bruno, Francesco Mondada
Subjects: Programming Languages (cs.PL); Computers and Society (cs.CY); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2105.04963
Pdf link: https://arxiv.org/pdf/2105.04963
Abstract Recently, introducing computer science and educational robots in compulsory education has received increasing attention. However, the use of screens in classrooms is often met with resistance, especially in primary school. To address this issue, this study presents the development of a handwriting-based programming language for educational robots. Aiming to align better with existing classroom practices, it allows students to program a robot by drawing symbols with ordinary pens and paper. Regular smartphones are leveraged to process the hand-drawn instructions using computer vision and machine learning algorithms, and send the commands to the robot for execution. To align with the local computer science curriculum, an appropriate playground and scaffolded learning tasks were designed. The system was evaluated in a preliminary test with eight teachers, developers and educational researchers. While the participants pointed out that some technical aspects could be improved, they also acknowledged the potential of the approach to make computer science education in primary school more accessible.
Open Set Domain Recognition via Attention-Based GCN and Semantic Matching Optimization
Authors: Xinxing He, Yuan Yuan, Zhiyu Jiang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2105.04967
Pdf link: https://arxiv.org/pdf/2105.04967
Abstract Open set domain recognition has attracted attention in recent years. The task aims to specifically classify each sample in the practical unlabeled target domain, which consists of all known classes in the manually labeled source domain and target-specific unknown categories. The absence of annotated training data or auxiliary attribute information for unknown categories makes this task especially difficult. Moreover, the existing domain discrepancy in label space and data distribution further distracts the knowledge transferred from known classes to unknown classes. To address these issues, this work presents an end-to-end model based on attention-based GCN and semantic matching optimization, which first employs the attention mechanism to enable the central node to learn more discriminating representations from its neighbors in the knowledge graph. Moreover, a coarse-to-fine semantic matching optimization approach is proposed to progressively bridge the domain gap. Experimental results validate that the proposed model is not only superior at recognizing the images of known and unknown classes, but can also adapt to various degrees of openness of the target domain.
Instance-aware Remote Sensing Image Captioning with Cross-hierarchy Attention
Authors: Chengze Wang, Zhiyu Jiang, Yuan Yuan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2105.04996
Pdf link: https://arxiv.org/pdf/2105.04996
Abstract Spatial attention is a straightforward approach to enhance the performance of remote sensing image captioning. However, conventional spatial attention approaches consider only the attention distribution on one fixed coarse grid, so the semantics of tiny objects can easily be ignored or disturbed during visual feature extraction. Worse still, the fixed semantic level of conventional spatial attention limits image understanding at different levels and perspectives, which is critical for tackling the huge diversity in remote sensing images. To address these issues, we propose a remote sensing image caption generator with instance-awareness and cross-hierarchy attention. 1) Instance awareness is achieved by introducing a multi-level feature architecture that contains the visual information of multi-level instance-possible regions and their surroundings. 2) Moreover, based on this multi-level feature extraction, a cross-hierarchy attention mechanism is proposed to prompt the decoder to dynamically focus on different semantic hierarchies and instances at each time step. The experimental results on public datasets demonstrate the superiority of the proposed approach over existing methods.
Counterfactual Explanations for Neural Recommenders
Authors: Khanh Hiep Tran, Azin Ghazimatin, Rishiraj Saha Roy
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2105.05008
Pdf link: https://arxiv.org/pdf/2105.05008
Abstract Understanding why specific items are recommended to users can significantly increase their trust and satisfaction in the system. While neural recommenders have become the state-of-the-art in recent years, the complexity of deep models still makes the generation of tangible explanations for end users a challenging problem. Existing methods are usually based on attention distributions over a variety of features, which are still questionable regarding their suitability as explanations, and rather unwieldy to grasp for an end user. Counterfactual explanations based on a small set of the user's own actions have been shown to be an acceptable solution to the tangibility problem. However, current work on such counterfactuals cannot be readily applied to neural models. In this work, we propose ACCENT, the first general framework for finding counterfactual explanations for neural recommenders. It extends recently-proposed influence functions for identifying training points most relevant to a recommendation, from a single to a pair of items, while deducing a counterfactual set in an iterative process. We use ACCENT to generate counterfactual explanations for two popular neural models, Neural Collaborative Filtering (NCF) and Relational Collaborative Filtering (RCF), and demonstrate its feasibility on a sample of the popular MovieLens 100K dataset.
Adversarial examples attack based on random warm restart mechanism and improved Nesterov momentum
Authors: Tiangang Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2105.05029
Pdf link: https://arxiv.org/pdf/2105.05029
Abstract Deep learning algorithms have achieved great success in the field of computer vision, but some studies have pointed out that deep learning models are vulnerable to adversarial example attacks and make false decisions. This challenges the further development of deep learning, and urges researchers to pay more attention to the relationship between adversarial example attacks and deep learning security. This work focuses on adversarial examples, optimizes the generation of adversarial examples from the view of adversarial robustness, and takes the perturbations added in adversarial examples as the optimization parameter. We propose the RWR-NM-PGD attack algorithm based on a random warm restart mechanism and improved Nesterov momentum from the view of gradient optimization. The algorithm introduces improved Nesterov momentum, using its characteristics of accelerating convergence and improving the gradient update direction in optimization algorithms to accelerate the generation of adversarial examples. In addition, the random warm restart mechanism is used for optimization, and the projected gradient descent algorithm is used to limit the range of the generated perturbations in each warm restart, which can obtain a better attack effect. Experiments on two public datasets show that the algorithm proposed in this work can improve the success rate of attacking deep learning models without extra time cost. Compared with the benchmark attack method, the algorithm proposed in this work can achieve a better attack success rate for both normally trained models and defense models. Our method has an average attack success rate of 46.3077%, which is 27.19% higher than I-FGSM and 9.27% higher than PGD. The attack results on 13 defense models show that the attack algorithm proposed in this work is superior to the benchmark algorithm in attack universality and transferability.
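The general idea of combining projected gradient steps, a Nesterov look-ahead, and random restarts can be sketched on a toy problem. The code below is a minimal illustration under stated assumptions, not the RWR-NM-PGD algorithm itself: the abstract does not specify the "improved" momentum or the warm restart schedule, so this uses plain Nesterov look-ahead and independent uniform random restarts. The linear logistic model and all names (`loss_and_grad`, `pgd_nesterov_restarts`, `eps`, `alpha`, `mu`) are hypothetical.

```python
import math
import random

def loss_and_grad(w, x, y):
    """Logistic loss of a linear model and its gradient with respect to the input x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = math.log1p(math.exp(-margin))
    coeff = -y / (1.0 + math.exp(margin))
    return loss, [coeff * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_nesterov_restarts(w, x, y, eps=0.5, alpha=0.1, mu=0.9,
                          steps=20, restarts=3, seed=0):
    """Maximize the loss over perturbations delta with max-norm at most eps."""
    rng = random.Random(seed)
    best_loss, best_delta = -math.inf, [0.0] * len(x)
    for _ in range(restarts):
        # Random restart: begin from a uniformly sampled point inside the eps-ball.
        delta = [rng.uniform(-eps, eps) for _ in x]
        velocity = [0.0] * len(x)
        for _ in range(steps):
            # Nesterov look-ahead: take the gradient at the anticipated next point.
            look = [xi + di + alpha * mu * vi
                    for xi, di, vi in zip(x, delta, velocity)]
            _, grad = loss_and_grad(w, look, y)
            norm = sum(abs(g) for g in grad) or 1.0
            velocity = [mu * vi + g / norm for vi, g in zip(velocity, grad)]
            # Signed ascent step, then projection (clipping) back into the eps-ball.
            delta = [max(-eps, min(eps, di + alpha * sign(vi)))
                     for di, vi in zip(delta, velocity)]
        loss, _ = loss_and_grad(w, [xi + di for xi, di in zip(x, delta)], y)
        if loss > best_loss:
            best_loss, best_delta = loss, delta
    return best_delta, best_loss

w, x, y = [1.0, -2.0], [1.0, -1.0], 1
delta, adv_loss = pgd_nesterov_restarts(w, x, y)
clean_loss, _ = loss_and_grad(w, x, y)
```

Keeping the best perturbation across restarts is the part that mirrors the restart idea; on a real network the gradient would come from automatic differentiation rather than a closed form.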
The Impact of Incomplete Information on Network Formation with Heterogeneous Agents
Authors: D. Kai Zhang, Alexander Carver
Subjects: Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2105.05072
Pdf link: https://arxiv.org/pdf/2105.05072
Abstract We propose an agent-based network formation model under uncertainty with the objective of relaxing the common assumption of complete information, calling attention to the role beliefs may play in segregation. We demonstrate that our model is capable of generating a set of networks that encompasses those of a complete information model. Further, we show that by allowing agents to be biased toward each other based on observable attributes, our model is able to generate homophilous equilibria with preferences that are indifferent to these attributes. We accompany our theoretical results with a simulation-based investigation of the relationship between beliefs and segregation and show that biased beliefs are an important driver of segregation under incomplete information.
kdehumor at semeval-2020 task 7: a neural network model for detecting funniness in dataset humicroedit
Authors: Rida Miraj, Masaki Aono
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2105.05135
Pdf link: https://arxiv.org/pdf/2105.05135
Abstract This paper describes our contribution to SemEval-2020 Task 7: Assessing Humor in Edited News Headlines. Here we present a method based on a deep neural network. In recent years, quite some attention has been devoted to humor production and perception. Our team KdeHumor employs recurrent neural network models including Bi-Directional LSTMs (BiLSTMs). Moreover, we utilize the state-of-the-art pre-trained sentence embedding techniques. We analyze the performance of our method and demonstrate the contribution of each component of our architecture.
Performance Comparison of Different Machine Learning Algorithms on the Prediction of Wind Turbine Power Generation
Authors: Onder Eyecioglu, Batuhan Hangun, Korhan Kayisli, Mehmet Yesilbudak
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2105.05197
Pdf link: https://arxiv.org/pdf/2105.05197
Abstract Over the past decade, wind energy has gained more attention in the world. However, owing to its indirectness and volatility properties, wind power penetration has increased the difficulty and complexity in dispatching and planning of electric power systems. Therefore, high-precision wind power prediction is needed in order to balance the electrical power. For this purpose, in this study, the prediction performance of linear regression, k-nearest neighbor regression and decision tree regression algorithms is compared in detail. The k-nearest neighbor regression algorithm provides lower coefficient of determination values, while the decision tree regression algorithm produces lower mean absolute error values. In addition, the meteorological parameters of wind speed, wind direction, barometric pressure and air temperature are evaluated in terms of their importance on the wind power parameter. The biggest importance factor is achieved by the wind speed parameter. In consequence, many useful assessments are made for wind power predictions.
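The two metrics named in the abstract, the coefficient of determination (R²) and the mean absolute error (MAE), measure different things and need not rank models the same way. The sketch below computes both from their standard definitions on made-up numbers; the data and the function names `mae` and `r_squared` are illustrative assumptions, not values or code from the paper.

```python
def mae(y_true, y_pred):
    """Mean absolute error: the average of |truth - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
one_big_miss = [1.0, 2.0, 3.0, 6.0]       # a single large error
many_small_misses = [1.5, 1.5, 3.5, 3.5]  # several small errors

for name, pred in [("one big miss", one_big_miss),
                   ("many small misses", many_small_misses)]:
    print(name, mae(y_true, pred), r_squared(y_true, pred))
```

Both toy predictors have the same MAE, yet the one with a single large miss has a much lower R², because squared errors penalize outliers more heavily.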
Performance-aware placement and chaining scheme for virtualized network functions: a particle swarm optimization approach
Authors: Samane Asgari, Shahram Jamali, Reza Fotohi, Mahdi Nooshyar
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2105.05248
Pdf link: https://arxiv.org/pdf/2105.05248
Abstract Network functions virtualization (NFV) is a new concept that has received the attention of both researchers and network providers. NFV decouples network functions from specialized hardware devices and virtualizes these network functions as software instances called virtualized network functions (VNFs). NFV leads to various benefits, including more flexibility, high resource utilization, and easy upgrades and maintenance. Despite recent works in this field, placement and chaining of VNFs need more attention. More specifically, some of the existing works have considered only the placement of VNFs and ignored the chaining part. So, they have not provided an integrated view of host or bandwidth resources and propagation delay of paths. In this paper, we solve the VNF placement and chaining problem as an optimization problem based on the particle swarm optimization (PSO) algorithm. Our goal is to minimize the required number of used servers, the average propagation delay of paths, and the average utilization of links while meeting network demands and constraints. Based on the obtained results, the algorithm proposed in this study can find feasible and high-quality solutions.
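As background on the optimizer named in the abstract, a textbook particle swarm optimization loop looks roughly like the sketch below. It minimizes a continuous toy objective and is only an assumption-laden illustration: the paper's encoding of VNF placement and chaining, its objective weights, and its constraints are not given in the abstract and are not modeled here. The function name `pso_minimize` and all parameter values are hypothetical.

```python
import random

def pso_minimize(objective, dim, bounds=(-5.0, 5.0), particles=20,
                 iterations=100, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Textbook global-best PSO for a continuous objective within box bounds."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends inertia, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(p):
    """Toy objective with its minimum at the origin."""
    return sum(x * x for x in p)

best_position, best_value = pso_minimize(sphere, dim=2)
```

A placement problem is discrete, so a real application would need a mapping from particle positions to server assignments plus a penalty or repair step for infeasible solutions.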

dajinstory / daily-arxiv-noti

New submissions for Wed, 12 May 21 #102
