New submissions for Fri, 1 Jul 22

Keyword: detection

Neural Motion Fields: Encoding Grasp Trajectories as Implicit Value Functions

Authors: Authors: Yun-Chun Chen, Adithyavairavan Murali, Balakumar Sundaralingam, Wei Yang, Animesh Garg, Dieter Fox
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2206.14854
Pdf link: https://arxiv.org/pdf/2206.14854
Abstract The pipeline of current robotic pick-and-place methods typically consists of several stages: grasp pose detection, finding inverse kinematic solutions for the detected poses, planning a collision-free trajectory, and then executing the open-loop trajectory to the grasp pose with a low-level tracking controller. While these grasping methods have shown good performance on grasping static objects on a table-top, the problem of grasping dynamic objects in constrained environments remains an open problem. We present Neural Motion Fields, a novel object representation which encodes both object point clouds and the relative task trajectories as an implicit value function parameterized by a neural network. This object-centric representation models a continuous distribution over the SE(3) space and allows us to perform grasping reactively by leveraging sampling-based MPC to optimize this value function.
Machine Learning Approaches to Predict Breast Cancer: Bangladesh Perspective
Authors: Authors: Taminul Islam, Arindom Kundu, Nazmul Islam Khan, Choyon Chandra Bonik, Flora Akter, Md Jihadul Islam
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2206.14972
Pdf link: https://arxiv.org/pdf/2206.14972
Abstract Nowadays, Breast cancer has risen to become one of the most prominent causes of death in recent years. Among all malignancies, this is the most frequent and the major cause of death for women globally. Manually diagnosing this disease requires a good amount of time and expertise. Breast cancer detection is time-consuming, and the spread of the disease can be reduced by developing machine-based breast cancer predictions. In Machine learning, the system can learn from prior instances and find hard-to-detect patterns from noisy or complicated data sets using various statistical, probabilistic, and optimization approaches. This work compares several machine learning algorithm's classification accuracy, precision, sensitivity, and specificity on a newly collected dataset. In this work Decision tree, Random Forest, Logistic Regression, Naive Bayes, and XGBoost, these five machine learning approaches have been implemented to get the best performance on our dataset. This study focuses on finding the best algorithm that can forecast breast cancer with maximum accuracy in terms of its classes. This work evaluated the quality of each algorithm's data classification in terms of efficiency and effectiveness. And also compared with other published work on this domain. After implementing the model, this study achieved the best model accuracy, 94% on Random Forest and XGBoost.
Semi-Supervised Generative Adversarial Network for Stress Detection Using Partially Labeled Physiological Data
Authors: Authors: Nibraas Khan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2206.14976
Pdf link: https://arxiv.org/pdf/2206.14976
Abstract Physiological measurements involves observing variables that attribute to the normative functioning of human systems and subsystems directly or indirectly. The measurements can be used to detect affective states of a person with aims such as improving human-computer interactions. There are several methods of collecting physiological data, but wearable sensors are a common, non-invasive tool for accurate readings. However, valuable information is hard to extract from the raw physiological data, especially for affective state detection. Machine Learning techniques are used to detect the affective state of a person through labeled physiological data. A clear problem with using labeled data is creating accurate labels. An expert is needed to analyze a form of recording of participants and mark sections with different states such as stress and calm. While expensive, this method delivers a complete dataset with labeled data that can be used in any number of supervised algorithms. An interesting question arises from the expensive labeling: how can we reduce the cost while maintaining high accuracy? Semi-Supervised learning (SSL) is a potential solution to this problem. These algorithms allow for machine learning models to be trained with only a small subset of labeled data (unlike unsupervised which use no labels). They provide a way of avoiding expensive labeling. This paper compares a fully supervised algorithm to a SSL on the public WESAD (Wearable Stress and Affect Detection) Dataset for stress detection. This paper shows that Semi-Supervised algorithms are a viable method for inexpensive affective state detection systems with accurate results.
Cross-domain Federated Object Detection
Authors: Authors: Shangchao Su, Bin Li, Chengzhi Zhang, Mingzhao Yang, Xiangyang Xue
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2206.14996
Pdf link: https://arxiv.org/pdf/2206.14996
Abstract Detection models trained by one party (server) may face severe performance degradation when distributed to other users (clients). For example, in autonomous driving scenarios, different driving environments may bring obvious domain shifts, which lead to biases in model predictions. Federated learning that has emerged in recent years can enable multi-party collaborative training without leaking client data. In this paper, we focus on a special cross-domain scenario where the server contains large-scale data and multiple clients only contain a small amount of data; meanwhile, there exist differences in data distributions among the clients. In this case, traditional federated learning techniques cannot take into account the learning of both the global knowledge of all participants and the personalized knowledge of a specific client. To make up for this limitation, we propose a cross-domain federated object detection framework, named FedOD. In order to learn both the global knowledge and the personalized knowledge in different domains, the proposed framework first performs the federated training to obtain a public global aggregated model through multi-teacher distillation, and sends the aggregated model back to each client for finetuning its personalized local model. After very few rounds of communication, on each client we can perform weighted ensemble inference on the public global model and the personalized local model. With the ensemble, the generalization performance of the client-side model can outperform a single model with the same parameter scale. We establish a federated object detection dataset which has significant background differences and instance differences based on multiple public autonomous driving datasets, and then conduct extensive experiments on the dataset. The experimental results validate the effectiveness of the proposed method.
Exploring Temporally Dynamic Data Augmentation for Video Recognition
Authors: Authors: Taeoh Kim, Jinhyung Kim, Minho Shim, Sangdoo Yun, Myunggu Kang, Dongyoon Wee, Sangyoun Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2206.15015
Pdf link: https://arxiv.org/pdf/2206.15015
Abstract Data augmentation has recently emerged as an essential component of modern training recipes for visual recognition tasks. However, data augmentation for video recognition has been rarely explored despite its effectiveness. Few existing augmentation recipes for video recognition naively extend the image augmentation methods by applying the same operations to the whole video frames. Our main idea is that the magnitude of augmentation operations for each frame needs to be changed over time to capture the real-world video's temporal variations. These variations should be generated as diverse as possible using fewer additional hyper-parameters during training. Through this motivation, we propose a simple yet effective video data augmentation framework, DynaAugment. The magnitude of augmentation operations on each frame is changed by an effective mechanism, Fourier Sampling that parameterizes diverse, smooth, and realistic temporal variations. DynaAugment also includes an extended search space suitable for video for automatic data augmentation methods. DynaAugment experimentally demonstrates that there are additional performance rooms to be improved from static augmentations on diverse video models. Specifically, we show the effectiveness of DynaAugment on various video datasets and tasks: large-scale video recognition (Kinetics-400 and Something-Something-v2), small-scale video recognition (UCF- 101 and HMDB-51), fine-grained video recognition (Diving-48 and FineGym), video action segmentation on Breakfast, video action localization on THUMOS'14, and video object detection on MOT17Det. DynaAugment also enables video models to learn more generalized representation to improve the model robustness on the corrupted videos.
Causality-Based Multivariate Time Series Anomaly Detection
Authors: Authors: Wenzhuo Yang, Kun Zhang, Steven C.H. Hoi
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2206.15033
Pdf link: https://arxiv.org/pdf/2206.15033
Abstract Anomaly detection in multivariate time series plays an important role in monitoring the behaviors of various real-world systems, e.g., IT system operations or manufacturing industry. Previous approaches model the joint distribution without considering the underlying mechanism of multivariate time series, making them complicated and computationally hungry. In this paper, we formulate the anomaly detection problem from a causal perspective and view anomalies as instances that do not follow the regular causal mechanism to generate the multivariate data. We then propose a causality-based anomaly detection approach, which first learns the causal structure from data and then infers whether an instance is an anomaly relative to the local causal mechanism to generate each variable from its direct causes, whose conditional distribution can be directly estimated from data. In light of the modularity property of causal systems, the original problem is divided into a series of separate low-dimensional anomaly detection problems so that where an anomaly happens can be directly identified. We evaluate our approach with both simulated and public datasets as well as a case study on real-world AIOps applications, showing its efficacy, robustness, and practical feasibility.
ZeroC: A Neuro-Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time
Authors: Authors: Tailin Wu, Megan Tjandrasuwita, Zhengxuan Wu, Xuelin Yang, Kevin Liu, Rok Sosič, Jure Leskovec
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2206.15049
Pdf link: https://arxiv.org/pdf/2206.15049
Abstract Humans have the remarkable ability to recognize and acquire novel visual concepts in a zero-shot manner. Given a high-level, symbolic description of a novel concept in terms of previously learned visual concepts and their relations, humans can recognize novel concepts without seeing any examples. Moreover, they can acquire new concepts by parsing and communicating symbolic structures using learned visual concepts and relations. Endowing these capabilities in machines is pivotal in improving their generalization capability at inference time. In this work, we introduce Zero-shot Concept Recognition and Acquisition (ZeroC), a neuro-symbolic architecture that can recognize and acquire novel concepts in a zero-shot way. ZeroC represents concepts as graphs of constituent concept models (as nodes) and their relations (as edges). To allow inference time composition, we employ energy-based models (EBMs) to model concepts and relations. We design ZeroC architecture so that it allows a one-to-one mapping between a symbolic graph structure of a concept and its corresponding EBM, which for the first time, allows acquiring new concepts, communicating its graph structure, and applying it to classification and detection tasks (even across domains) at inference time. We introduce algorithms for learning and inference with ZeroC. We evaluate ZeroC on a challenging grid-world dataset which is designed to probe zero-shot concept recognition and acquisition, and demonstrate its capability.
Learning-Based Near-Orthogonal Superposition Code for MIMO Short Message Transmission
Authors: Authors: Chenghong Bian, Chin-Wei Hsu, Changwoo Lee, Hun-Seok Kim
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2206.15065
Pdf link: https://arxiv.org/pdf/2206.15065
Abstract Massive machine type communication (mMTC) has attracted new coding schemes optimized for reliable short message transmission. In this paper, a novel deep learning-based near-orthogonal superposition (NOS) coding scheme is proposed to transmit short messages in multiple-input multiple-output (MIMO) channels for mMTC applications. In the proposed MIMO-NOS scheme, a neural network-based encoder is optimized via end-to-end learning with a corresponding neural network-based detector/decoder in a superposition-based auto-encoder framework including a MIMO channel. The proposed MIMO-NOS encoder spreads the information bits to multiple near-orthogonal high dimensional vectors to be combined (superimposed) into a single vector and reshaped for the space-time transmission. For the receiver, we propose a novel looped K-best tree-search algorithm with cyclic redundancy check (CRC) assistance to enhance the error correcting ability in the block-fading MIMO channel. Simulation results show the proposed MIMO-NOS scheme outperforms maximum likelihood (ML) MIMO detection combined with a polar code with CRC-assisted list decoding by 1-2 dB in various MIMO systems for short (32-64 bit) message transmission.
Learnable Model-Driven Performance Prediction and Optimization for Imperfect MIMO System: Framework and Application
Authors: Authors: Fan Meng, Shengheng Liu, Yongming Huang, Zhaohua Lu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2206.15072
Pdf link: https://arxiv.org/pdf/2206.15072
Abstract State-of-the-art schemes for performance analysis and optimization of multiple-input multiple-output systems generally experience degradation or even become invalid in dynamic complex scenarios with unknown interference and channel state information (CSI) uncertainty. To adapt to the challenging settings and better accomplish these network auto-tuning tasks, we propose a generic learnable model-driven framework in this paper. To explain how the proposed framework works, we consider regularized zero-forcing precoding as a usage instance and design a light-weight neural network for refined prediction of sum rate and detection error based on coarse model-driven approximations. Then, we estimate the CSI uncertainty on the learned predictor in an iterative manner and, on this basis, optimize the transmit regularization term and subsequent receive power scaling factors. A deep unfolded projected gradient descent based algorithm is proposed for power scaling, which achieves favorable trade-off between convergence rate and robustness.
Laplacian Autoencoders for Learning Stochastic Representations
Authors: Authors: Marco Miani, Frederik Warburg, Pablo Moreno-Muñoz, Nicke Skafte Detlefsen, Søren Hauberg
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2206.15078
Pdf link: https://arxiv.org/pdf/2206.15078
Abstract Representation learning has become a practical family of methods for building rich parametric codifications of massive high-dimensional data while succeeding in the reconstruction side. When considering unsupervised tasks with test-train distribution shifts, the probabilistic viewpoint helps for addressing overconfidence and poor calibration of predictions. However, the direct introduction of Bayesian inference on top of neural networks weights is still an ardous problem for multiple reasons, i.e. the curse of dimensionality or intractability issues. The Laplace approximation (LA) offers a solution here, as one may build Gaussian approximations of the posterior density of weights via second-order Taylor expansions in certain locations of the parameter space. In this work, we present a Bayesian autoencoder for unsupervised representation learning inspired in LA. Our method implements iterative Laplace updates to obtain a novel variational lower-bound of the autoencoder evidence. The vast computational burden of the second-order partial derivatives is skipped via approximations of the Hessian matrix. Empirically, we demonstrate the scalability and performance of the Laplacian autoencoder by providing well-calibrated uncertainties for out-of-distribution detection, geodesics for differential geometry and missing data imputations.
MKIoU Loss: Towards Accurate Oriented Object Detection in Aerial Images
Authors: Authors: Xinyi Yu, Jiangping Lu, Xinyi Yu, Mi Lin, Linlin Ou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2206.15109
Pdf link: https://arxiv.org/pdf/2206.15109
Abstract Oriented bounding box regression is crucial for oriented object detection. However, regression-based methods often suffer from boundary problems and the inconsistency between loss and evaluation metrics. In this paper, a modulated Kalman IoU loss of approximate SkewIoU is proposed, named MKIoU. To avoid boundary problems, we convert the oriented bounding box to Gaussian distribution, then use the Kalman filter to approximate the intersection area. However, there exists significant difference between the calculated and actual intersection areas. Thus, we propose a modulation factor to adjust the sensitivity of angle deviation and width-height offset to loss variation, making the loss more consistent with the evaluation metric. Furthermore, the Gaussian modeling method avoids the boundary problem but causes the angle confusion of square objects simultaneously. Thus, the Gaussian Angle Loss (GA Loss) is presented to solve this problem by adding a corrected loss for square targets. The proposed GA Loss can be easily extended to other Gaussian-based methods. Experiments on three publicly available aerial image datasets, DOTA, UCAS-AOD, and HRSC2016, show the effectiveness of the proposed method.
Detecting and Recovering Adversarial Examples from Extracting Non-robust and Highly Predictive Adversarial Perturbations
Authors: Authors: Mingyu Dong, Jiahao Chen, Diqun Yan, Jingxing Gao, Li Dong, Rangding Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2206.15128
Pdf link: https://arxiv.org/pdf/2206.15128
Abstract Deep neural networks (DNNs) have been shown to be vulnerable against adversarial examples (AEs) which are maliciously designed to fool target models. The normal examples (NEs) added with imperceptible adversarial perturbation, can be a security threat to DNNs. Although the existing AEs detection methods have achieved a high accuracy, they failed to exploit the information of the AEs detected. Thus, based on high-dimension perturbation extraction, we propose a model-free AEs detection method, the whole process of which is free from querying the victim model. Research shows that DNNs are sensitive to the high-dimension features. The adversarial perturbation hiding in the adversarial example belongs to the high-dimension feature which is highly predictive and non-robust. DNNs learn more details from high-dimension data than others. In our method, the perturbation extractor can extract the adversarial perturbation from AEs as high-dimension feature, then the trained AEs discriminator determines whether the input is an AE. Experimental results show that the proposed method can not only detect the adversarial examples with high accuracy, but also detect the specific category of the AEs. Meanwhile, the extracted perturbation can be used to recover the AEs to NEs.
Personalized Detection of Cognitive Biases in Actions of Users from Their Logs: Anchoring and Recency Biases
Authors: Authors: Atanu R Sinha, Navita Goyal, Sunny Dhamnani, Tanay Asija, Raja K Dubey, M V Kaarthik Raja, Georgios Theocharous
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2206.15129
Pdf link: https://arxiv.org/pdf/2206.15129
Abstract Cognitive biases are mental shortcuts humans use in dealing with information and the environment, and which result in biased actions and behaviors (or, actions), unbeknownst to themselves. Biases take many forms, with cognitive biases occupying a central role that inflicts fairness, accountability, transparency, ethics, law, medicine, and discrimination. Detection of biases is considered a necessary step toward their mitigation. Herein, we focus on two cognitive biases - anchoring and recency. The recognition of cognitive bias in computer science is largely in the domain of information retrieval, and bias is identified at an aggregate level with the help of annotated data. Proposing a different direction for bias detection, we offer a principled approach along with Machine Learning to detect these two cognitive biases from Web logs of users' actions. Our individual user level detection makes it truly personalized, and does not rely on annotated data. Instead, we start with two basic principles established in cognitive psychology, use modified training of an attention network, and interpret attention weights in a novel way according to those principles, to infer and distinguish between these two biases. The personalized approach allows detection for specific users who are susceptible to these biases when performing their tasks, and can help build awareness among them so as to undertake bias mitigation.
DFGC 2022: The Second DeepFake Game Competition
Authors: Authors: Bo Peng, Wei Xiang, Yue Jiang, Wei Wang, Jing Dong, Zhenan Sun, Zhen Lei, Siwei Lyu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2206.15138
Pdf link: https://arxiv.org/pdf/2206.15138
Abstract This paper presents the summary report on our DFGC 2022 competition. The DeepFake is rapidly evolving, and realistic face-swaps are becoming more deceptive and difficult to detect. On the contrary, methods for detecting DeepFakes are also improving. There is a two-party game between DeepFake creators and defenders. This competition provides a common platform for benchmarking the game between the current state-of-the-arts in DeepFake creation and detection methods. The main research question to be answered by this competition is the current state of the two adversaries when competed with each other. This is the second edition after the last year's DFGC 2021, with a new, more diverse video dataset, a more realistic game setting, and more reasonable evaluation metrics. With this competition, we aim to stimulate research ideas for building better defenses against the DeepFake threats. We also release our DFGC 2022 dataset contributed by both our participants and ourselves to enrich the DeepFake data resources for the research community (https://github.com/NiCE-X/DFGC-2022).
HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection
Authors: Authors: Tim Broedermann (1), Christos Sakaridis (1), Dengxin Dai (2), Luc Van Gool (1 and 3) ((1) ETH Zurich, (2) MPI for Informatics, (3) KU Leuven)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2206.15157
Pdf link: https://arxiv.org/pdf/2206.15157
Abstract Besides standard cameras, autonomous vehicles typically include multiple additional sensors, such as lidars and radars, which help acquire richer information for perceiving the content of the driving scene. While several recent works focus on fusing certain pairs of sensors - such as camera and lidar or camera and radar - by using architectural components specific to the examined setting, a generic and modular sensor fusion architecture is missing from the literature. In this work, we focus on 2D object detection, a fundamental high-level task which is defined on the 2D image domain, and propose HRFuser, a multi-resolution sensor fusion architecture that scales straightforwardly to an arbitrary number of input modalities. The design of HRFuser is based on state-of-the-art high-resolution networks for image-only dense prediction and incorporates a novel multi-window cross-attention block as the means to perform fusion of multiple modalities at multiple resolutions. Even though cameras alone provide very informative features for 2D detection, we demonstrate via extensive experiments on the nuScenes and Seeing Through Fog datasets that our model effectively leverages complementary features from additional modalities, substantially improving upon camera-only performance and consistently outperforming state-of-the-art fusion methods for 2D detection both in normal and adverse conditions. The source code will be made publicly available.
Using Person Embedding to Enrich Features and Data Augmentation for Classification
Authors: Authors: Ahmet Tuğrul Bayrak
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2206.15162
Pdf link: https://arxiv.org/pdf/2206.15162
Abstract Today, machine learning is applied in almost any field. In machine learning, where there are numerous methods, classification is one of the most basic and crucial ones. Various problems can be solved by classification. The feature selection for model setup is extremely important, and producing new features via feature engineering also has a vital place in the success of the model. In our study, fraud detection classification models are built on a labeled and imbalanced dataset as a case-study. Although it is a natural language processing method, a customer space has been created with word embedding, which has been used in different areas, especially for recommender systems. The customer vectors in the created space are fed to the classification model as a feature. Moreover, to increase the number of positive labels, rows with similar characteristics are re-labeled as positive by using customer similarity determined by embedding. The model in which embedding methods are included in the classification, which provides a better representation of customers, has been compared with other models. Considering the results, it is observed that the customer embedding method had a positive effect on the success of the classification models.
Graph-Time Convolutional Neural Networks: Architecture and Theoretical Analysis
Authors: Authors: Mohammad Sabbaqi, Elvin Isufi
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2206.15174
Pdf link: https://arxiv.org/pdf/2206.15174
Abstract Devising and analyzing learning models for spatiotemporal network data is of importance for tasks including forecasting, anomaly detection, and multi-agent coordination, among others. Graph Convolutional Neural Networks (GCNNs) are an established approach to learn from time-invariant network data. The graph convolution operation offers a principled approach to aggregate multiresolution information. However, extending the convolution principled learning and respective analysis to the spatiotemporal domain is challenging because spatiotemporal data have more intrinsic dependencies. Hence, a higher flexibility to capture jointly the spatial and the temporal dependencies is required to learn meaningful higher-order representations. Here, we leverage product graphs to represent the spatiotemporal dependencies in the data and introduce Graph-Time Convolutional Neural Networks (GTCNNs) as a principled architecture to aid learning. The proposed approach can work with any type of product graph and we also introduce a parametric product graph to learn also the spatiotemporal coupling. The convolution principle further allows a similar mathematical tractability as for GCNNs. In particular, the stability result shows GTCNNs are stable to spatial perturbations but there is an implicit trade-off between discriminability and robustness; i.e., the more complex the model, the less stable. Extensive numerical results on benchmark datasets corroborate our findings and show the GTCNN compares favorably with state-of-the-art solutions. We anticipate the GTCNN to be a starting point for more sophisticated models that achieve good performance but are also fundamentally grounded.
Out-of-Distribution Detection for Long-tailed and Fine-grained Skin Lesion Images
Authors: Authors: Deval Mehta, Yaniv Gal, Adrian Bowling, Paul Bonnington, Zongyuan Ge
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2206.15186
Pdf link: https://arxiv.org/pdf/2206.15186
Abstract Recent years have witnessed a rapid development of automated methods for skin lesion diagnosis and classification. Due to an increasing deployment of such systems in clinics, it has become important to develop a more robust system towards various Out-of-Distribution(OOD) samples (unknown skin lesions and conditions). However, the current deep learning models trained for skin lesion classification tend to classify these OOD samples incorrectly into one of their learned skin lesion categories. To address this issue, we propose a simple yet strategic approach that improves the OOD detection performance while maintaining the multi-class classification accuracy for the known categories of skin lesion. To specify, this approach is built upon a realistic scenario of a long-tailed and fine-grained OOD detection task for skin lesion images. Through this approach, 1) First, we target the mixup amongst middle and tail classes to address the long-tail problem. 2) Later, we combine the above mixup strategy with prototype learning to address the fine-grained nature of the dataset. The unique contribution of this paper is two-fold, justified by extensive experiments. First, we present a realistic problem setting of OOD task for skin lesion. Second, we propose an approach to target the long-tailed and fine-grained aspects of the problem setting simultaneously to increase the OOD performance.
libACA, pyACA, and ACA-Code: Audio Content Analysis in 3 Languages
Authors: Authors: Alexander Lerch
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2206.15219
Pdf link: https://arxiv.org/pdf/2206.15219
Abstract The three packages libACA, pyACA, and ACA-Code provide reference implementations for basic approaches and algorithms for the analysis of musical audio signals in three different languages: C++, Python, and Matlab. All three packages cover the same algorithms, such as extraction of low level audio features, fundamental frequency estimation, as well as simple approaches to chord recognition, musical key detection, and onset detection. In addition, it implementations of more generic algorithms useful in audio content analysis such as dynamic time warping and the Viterbi algorithm are provided. The three packages thus provide a practical cross-language and cross-platform reference to students and engineers implementing audio analysis algorithms and enable implementation-focused learning of algorithms for audio content analysis and music information retrieval.
Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach
Authors: Authors: Jiaqi Tang, Zhaoyang Liu, Jing Tan, Chen Qian, Wayne Wu, Limin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2206.15268
Pdf link: https://arxiv.org/pdf/2206.15268
Abstract Generic event boundary detection (GEBD) is an important yet challenging task in video understanding, which aims at detecting the moments where humans naturally perceive event boundaries. In this paper, we present a local context modeling and global boundary decoding approach for GEBD task. Local context modeling sub-network is proposed to perceive diverse patterns of generic event boundaries, and it generates powerful video representations and reliable boundary confidence. Based on them, global boundary decoding sub-network is exploited to decode event boundaries from a global view. Our proposed method achieves 85.13% F1-score on Kinetics-GEBD testing set, which achieves a more than 22% F1-score boost compared to the baseline method. The code is available at https://github.com/JackyTown/GEBD_Challenge_CVPR2022.
Interpretable Anomaly Detection in Echocardiograms with Dynamic Variational Trajectory Models
Authors: Authors: Alain Ryser, Laura Manduchi, Fabian Laumer, Holger Michel, Sven Wellmann, Julia E. Vogt
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computation (stat.CO); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2206.15316
Pdf link: https://arxiv.org/pdf/2206.15316
Abstract We propose a novel anomaly detection method for echocardiogram videos. The introduced method takes advantage of the periodic nature of the heart cycle to learn different variants of a variational latent trajectory model (TVAE). The models are trained on the healthy samples of an in-house dataset of infant echocardiogram videos consisting of multiple chamber views to learn a normative prior of the healthy population. During inference, maximum a posteriori (MAP) based anomaly detection is performed to detect out-of-distribution samples in our dataset. The proposed method reliably identifies severe congenital heart defects, such as Ebstein's Anomaly or Shonecomplex. Moreover, it achieves superior performance over MAP-based anomaly detection with standard variational autoencoders on the task of detecting pulmonary hypertension and right ventricular dilation. Finally, we demonstrate that the proposed method provides interpretable explanations of its output through heatmaps which highlight the regions corresponding to anomalous heart structures.
Byzantine Agreement with Optimal Resilience via Statistical Fraud Detection
Authors: Authors: Shang-En Huang, Seth Pettie, Leqi Zhu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST)
Arxiv link: https://arxiv.org/abs/2206.15335
Pdf link: https://arxiv.org/pdf/2206.15335
Abstract Since the mid-1980s it has been known that Byzantine Agreement can be solved with probability 1 asynchronously, even against an omniscient, computationally unbounded adversary that can adaptively \emph{corrupt} up to $f<n/3$ parties. Moreover, the problem is insoluble with $f\geq n/3$ corruptions. However, Bracha's 1984 protocol achieved $f<n/3$ resilience at the cost of exponential expected latency $2^{\Theta(n)}$, a bound that has never been improved in this model with $f=\lfloor (n-1)/3 \rfloor$ corruptions. In this paper we prove that Byzantine Agreement in the asynchronous, full information model can be solved with probability 1 against an adaptive adversary that can corrupt $f<n/3$ parties, while incurring only polynomial latency with high probability. Our protocol follows earlier polynomial latency protocols of King and Saia and Huang, Pettie, and Zhu, which had suboptimal resilience, namely $f \approx n/10^9$ and $f<n/4$, respectively. Resilience $f=(n-1)/3$ is uniquely difficult as this is the point at which the influence of the Byzantine and honest players are of roughly equal strength. The core technical problem we solve is to design a collective coin-flipping protocol that eventually lets us flip a coin with an unambiguous outcome. In the beginning the influence of the Byzantine players is too powerful to overcome and they can essentially fix the coin's behavior at will. We guarantee that after just a polynomial number of executions of the coin-flipping protocol, either (a) the Byzantine players fail to fix the behavior of the coin (thereby ending the game) or (b) we can ``blacklist'' players such that the blacklisting rate for Byzantine players is at least as large as the blacklisting rate for good players. The blacklisting criterion is based on a simple statistical test of fraud detection.
Learning Citywide Patterns of Life from Trajectory Monitoring
Authors: Authors: Mark Tenzer, Zeeshan Rasheed, Khurram Shafique
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2206.15352
Pdf link: https://arxiv.org/pdf/2206.15352
Abstract The recent proliferation of real-world human mobility datasets has catalyzed geospatial and transportation research in trajectory prediction, demand forecasting, travel time estimation, and anomaly detection. However, these datasets also enable, more broadly, a descriptive analysis of intricate systems of human mobility. We formally define patterns of life analysis as a natural, explainable extension of online unsupervised anomaly detection, where we not only monitor a data stream for anomalies but also explicitly extract normal patterns over time. To learn patterns of life, we adapt Grow When Required (GWR) episodic memory from research in computational biology and neurorobotics to a new domain of geospatial analysis. This biologically-inspired neural network, related to self-organizing maps (SOM), constructs a set of "memories" or prototype traffic patterns incrementally as it iterates over the GPS stream. It then compares each new observation to its prior experiences, inducing an online, unsupervised clustering and anomaly detection on the data. We mine patterns-of-interest from the Porto taxi dataset, including both major public holidays and newly-discovered transportation anomalies, such as festivals and concerts which, to our knowledge, have not been previously acknowledged or reported in prior work. We anticipate that the capability to incrementally learn normal and abnormal road transportation behavior will be useful in many domains, including smart cities, autonomous vehicles, and urban planning and management.
Two-Stage Classifier for COVID-19 Misinformation Detection Using BERT: a Study on Indonesian Tweets
Authors: Authors: Douglas Raevan Faisal, Rahmad Mahendra
Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2206.15359
Pdf link: https://arxiv.org/pdf/2206.15359
Abstract The COVID-19 pandemic has caused globally significant impacts since the beginning of 2020. This brought a lot of confusion to society, especially due to the spread of misinformation through social media. Although there were already several studies related to the detection of misinformation in social media data, most studies focused on the English dataset. Research on COVID-19 misinformation detection in Indonesia is still scarce. Therefore, through this research, we collect and annotate datasets for Indonesian and build prediction models for detecting COVID-19 misinformation by considering the tweet's relevance. The dataset construction is carried out by a team of annotators who labeled the relevance and misinformation of the tweet data. In this study, we propose the two-stage classifier model using IndoBERT pre-trained language model for the Tweet misinformation detection task. We also experiment with several other baseline models for text classification. The experimental results show that the combination of the BERT sequence classifier for relevance prediction and Bi-LSTM for misinformation detection outperformed other machine learning models with an accuracy of 87.02%. Overall, the BERT utilization contributes to the higher performance of most prediction models. We release a high-quality COVID-19 misinformation Tweet corpus in the Indonesian language, indicated by the high inter-annotator agreement.
PolarFormer: Multi-camera 3D Object Detection with Polar Transformer
Authors: Authors: Yanqin Jiang, Li Zhang, Zhenwei Miao, Xiatian Zhu, Jin Gao, Weiming Hu, Yu-Gang Jiang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2206.15398
Pdf link: https://arxiv.org/pdf/2206.15398
Abstract 3D object detection in autonomous driving aims to reason "what" and "where" the objects of interest present in a 3D world. Following the conventional wisdom of previous 2D object detection, existing methods often adopt the canonical Cartesian coordinate system with perpendicular axis. However, we conjugate that this does not fit the nature of the ego car's perspective, as each onboard camera perceives the world in shape of wedge intrinsic to the imaging geometry with radical (non-perpendicular) axis. Hence, in this paper we advocate the exploitation of the Polar coordinate system and propose a new Polar Transformer (PolarFormer) for more accurate 3D object detection in the bird's-eye-view (BEV) taking as input only multi-camera 2D images. Specifically, we design a cross attention based Polar detection head without restriction to the shape of input structure to deal with irregular Polar grids. For tackling the unconstrained object scale variations along Polar's distance dimension, we further introduce a multi-scalePolar representation learning strategy. As a result, our model can make best use of the Polar representation rasterized via attending to the corresponding image observation in a sequence-to-sequence fashion subject to the geometric constraints. Thorough experiments on the nuScenes dataset demonstrate that our PolarFormer outperforms significantly state-of-the-art 3D object detection alternatives, as well as yielding competitive performance on BEV semantic segmentation task.
MEAD: A Multi-Armed Approach for Evaluation of Adversarial Examples Detectors
Authors: Authors: Federica Granese, Marine Picot, Marco Romanelli, Francisco Messina, Pablo Piantanida
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2206.15415
Pdf link: https://arxiv.org/pdf/2206.15415
Abstract Detection of adversarial examples has been a hot topic in the last years due to its importance for safely deploying machine learning algorithms in critical applications. However, the detection methods are generally validated by assuming a single implicitly known attack strategy, which does not necessarily account for real-life threats. Indeed, this can lead to an overoptimistic assessment of the detectors' performance and may induce some bias in the comparison between competing detection schemes. We propose a novel multi-armed framework, called MEAD, for evaluating detectors based on several attack strategies to overcome this limitation. Among them, we make use of three new objectives to generate attacks. The proposed performance metric is based on the worst-case scenario: detection is successful if and only if all different attacks are correctly recognized. Empirically, we show the effectiveness of our approach. Moreover, the poor performance obtained for state-of-the-art detectors opens a new exciting line of research.
Distributed asynchronous convergence detection without detection protocol
Authors: Authors: Guillaume Gbikpi-Benissan, Frederic Magoules
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2206.15418
Pdf link: https://arxiv.org/pdf/2206.15418
Abstract In this paper, we address the problem of detecting the moment when an ongoing asynchronous parallel iterative process can be terminated to provide a sufficiently precise solution to a fixed-point problem being solved. Formulating the detection problem as a global solution identification problem, we analyze the snapshot-based approach, which is the only one that allows for exact global residual error computation. From a recently developed approximate snapshot protocol providing a reliable global residual error, we experimentally investigate here, as well, the reliability of a global residual error computed without any prior particular detection mechanism. Results on a single-site supercomputer successfully show that such high-performance computing platforms possibly provide computational environments stable enough to allow for simply resorting to non-blocking reduction operations for computing reliable global residual errors, which provides noticeable time saving, at both implementation and execution levels.
JACK2: a new high-level communication library for parallel iterative methods
Authors: Authors: Guillaume Gbikpi-Benissan, Frederic Magoules
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2206.15420
Pdf link: https://arxiv.org/pdf/2206.15420
Abstract In this paper, we address the problem of designing a distributed application meant to run both classical and asynchronous iterations. MPI libraries are very popular and widely used in the scientific community, however asynchronous iterative methods raise non-negligible difficulties about the efficient management of communication requests and buffers. Moreover, a convergence detection issue is introduced, which requires the implementation of one of the various state-of-the-art termination methods, which are not necessarily highly reliable for most computational environments. We propose here an MPI-based communication library which handles all these issues in a non-intrusive manner, providing a unique interface for implementing both classical and asynchronous iterations. Few details are highlighted about our approach to achieve best communication rates and ensure accurate convergence detection. Experimental results on two supercomputers confirmed the low overhead communication costs introduced, and the effectiveness of our library.
AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection
Authors: Authors: Marius Drăgoi, Elena Burceanu, Emanuela Haller, Andrei Manolache, Florin Brad
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2206.15476
Pdf link: https://arxiv.org/pdf/2206.15476
Abstract Analyzing the distribution shift of data is a growing research direction in nowadays Machine Learning, leading to emerging new benchmarks that focus on providing a suitable scenario for studying the generalization properties of ML models. The existing benchmarks are focused on supervised learning, and to the best of our knowledge, there is none for unsupervised learning. Therefore, we introduce an unsupervised anomaly detection benchmark with data that shifts over time, built over Kyoto-2006+, a traffic dataset for network intrusion detection. This kind of data meets the premise of shifting the input distribution: it covers a large time span ($10$ years), with naturally occurring changes over time (\eg users modifying their behavior patterns, and software updates). We first highlight the non-stationary nature of the data, using a basic per-feature analysis, t-SNE, and an Optimal Transport approach for measuring the overall distribution distances between years. Next, we propose AnoShift, a protocol splitting the data in IID, NEAR, and FAR testing splits. We validate the performance degradation over time with diverse models (MLM to classical Isolation Forest). Finally, we show that by acknowledging the distribution shift problem and properly addressing it, the performance can be improved compared to the classical IID training (by up to $3\%$, on average). Dataset and code are available at https://github.com/bit-ml/AnoShift/.
Keyword: face recognition

There is no result

Keyword: augmentation

Teach me how to Interpolate a Myriad of Embeddings
Authors: Authors: Shashanka Venkataramanan, Ewa Kijak, Laurent Amsaleg, Yannis Avrithis
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2206.14868
Pdf link: https://arxiv.org/pdf/2206.14868
Abstract Mixup refers to interpolation-based data augmentation, originally motivated as a way to go beyond empirical risk minimization (ERM). Yet, its extensions focus on the definition of interpolation and the space where it takes place, while the augmentation itself is less studied: For a mini-batch of size $m$, most methods interpolate between $m$ pairs with a single scalar interpolation factor $\lambda$. In this work, we make progress in this direction by introducing MultiMix, which interpolates an arbitrary number $n$ of tuples, each of length $m$, with one vector $\lambda$ per tuple. On sequence data, we further extend to dense interpolation and loss computation over all spatial positions. Overall, we increase the number of tuples per mini-batch by orders of magnitude at little additional cost. This is possible by interpolating at the very last layer before the classifier. Finally, to address inconsistencies due to linear target interpolation, we introduce a self-distillation approach to generate and interpolate synthetic targets. We empirically show that our contributions result in significant improvement over state-of-the-art mixup methods on four benchmarks. By analyzing the embedding space, we observe that the classes are more tightly clustered and uniformly spread over the embedding space, thereby explaining the improved behavior.
Exploring Temporally Dynamic Data Augmentation for Video Recognition
Authors: Authors: Taeoh Kim, Jinhyung Kim, Minho Shim, Sangdoo Yun, Myunggu Kang, Dongyoon Wee, Sangyoun Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2206.15015
Pdf link: https://arxiv.org/pdf/2206.15015
Abstract Data augmentation has recently emerged as an essential component of modern training recipes for visual recognition tasks. However, data augmentation for video recognition has been rarely explored despite its effectiveness. Few existing augmentation recipes for video recognition naively extend the image augmentation methods by applying the same operations to the whole video frames. Our main idea is that the magnitude of augmentation operations for each frame needs to be changed over time to capture the real-world video's temporal variations. These variations should be generated as diverse as possible using fewer additional hyper-parameters during training. Through this motivation, we propose a simple yet effective video data augmentation framework, DynaAugment. The magnitude of augmentation operations on each frame is changed by an effective mechanism, Fourier Sampling that parameterizes diverse, smooth, and realistic temporal variations. DynaAugment also includes an extended search space suitable for video for automatic data augmentation methods. DynaAugment experimentally demonstrates that there are additional performance rooms to be improved from static augmentations on diverse video models. Specifically, we show the effectiveness of DynaAugment on various video datasets and tasks: large-scale video recognition (Kinetics-400 and Something-Something-v2), small-scale video recognition (UCF- 101 and HMDB-51), fine-grained video recognition (Diving-48 and FineGym), video action segmentation on Breakfast, video action localization on THUMOS'14, and video object detection on MOT17Det. DynaAugment also enables video models to learn more generalized representation to improve the model robustness on the corrupted videos.
TINC: Temporally Informed Non-Contrastive Learning for Disease Progression Modeling in Retinal OCT Volumes
Authors: Authors: Taha Emre, Arunava Chakravarty, Antoine Rivail, Sophie Riedl, Ursula Schmidt-Erfurth, Hrvoje Bogunović
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2206.15282
Pdf link: https://arxiv.org/pdf/2206.15282
Abstract Recent contrastive learning methods achieved state-of-the-art in low label regimes. However, the training requires large batch sizes and heavy augmentations to create multiple views of an image. With non-contrastive methods, the negatives are implicitly incorporated in the loss, allowing different images and modalities as pairs. Although the meta-information (i.e., age, sex) in medical imaging is abundant, the annotations are noisy and prone to class imbalance. In this work, we exploited already existing temporal information (different visits from a patient) in a longitudinal optical coherence tomography (OCT) dataset using temporally informed non-contrastive loss (TINC) without increasing complexity and need for negative pairs. Moreover, our novel pair-forming scheme can avoid heavy augmentations and implicitly incorporates the temporal information in the pairs. Finally, these representations learned from the pretraining are more successful in predicting disease progression where the temporal information is crucial for the downstream task. More specifically, our model outperforms existing models in predicting the risk of conversion within a time frame from intermediate age-related macular degeneration (AMD) to the late wet-AMD stage.
Improving the Generalization of Supervised Models
Authors: Authors: Mert Bulent Sariyildiz, Yannis Kalantidis, Karteek Alahari, Diane Larlus
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2206.15369
Pdf link: https://arxiv.org/pdf/2206.15369
Abstract We consider the problem of training a deep neural network on a given classification task, e.g., ImageNet-1K (IN1K), so that it excels at that task as well as at other (future) transfer tasks. These two seemingly contradictory properties impose a trade-off between improving the model's generalization while maintaining its performance on the original task. Models trained with self-supervised learning (SSL) tend to generalize better than their supervised counterparts for transfer learning; yet, they still lag behind supervised models on IN1K. In this paper, we propose a supervised learning setup that leverages the best of both worlds. We enrich the common supervised training framework using two key components of recent SSL models: multi-scale crops for data augmentation and the use of an expendable projector head. We replace the last layer of class weights with class prototypes computed on the fly using a memory bank. We show that these three improvements lead to a more favorable trade-off between the IN1K training task and 13 transfer tasks. Over all the explored configurations, we single out two models: t-ReX that achieves a new state of the art for transfer learning and outperforms top methods such as DINO and PAWS on IN1K, and t-ReX* that matches the highly optimized RSB-A1 model on IN1K while performing better on transfer tasks. Project page and pretrained models: https://europe.naverlabs.com/t-rex

LeeKyungwook / get-arxiv-noti

New submissions for Fri, 1 Jul 22 #291

Keyword: detection

Neural Motion Fields: Encoding Grasp Trajectories as Implicit Value Functions

Machine Learning Approaches to Predict Breast Cancer: Bangladesh Perspective

Semi-Supervised Generative Adversarial Network for Stress Detection Using Partially Labeled Physiological Data

Cross-domain Federated Object Detection

Exploring Temporally Dynamic Data Augmentation for Video Recognition

Causality-Based Multivariate Time Series Anomaly Detection

ZeroC: A Neuro-Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time

Learning-Based Near-Orthogonal Superposition Code for MIMO Short Message Transmission

Learnable Model-Driven Performance Prediction and Optimization for Imperfect MIMO System: Framework and Application

Laplacian Autoencoders for Learning Stochastic Representations

MKIoU Loss: Towards Accurate Oriented Object Detection in Aerial Images

Detecting and Recovering Adversarial Examples from Extracting Non-robust and Highly Predictive Adversarial Perturbations

Personalized Detection of Cognitive Biases in Actions of Users from Their Logs: Anchoring and Recency Biases

DFGC 2022: The Second DeepFake Game Competition

HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection

Using Person Embedding to Enrich Features and Data Augmentation for Classification

Graph-Time Convolutional Neural Networks: Architecture and Theoretical Analysis

Out-of-Distribution Detection for Long-tailed and Fine-grained Skin Lesion Images

libACA, pyACA, and ACA-Code: Audio Content Analysis in 3 Languages

Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach

Interpretable Anomaly Detection in Echocardiograms with Dynamic Variational Trajectory Models

Byzantine Agreement with Optimal Resilience via Statistical Fraud Detection

Learning Citywide Patterns of Life from Trajectory Monitoring

Two-Stage Classifier for COVID-19 Misinformation Detection Using BERT: a Study on Indonesian Tweets

PolarFormer: Multi-camera 3D Object Detection with Polar Transformer

MEAD: A Multi-Armed Approach for Evaluation of Adversarial Examples Detectors

Distributed asynchronous convergence detection without detection protocol

JACK2: a new high-level communication library for parallel iterative methods

AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection

Keyword: face recognition

Keyword: augmentation

Teach me how to Interpolate a Myriad of Embeddings

Exploring Temporally Dynamic Data Augmentation for Video Recognition

TINC: Temporally Informed Non-Contrastive Learning for Disease Progression Modeling in Retinal OCT Volumes

Improving the Generalization of Supervised Models