Abstract
Monitoring the status of large computing systems is essential to identify unexpected behavior and improve their performance and uptime. However, due to the large-scale and distributed design of such computing systems, as well as the large number of monitoring parameters, automated monitoring methods should be applied. Such automatic monitoring methods should also be able to adapt to the continuous changes in the computing system. In addition, they should be able to identify behavioral anomalies in time for appropriate reactions to be performed. This work proposes a general, lightweight, and unsupervised method for near real-time anomaly detection using operational data measurements on large computing systems. The proposed model requires as little as 4 hours of data and 50 epochs for each training process to accurately resemble the behavioral pattern of computing systems.
Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineering
Authors: Aryan Agrawal
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
This paper introduces a novel paradigm for depression detection and treatment using advanced Large Language Models (LLMs): Generative Pre-trained Transformer 4 (GPT-4), Llama 2 chat, and Gemini. These LLMs are fine-tuned with specialized prompts to diagnose, explain, and suggest therapeutic interventions for depression. A unique few-shot prompting method enhances the models' ability to analyze and explain depressive symptoms based on the DSM-5 criteria. In the interaction phase, the models engage in empathetic dialogue management, drawing from resources like PsychDB and a Cognitive Behavioral Therapy (CBT) Guide, fostering supportive interactions with individuals experiencing major depressive disorders. Additionally, the research introduces the Illuminate Database, enriched with various CBT modules, aiding in personalized therapy recommendations. The study evaluates LLM performance using metrics such as F1 scores, Precision, Recall, Cosine similarity, and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) across different test sets, demonstrating their effectiveness. This comprehensive approach blends cutting-edge AI with established psychological methods, offering new possibilities in mental health care and showcasing the potential of LLMs in revolutionizing depression diagnosis and treatment strategies.
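The classification metrics named in the abstract above (Precision, Recall, F1) follow from confusion counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive
    and false-negative counts. Undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(p, r, f1)
```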
Enhancement of Bengali OCR by Specialized Models and Advanced Techniques for Diverse Document Types
Abstract
This research paper presents a Bengali OCR system with several distinctive capabilities. The system excels in reconstructing document layouts while preserving structure, alignment, and images. It incorporates advanced image and signature detection for accurate extraction. Specialized models for word segmentation cater to diverse document types, including computer-composed, letterpress, typewriter, and handwritten documents. The system handles static and dynamic handwritten inputs, recognizing various writing styles. Furthermore, it can recognize compound characters in Bengali. Extensive data collection efforts provide a diverse corpus, while advanced technical components optimize character and word recognition. Additional contributions include image, logo, signature, and table recognition, perspective correction, layout reconstruction, and a queuing module for efficient and scalable processing. The system demonstrates outstanding performance in efficient and accurate text extraction and analysis.
Scrapping The Web For Early Wildfire Detection
Authors: Mateo Lostanlen, Felix Veith, Cristian Buc, Valentin Barriere
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Early wildfire detection is of the utmost importance to enable rapid response efforts, and thus minimize the negative impacts of wildfire spread. To this end, we present Pyro, a web-scraping-based dataset composed of videos of wildfires from a network of cameras that were enhanced with manual bounding-box-level annotations. Our dataset was filtered based on a strategy to improve the quality and diversity of the data, reducing the final data to a set of 10,000 images. We ran experiments using a state-of-the-art object detection model and found that the proposed dataset is challenging and that using it in conjunction with other public datasets helps to reach higher results overall. We will make our code and data publicly available.
Guiding Large Language Models with Divide-and-Conquer Program for Discerning Problem Solving
Authors: Yizhou Zhang, Lun Du, Defu Cao, Qiang Fu, Yan Liu
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Foundation models, such as Large Language Models (LLMs), have attracted a significant amount of interest due to their large number of applications. Existing works show that appropriate prompt design, such as Chain-of-Thoughts, can unlock LLMs' powerful capacity in diverse areas. However, when handling tasks involving repetitive sub-tasks and/or deceptive contents, such as arithmetic calculation and article-level fake news detection, existing prompting strategies suffer either from insufficient expressive power or from intermediate errors triggered by hallucination. To make LLMs more discerning of such intermediate errors, we propose to guide the LLM with a Divide-and-Conquer program that simultaneously ensures superior expressive power and disentangles the task decomposition, sub-task resolution, and resolution assembly processes. Theoretical analysis reveals that our strategy can guide the LLM to extend the expressive power of a fixed-depth Transformer. Experiments indicate that our proposed method can achieve better performance than typical prompting strategies in tasks affected by intermediate errors and deceptive contents, such as large integer multiplication, hallucination detection, and misinformation detection.
Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts
Authors: Zhili Liu, Kai Chen, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James T. Kwok
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training. However, when the various downstream tasks have data distributions different from the pre-training data, the semantically irrelevant pre-training information might result in negative transfer, impeding MAE's scalability. To address this issue, we propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE), which can be trained once but provides customized pre-training models for diverse downstream tasks. Different from the mixture of experts (MoE), our MoCE trains each expert only with semantically relevant images by using cluster-conditional gates. Thus, each downstream task can be allocated to its customized model pre-trained with data most similar to the downstream data. Experiments on a collection of 11 downstream tasks show that MoCE outperforms the vanilla MAE by 2.45% on average. It also obtains new state-of-the-art self-supervised learning results on detection and segmentation.
TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning
Authors: Gangda Deng, Hongkuan Zhou, Hanqing Zeng, Yinglong Xia, Christopher Leung, Jianbo Li, Rajgopal Kannan, Viktor Prasanna
Abstract
Recently, Temporal Graph Neural Networks (TGNNs) have demonstrated state-of-the-art performance in various high-impact applications, including fraud detection and content recommendation. Despite the success of TGNNs, they are prone to the prevalent noise found in real-world dynamic graphs like time-deprecated links and skewed interaction distribution. The noise causes two critical issues that significantly compromise the accuracy of TGNNs: (1) models are supervised by inferior interactions, and (2) noisy input induces high variance in the aggregated messages. However, current TGNN denoising techniques do not consider the diverse and dynamic noise pattern of each node. In addition, they also suffer from the excessive mini-batch generation overheads caused by traversing more neighbors. We believe the remedy for fast and accurate TGNNs lies in temporal adaptive sampling. In this work, we propose TASER, the first adaptive sampling method for TGNNs optimized for accuracy, efficiency, and scalability. TASER adapts its mini-batch selection based on training dynamics and temporal neighbor selection based on the contextual, structural, and temporal properties of past interactions. To alleviate the bottleneck in mini-batch generation, TASER implements a pure GPU-based temporal neighbor finder and a dedicated GPU feature cache. We evaluate the performance of TASER using two state-of-the-art backbone TGNNs. On five popular datasets, TASER outperforms the corresponding baselines by an average of 2.3% in Mean Reciprocal Rank (MRR) while achieving an average of 5.1x speedup in training time.
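The abstract above reports accuracy in Mean Reciprocal Rank (MRR). As a reminder of what that metric measures, here is a minimal sketch of the standard definition; the function name and the example ranks are hypothetical and do not come from the paper's evaluation code.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries.

    `ranks` holds the 1-based position at which the true item
    (for example, the true destination node of a link) was ranked.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks for three queries: (1 + 1/2 + 1/4) / 3
print(mean_reciprocal_rank([1, 2, 4]))
```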
SpirDet: Towards Efficient, Accurate and Lightweight Infrared Small Target Detector
Authors: Qianchen Mao, Qiang Li, Bingshu Wang, Yongjun Zhang, Tao Dai, C.L. Philip Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In recent years, the detection of infrared small targets using deep learning methods has garnered substantial attention due to notable advancements. To improve the detection capability of small targets, these methods commonly maintain a pathway that preserves high-resolution features of sparse and tiny targets. However, this can result in redundant and expensive computations. To tackle this challenge, we propose SpirDet, a novel approach for efficient detection of infrared small targets. Specifically, to cope with the computational redundancy issue, we employ a new dual-branch sparse decoder to restore the feature map. Firstly, the fast branch directly predicts a sparse map indicating potential small target locations (occupying only 0.5% of the map area). Secondly, the slow branch conducts fine-grained adjustments at the positions indicated by the sparse map. Additionally, we design a lightweight DO-RepEncoder based on reparameterization with the Downsampling Orthogonality, which can effectively reduce memory consumption and inference latency. Extensive experiments show that the proposed SpirDet significantly outperforms state-of-the-art models while achieving faster inference speed and fewer parameters. For example, on the IRSTD-1K dataset, SpirDet improves MIoU by 4.7 and achieves a 7x FPS acceleration compared to the previous state-of-the-art model. The code will be open to the public.
Segmentation-free Connectionist Temporal Classification loss based OCR Model for Text Captcha Classification
Abstract
Captchas are widely used to secure systems from automatic responses by distinguishing computer responses from human responses. Text-, audio-, video-, and picture-based Optical Character Recognition (OCR) are used for creating captchas. Text-based OCR captchas are the most commonly used and face issues such as complex and distorted contents. There have been attempts to build captcha detection and classification systems using machine learning and neural networks, which need to be tuned for accuracy. The existing systems face challenges in recognizing distorted characters, handling variable-length captchas, and finding sequential dependencies in captchas. In this work, we propose a segmentation-free OCR model for text captcha classification based on the connectionist temporal classification loss technique. The proposed model is trained and tested on a publicly available captcha dataset. The proposed model gives 99.80% character-level accuracy and 95% word-level accuracy. The accuracy of the proposed model is compared with state-of-the-art models and proves to be effective. Variable-length complex captchas can thus be processed with the segmentation-free connectionist temporal classification loss technique with dependencies, which will be widely used in securing software systems.
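A model trained with connectionist temporal classification (CTC) loss emits one label per time step, including a special blank symbol, so no character segmentation is required. The sketch below shows the standard greedy CTC collapse rule (merge consecutive repeats, then drop blanks) as an illustration of how variable-length text can be recovered; it is not the authors' decoder, and the blank symbol `-` and the example sequence are assumptions.

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse a per-frame label sequence into text using the CTC rule:
    consecutive duplicates are merged, then blank symbols are removed."""
    decoded = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)


# A blank between the two runs of "b" keeps them as separate characters.
print(ctc_greedy_decode("aa-b--bb-c"))
```

The blank is what lets the output contain genuinely repeated characters: without a blank between them, two adjacent runs of the same label would merge into one.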
Optimizing Visibility-based Search in Polygonal Domains
Authors: Kien C. Huynh, Joseph S. B. Mitchell, Linh Nguyen, Valentin Polishchuk
Abstract
Given a geometric domain $P$, visibility-based search problems seek routes for one or more mobile agents ("watchmen") to move within $P$ in order to be able to see a portion (or all) of $P$, while optimizing objectives, such as the length(s) of the route(s), the size (e.g., area or volume) of the portion seen, the probability of detecting a target distributed within $P$ according to a prior distribution, etc. The classic watchman route problem seeks a shortest route for an observer, with omnidirectional vision, to see all of $P$. In this paper we study bicriteria optimization problems for a single mobile agent within a polygonal domain $P$ in the plane, with the criteria of route length and area seen. Specifically, we address the problem of computing a minimum length route that sees at least a specified area of $P$ (minimum length, for a given area quota). We also study the problem of computing a length-constrained route that sees as much area as possible. We provide hardness results and approximation algorithms. In particular, for a simple polygon $P$ we provide the first fully polynomial-time approximation scheme for the problem of computing a shortest route seeing an area quota, as well as a (slightly more efficient) polynomial dual approximation. We also consider polygonal domains $P$ (with holes) and the special case of a planar domain consisting of a union of lines. Our results yield the first approximation algorithms for computing a time-optimal search route in $P$ to guarantee some specified probability of detection of a static target within $P$, randomly distributed in $P$ according to a given prior distribution.
Low-degree phase transitions for detecting a planted clique in sublinear time
Authors: Jay Mardia, Kabir Aladin Verchand, Alexander S. Wein
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Machine Learning (stat.ML)
Abstract
We consider the problem of detecting a planted clique of size $k$ in a random graph on $n$ vertices. When the size of the clique exceeds $\Theta(\sqrt{n})$, polynomial-time algorithms for detection proliferate. We study faster -- namely, sublinear time -- algorithms in the high-signal regime when $k = \Theta(n^{1/2 + \delta})$, for some $\delta > 0$. To this end, we consider algorithms that non-adaptively query a subset $M$ of entries of the adjacency matrix and then compute a low-degree polynomial function of the revealed entries. We prove a computational phase transition for this class of non-adaptive low-degree algorithms: under the scaling $\lvert M \rvert = \Theta(n^{\gamma})$, the clique can be detected when $\gamma > 3(1/2 - \delta)$ but not when $\gamma < 3(1/2 - \delta)$. As a result, the best known runtime for detecting a planted clique, $\widetilde{O}(n^{3(1/2-\delta)})$, cannot be improved without looking beyond the non-adaptive low-degree class. Our proof of the lower bound -- based on bounding the conditional low-degree likelihood ratio -- reveals further structure in non-adaptive detection of a planted clique. Using (a bound on) the conditional low-degree likelihood ratio as a potential function, we show that for every non-adaptive query pattern, there is a highly structured query pattern of the same size that is at least as effective.
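The phase transition stated in the abstract above can be read as a simple rule on exponents: with clique size $k = \Theta(n^{1/2+\delta})$ and $\lvert M \rvert = \Theta(n^{\gamma})$ queried entries, detection by this algorithm class succeeds when $\gamma > 3(1/2 - \delta)$ and fails when $\gamma < 3(1/2 - \delta)$. The sketch below only evaluates that inequality; the function names are hypothetical, and the restriction $0 < \delta < 1/2$ is an assumption made here so that the clique is larger than $\sqrt{n}$ but smaller than the graph.

```python
def query_exponent_threshold(delta):
    """Critical exponent 3 * (1/2 - delta) from the abstract.
    Assumes 0 < delta < 1/2 (an assumption of this sketch)."""
    if not 0 < delta < 0.5:
        raise ValueError("delta must lie strictly between 0 and 1/2")
    return 3 * (0.5 - delta)


def detectable(gamma, delta):
    """True above the threshold, False below it, and None exactly at the
    boundary, which the stated result does not cover."""
    threshold = query_exponent_threshold(delta)
    if gamma > threshold:
        return True
    if gamma < threshold:
        return False
    return None


# With delta = 1/4 the threshold exponent is 3/4.
print(query_exponent_threshold(0.25), detectable(1.0, 0.25), detectable(0.5, 0.25))
```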
Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia
Authors: Guangyu Shen, Siyuan Cheng, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Lu Yan, Zhuo Zhang, Shiqing Ma, Xiangyu Zhang
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Abstract
Large Language Models (LLMs) have become prevalent across diverse sectors, transforming human life with their extraordinary reasoning and comprehension abilities. As they find increased use in sensitive tasks, safety concerns have gained widespread attention. Extensive efforts have been dedicated to aligning LLMs with human moral principles to ensure their safe deployment. Despite their potential, recent research indicates aligned LLMs are prone to specialized jailbreaking prompts that bypass safety measures to elicit violent and harmful content. The intrinsic discrete nature and substantial scale of contemporary LLMs pose significant challenges in automatically generating diverse, efficient, and potent jailbreaking prompts, representing a continuous obstacle. In this paper, we introduce RIPPLE (Rapid Optimization via Subconscious Exploitation and Echopraxia), a novel optimization-based method inspired by two psychological concepts: subconsciousness and echopraxia, which describe the processes of the mind that occur without conscious awareness and the involuntary mimicry of actions, respectively. Evaluations across 6 open-source LLMs and 4 commercial LLM APIs show RIPPLE achieves an average Attack Success Rate of 91.5%, outperforming five current methods by up to 47.0% with an 8x reduction in overhead. Furthermore, it displays significant transferability and stealth, successfully evading established detection mechanisms. The code of our work is available at https://github.com/SolidShen/RIPPLE_official/tree/official
Heart disease risk prediction using deep learning techniques with feature augmentation
Authors: María Teresa García-Ordás, Martín Bayón-Gutiérrez, Carmen Benavides, Jose Aveleira-Mata, José Alberto Benítez-Andrades
Abstract
Cardiovascular diseases are among the greatest risks of death for the general population. Late detection of heart disease strongly affects patients' chances of survival. Age, sex, cholesterol level, sugar level, and heart rate, among other factors, are known to influence life-threatening heart problems, but, due to the large number of variables, it is often difficult for an expert to evaluate each patient taking this information into account. In this manuscript, the authors propose using deep learning methods, combined with feature augmentation techniques, to evaluate whether patients are at risk of suffering cardiovascular disease. The results of the proposed methods outperform other state-of-the-art methods by 4.4%, leading to a precision of 90%, which represents a significant improvement, even more so for an affliction that affects a large population.
Empowering machine learning models with contextual knowledge for enhancing the detection of eating disorders in social media posts
Authors: José Alberto Benítez-Andrades, María Teresa García-Ordás, Mayra Russo, Ahmad Sakor, Luis Daniel Fernandes Rotger, Maria-Esther Vidal
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Abstract
Social networks are vital for information sharing, especially in the health sector for discussing diseases and treatments. These platforms, however, often feature posts as brief texts, posing challenges for Artificial Intelligence (AI) in understanding context. We introduce a novel hybrid approach combining community-maintained knowledge graphs (like Wikidata) with deep learning to enhance the categorization of social media posts. This method uses advanced entity recognizers and linkers (like Falcon 2.0) to connect short post entities to knowledge graphs. Knowledge graph embeddings (KGEs) and contextualized word embeddings (like BERT) are then employed to create rich, context-based representations of these posts. Our focus is on the health domain, particularly in identifying posts related to eating disorders (e.g., anorexia, bulimia) to aid healthcare providers in early diagnosis. We tested our approach on a dataset of 2,000 tweets about eating disorders, finding that merging word embeddings with knowledge graph information enhances the predictive models' reliability. This methodology aims to assist health experts in spotting patterns indicative of mental disorders, thereby improving early detection and accurate diagnosis for personalized medicine.
Listening Between the Lines: Synthetic Speech Detection Disregarding Verbal Content
Abstract
Recent advancements in synthetic speech generation have led to the creation of forged audio data that are almost indistinguishable from real speech. This phenomenon poses a new challenge for the multimedia forensics community, as the misuse of synthetic media can potentially cause adverse consequences. Several methods have been proposed in the literature to mitigate potential risks and detect synthetic speech, mainly focusing on the analysis of the speech itself. However, recent studies have revealed that the most crucial frequency bands for detection lie in the highest ranges (above 6000 Hz), which do not include any speech content. In this work, we extensively explore this aspect and investigate whether synthetic speech detection can be performed by focusing only on the background component of the signal while disregarding its verbal content. Our findings indicate that the speech component is not the predominant factor in performing synthetic speech detection. These insights provide valuable guidance for the development of new synthetic speech detectors and their interpretability, together with some considerations on the existing work in the audio forensics field.
Efficient Models for the Detection of Hate, Abuse and Profanity
Authors: Christoph Tillmann, Aashka Trivedi, Bishwaranjan Bhattacharjee
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract
Large Language Models (LLMs) are the cornerstone for many Natural Language Processing (NLP) tasks like sentiment analysis, document classification, named entity recognition, question answering, summarization, etc. LLMs are often trained on data which originates from the web. This data is prone to having content with Hate, Abuse and Profanity (HAP). For a detailed definition of HAP, please refer to the Appendix. Because LLMs are exposed to HAP content during training, the models learn it and may then generate hateful or profane content. For example, when the open-source RoBERTa model (specifically, the RoBERTa base model) from the HuggingFace (HF) Transformers library is prompted to replace the mask token in "I do not know that Persian people are that MASK", it returns the word "stupid" with the highest score. This is unacceptable in civil discourse. The detection of Hate, Abuse and Profanity in text is a vital component of creating civil and unbiased LLMs, which is needed not only for English, but for all languages. In this article, we briefly describe the creation of HAP detectors and various ways of using them to make models civil and acceptable in the output they generate.
Assessment of the Sparsity-Diversity Trade-offs in Active Users Detection for mMTC
Authors: Gabriel Martins de Jesus, Onel Luis Alcaraz Lopez, Richard Demo Souza, Nurul Huda Mahmood, Markku Juntti, Matti Latva-Aho
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Wireless communication systems must increasingly support a multitude of machine-type communications (MTC) devices, thus calling for advanced strategies for active user detection (AUD). Recent literature has delved into AUD techniques based on compressed sensing, highlighting the critical role of signal sparsity. This study investigates the relationship between frequency diversity and signal sparsity in the AUD problem. Single-antenna users transmit multiple copies of non-orthogonal pilots across multiple frequency channels and the base station independently performs AUD in each channel using the orthogonal matching pursuit algorithm. We note that, although frequency diversity may improve the likelihood of successful reception of the signals, it may also damage the channel sparsity level, leading to important trade-offs. We show that a sparser signal significantly benefits AUD, surpassing the advantages brought by frequency diversity in scenarios with limited temporal resources and/or high numbers of receive antennas. Conversely, with longer pilots and fewer receive antennas, investing in frequency diversity becomes more impactful, resulting in a tenfold AUD performance improvement.
Evolving AI for Wellness: Dynamic and Personalized Real-time Loneliness Detection Using Passive Sensing
Authors: Malik Muhammad Qirtas, Evi Zafeiridi, Eleanor Bantry White, Dirk Pesch
Abstract
Loneliness is a growing health concern as it can lead to depression and other associated mental health problems for people who experience feelings of loneliness over prolonged periods of time. Utilizing passive sensing methods that use smartphone and wearable sensor data to capture daily behavioural patterns offers a promising approach for the early detection of loneliness. Given the subjective nature of loneliness and people's varying daily routines, past detection approaches using machine learning models often struggle to detect loneliness effectively. This paper proposes a methodologically novel approach: a loneliness detection system that evolves over time, adapts to new data, and provides real-time detection. Our study utilized the Globem dataset, a comprehensive collection of passive sensing data acquired over 10 weeks from university students. The base of our approach is the continuous identification and refinement of similar behavioural groups among students using an incremental clustering method. As we add new data, the model improves based on changing behavioural patterns. In parallel, we create and update classification models to detect loneliness among the evolving behavioural groups of students. When unique behavioural patterns are observed in the student data, specialized classification models are created. For predictions of loneliness, a collaborative effort between the generalized and specialized models is employed, treating each prediction as a vote. This study's findings reveal that group-based loneliness detection models exhibit superior performance compared to generic models, underscoring the necessity for more personalized approaches tailored to specific behavioural patterns. These results pave the way for future research, emphasizing the development of finely-tuned, individualized mental health interventions.
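The abstract above describes combining generalized and specialized models by treating each prediction as a vote. A minimal sketch of such a majority vote follows. The abstract does not say how ties are resolved or whether votes are weighted, so the unweighted scheme, the `tie_break` parameter, and the 0/1 label encoding are all assumptions of this illustration.

```python
from collections import Counter


def majority_vote(votes, tie_break=0):
    """Return the most frequent label among model predictions.

    `votes` is a list of labels (here 1 = lonely, 0 = not lonely), one per
    model. `tie_break` is a hypothetical fallback used when the top two
    labels receive the same number of votes.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break
    return ranked[0][0]


# Hypothetical predictions: one generalized model and two specialized models.
print(majority_vote([1, 1, 0]))
```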
Triangular phase-shift detector for drone precise vertical landing RF systems
Authors: Víctor Araña-Pulido, Eugenio Jiménez-Yguácel, Francisco Cabrera-Almeida, Pedro Quintana-Morales
Abstract
This paper presents a circuit for precise vertical landing of drones based on the detection of three phase shifts of a single frequency transmitted from the landing point. The circuit can be considered a new navigation sensor that assists in guidance corrections for landing at a specific point. The circuit has three inputs at which the signal transmitted from an oscillator located at the landing point arrives with different delays. The input signals are combined in pairs in each of the three analog phase detectors, after having passed through 3 dB, 90° hybrid couplers that guarantee a theoretical non-ambiguous phase-shift range of ±90 degrees. Each output has a voltage that is proportional to the phase shift between each pair of input signals, which in turn depends on the position relative to the landing point. A simple landing algorithm based on phase-shift values is proposed, which could be integrated into the same flight control platform, thus avoiding the need for additional processing components. To demonstrate the feasibility of the proposed design, a triangular phase-shift detector prototype has been implemented using commercial devices. Calibration and measurements at 2.46 GHz show a dynamic range of 30 dB and a non-ambiguous detection range of ±80 degrees in the worst cases. These specifications allow the drone to be tracked during the landing maneuver within an inverted cone formed by the landing point and a surface with a ±4.19 m radius at a height of 10 m.
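The tracking cone quoted in the abstract above (a ±4.19 m radius at a height of 10 m, with its apex at the landing point) can be turned into a half-angle and a radius at any other height with basic trigonometry. The sketch below does only that arithmetic, assuming a straight-sided cone; the function names are hypothetical, and the mapping from phase shift to position is not modelled here.

```python
import math

# Figures quoted in the abstract.
CONE_RADIUS_M = 4.19
CONE_HEIGHT_M = 10.0


def cone_half_angle_deg():
    """Half-angle of the inverted cone, measured from the vertical axis."""
    return math.degrees(math.atan2(CONE_RADIUS_M, CONE_HEIGHT_M))


def cone_radius_at(height_m):
    """Trackable horizontal radius at a given height, assuming the cone
    narrows linearly towards the landing point."""
    if height_m < 0:
        raise ValueError("height must be non-negative")
    return height_m * CONE_RADIUS_M / CONE_HEIGHT_M


print(round(cone_half_angle_deg(), 2))  # approximately 22.7 degrees
print(cone_radius_at(5.0))
```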
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
Authors: Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we introduce a novel paradigm to enhance the capabilities of an object detector, e.g., expanding categories or improving detection performance, by training on a synthetic dataset generated from diffusion models. Specifically, we integrate an instance-level grounding head into a pre-trained, generative diffusion model, to augment it with the ability to localise arbitrary instances in the generated images. The grounding head is trained to align the text embedding of category names with the regional visual feature of the diffusion model, using supervision from an off-the-shelf object detector, and a novel self-training scheme on (novel) categories not covered by the detector. This enhanced version of the diffusion model, termed InstaGen, can serve as a data synthesizer for object detection. We conduct thorough experiments to show that an object detector can be enhanced by training on the synthetic dataset from InstaGen, demonstrating superior performance over existing state-of-the-art methods in open-vocabulary (+4.5 AP) and data-sparse (+1.2 to 5.2 AP) scenarios.
Keyword: face recognition
Efficient Expression Neutrality Estimation with Application to Face Recognition Utility Prediction
Authors: Marcel Grimmer, Raymond N. J. Veldhuis, Christoph Busch
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Abstract
The recognition performance of biometric systems strongly depends on the quality of the compared biometric samples. Motivated by the goal of establishing a common understanding of face image quality and enabling system interoperability, the committee draft of ISO/IEC 29794-5 introduces expression neutrality as one of many component quality elements affecting recognition performance. In this study, we train classifiers to assess facial expression neutrality using seven datasets. We conduct extensive performance benchmarking to evaluate their classification and face recognition utility prediction abilities. Our experiments reveal significant differences in how each classifier distinguishes "neutral" from "non-neutral" expressions. While Random Forests and AdaBoost classifiers are most suitable for distinguishing neutral from non-neutral facial expressions with high accuracy, they underperform compared to Support Vector Machines in predicting face recognition utility.
A Framework for Assessing Proportionate Intervention with Face Recognition Systems in Real-Life Scenarios
Authors: Pablo Negri, Isabelle Hupont, Emilia Gomez
Abstract
Face recognition (FR) has reached a high technical maturity. However, its use needs to be carefully assessed from an ethical perspective, especially in sensitive scenarios. This is precisely the focus of this paper: the use of FR for the identification of specific subjects in moderately to densely crowded spaces (e.g. public spaces, sports stadiums, train stations) and law enforcement scenarios. In particular, there is a need to consider the trade-off between protecting the privacy and fundamental rights of citizens and ensuring their safety. Recent Artificial Intelligence (AI) policies, notably the European AI Act, propose that such FR interventions should be proportionate and deployed only when strictly necessary. Nevertheless, concrete guidelines on how to address the concept of proportional FR intervention are lacking to date. This paper proposes a framework to help assess whether an FR intervention is proportionate for a given context of use in the above-mentioned scenarios. It also identifies the main quantitative and qualitative variables relevant to the FR intervention decision (e.g. number of people in the scene, level of harm that the person(s) being searched for could perpetrate, consequences to individual rights and freedoms) and proposes a 2D graphical model making it possible to balance these variables in terms of ethical cost vs security gain. Finally, different FR scenarios inspired by real-world deployments validate the proposed model. The framework is conceived as a simple support tool for decision makers when confronted with the deployment of an FR system.
Keyword: augmentation
Heart disease risk prediction using deep learning techniques with feature augmentation
Authors: María Teresa García-Ordás, Martín Bayón-Gutiérrez, Carmen Benavides, Jose Aveleira-Mata, José Alberto Benítez-Andrades
Abstract
Cardiovascular diseases stand as one of the greatest risks of death for the general population. Late detection of heart disease strongly affects patients' chances of survival. Age, sex, cholesterol level, sugar level, and heart rate, among other factors, are known to influence life-threatening heart problems, but, due to the large number of variables, it is often difficult for an expert to evaluate each patient taking all of this information into account. In this manuscript, the authors propose using deep learning methods combined with feature augmentation techniques to evaluate whether patients are at risk of suffering cardiovascular disease. The results of the proposed methods outperform other state-of-the-art methods by 4.4%, leading to a precision of 90%, which represents a significant improvement, even more so for an affliction that affects a large population.
AutoAugment Is What You Need: Enhancing Rule-based Augmentation Methods in Low-resource Regimes
Authors: Juhwan Choi, Kyohoon Jin, Junho Lee, Sangmin Song, Youngbin Kim
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Text data augmentation is a complex problem due to the discrete nature of sentences. Although rule-based augmentation methods are widely adopted in real-world applications because of their simplicity, they suffer from potential semantic damage. Previous researchers have suggested easy data augmentation with soft labels (softEDA), employing label smoothing to mitigate this problem. However, finding the best factor for each model and dataset is challenging; therefore, using softEDA in real-world applications is still difficult. In this paper, we propose adapting AutoAugment to solve this problem. The experimental results suggest that the proposed method can boost existing augmentation methods and that rule-based methods can enhance cutting-edge pre-trained language models. We offer the source code.
RESMatch: Referring Expression Segmentation in a Semi-Supervised Manner
Abstract
Referring expression segmentation (RES), a task that involves localizing specific instance-level objects based on free-form linguistic descriptions, has emerged as a crucial frontier in human-AI interaction. It demands an intricate understanding of both visual and textual contexts and often requires extensive training data. This paper introduces RESMatch, the first semi-supervised learning (SSL) approach for RES, aimed at reducing reliance on exhaustive data annotation. Extensive validation on multiple RES datasets demonstrates that RESMatch significantly outperforms baseline approaches, establishing a new state-of-the-art. Although existing SSL techniques are effective in image segmentation, we find that they fall short in RES. To address challenges such as the comprehension of free-form linguistic descriptions and the variability in object attributes, RESMatch introduces three adaptations: revised strong perturbation, text augmentation, and adjustments for pseudo-label quality and strong-weak supervision. This pioneering work lays the groundwork for future research in semi-supervised learning for referring expression segmentation.
SoftEDA: Rethinking Rule-Based Data Augmentation with Soft Labels
Authors: Juhwan Choi, Kyohoon Jin, Junho Lee, Sangmin Song, Youngbin Kim
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Rule-based text data augmentation is widely used for NLP tasks due to its simplicity. However, this method can potentially damage the original meaning of the text, ultimately hurting the performance of the model. To overcome this limitation, we propose a straightforward technique for applying soft labels to augmented data. We conducted experiments across seven different classification tasks and empirically demonstrated the effectiveness of our proposed approach. We have publicly opened our source code for reproducibility.
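As a rough illustration of the idea of soft labels for augmented data, the sketch below keeps one-hot targets for original examples and gives augmented examples a label-smoothed target. It assumes standard label smoothing with a smoothing factor `alpha`; the function names, the `alpha` value, and the example sentences are hypothetical and are not taken from the paper, whose exact formulation may differ.

```python
def one_hot(label, num_classes):
    """Return a one-hot probability vector for an integer class label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def smooth_label(label, num_classes, alpha):
    """Standard label smoothing: mix the one-hot target with a uniform distribution."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    uniform = alpha / num_classes
    return [(1.0 - alpha) * p + uniform for p in one_hot(label, num_classes)]


def build_targets(examples, num_classes, alpha=0.1):
    """Map (text, label, is_augmented) triples to (text, target vector) pairs.

    Assumption: only augmented examples receive a softened target, because
    rule-based edits may have altered the meaning of the text.
    """
    targets = []
    for text, label, is_augmented in examples:
        if is_augmented:
            target = smooth_label(label, num_classes, alpha)
        else:
            target = one_hot(label, num_classes)
        targets.append((text, target))
    return targets


if __name__ == "__main__":
    # Hypothetical data: one original sentence and one rule-based variant of it.
    examples = [
        ("the movie was great", 1, False),
        ("the film was great", 1, True),
    ]
    for text, target in build_targets(examples, num_classes=2, alpha=0.2):
        print(text, target)
```

The resulting target vectors can be used with any cross-entropy loss that accepts probability targets. The choice of `alpha` is left open here, since it would presumably need tuning for each model and dataset.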
Scalable Diffusion Models with State Space Backbone
Abstract
This paper presents a new exploration into a category of diffusion models built upon state space architecture. We endeavor to train diffusion models for image data, wherein the traditional U-Net backbone is supplanted by a state space backbone, functioning on raw patches or latent space. Given its notable efficacy in accommodating long-range dependencies, Diffusion State Space Models (DiS) are distinguished by treating all inputs including time, condition, and noisy image patches as tokens. Our assessment of DiS encompasses both unconditional and class-conditional image generation scenarios, revealing that DiS exhibits comparable, if not superior, performance to CNN-based or Transformer-based U-Net architectures of commensurate size. Furthermore, we analyze the scalability of DiS, gauged by the forward pass complexity quantified in Gflops. DiS models with higher Gflops, achieved through augmentation of depth/width or augmentation of input tokens, consistently demonstrate lower FID. In addition to demonstrating commendable scalability characteristics, DiS-H/2 models in latent space achieve performance levels akin to prior diffusion models on class-conditional ImageNet benchmarks at the resolution of 256$\times$256 and 512$\times$512, while significantly reducing the computational burden. The code and models are available at: https://github.com/feizc/DiS.
Discovering Temporally-Aware Reinforcement Learning Algorithms
Abstract
Recent advancements in meta-learning have enabled the automatic discovery of novel reinforcement learning algorithms parameterized by surrogate objective functions. To improve upon manually designed algorithms, the parameterization of this learned objective function must be expressive enough to represent novel principles of learning (instead of merely recovering already established ones) while still generalizing to a wide range of settings outside of its meta-training distribution. However, existing methods focus on discovering objective functions that, like many widely used objective functions in reinforcement learning, do not take into account the total number of steps allowed for training, or "training horizon". In contrast, humans use a plethora of different learning objectives across the course of acquiring a new ability. For instance, students may alter their studying techniques based on the proximity to exam deadlines and their self-assessed capabilities. This paper contends that ignoring the optimization time horizon significantly restricts the expressive potential of discovered learning algorithms. We propose a simple augmentation to two existing objective discovery approaches that allows the discovered algorithm to dynamically update its objective function throughout the agent's training procedure, resulting in expressive schedules and increased generalization across different training horizons. In the process, we find that commonly used meta-gradient approaches fail to discover such adaptive objective functions while evolution strategies discover highly dynamic learning rules. We demonstrate the effectiveness of our approach on a wide range of tasks and analyze the resulting learned algorithms, which we find effectively balance exploration and exploitation by modifying the structure of their learning rules throughout the agent's lifetime.
Keyword: detection
A Light-weight and Unsupervised Method for Near Real-time Behavioral Analysis using Operational Data Measurement
Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineering
Enhancement of Bengali OCR by Specialized Models and Advanced Techniques for Diverse Document Types
Scrapping The Web For Early Wildfire Detection
Guiding Large Language Models with Divide-and-Conquer Program for Discerning Problem Solving
Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts
TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning
SpirDet: Towards Efficient, Accurate and Lightweight Infrared Small Target Detector
Segmentation-free Connectionist Temporal Classification loss based OCR Model for Text Captcha Classification
Optimizing Visibility-based Search in Polygonal Domains
Low-degree phase transitions for detecting a planted clique in sublinear time
Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia
Heart disease risk prediction using deep learning techniques with feature augmentation
Empowering machine learning models with contextual knowledge for enhancing the detection of eating disorders in social media posts
Listening Between the Lines: Synthetic Speech Detection Disregarding Verbal Content
Efficient Models for the Detection of Hate, Abuse and Profanity
I do not know that Persian people are that MASK
it returns the word "stupid"
with the highest score. This is unacceptable in civil discourse. The detection of Hate, Abuse and Profanity in text is a vital component of creating civil and unbiased LLMs, which is needed not only for English, but for all languages. In this article, we briefly describe the creation of HAP detectors and various ways of using them to make models civil and acceptable in the output they generate.
Assessment of the Sparsity-Diversity Trade-offs in Active Users Detection for mMTC
Evolving AI for Wellness: Dynamic and Personalized Real-time Loneliness Detection Using Passive Sensing
Triangular phase-shift detector for drone precise vertical landing RF systems
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
Keyword: face recognition
Efficient Expression Neutrality Estimation with Application to Face Recognition Utility Prediction
A Framework for Assessing Proportionate Intervention with Face Recognition Systems in Real-Life Scenarios
Keyword: augmentation
Heart disease risk prediction using deep learning techniques with feature augmentation
AutoAugment Is What You Need: Enhancing Rule-based Augmentation Methods in Low-resource Regimes
RESMatch: Referring Expression Segmentation in a Semi-Supervised Manner
SoftEDA: Rethinking Rule-Based Data Augmentation with Soft Labels
Scalable Diffusion Models with State Space Backbone
Discovering Temporally-Aware Reinforcement Learning Algorithms