New submissions for Mon, 8 May 23

Keyword: text generation

Stylized Data-to-Text Generation: A Case Study in the E-Commerce Domain

Authors: Liqiang Jing, Xuemeng Song, Xuming Lin, Zhongzhou Zhao, Wei Zhou, Liqiang Nie
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.03256
Pdf link: https://arxiv.org/pdf/2305.03256
Abstract Existing data-to-text generation efforts mainly focus on generating a coherent text from non-linguistic input data, such as tables and attribute-value pairs, but overlook that different application scenarios may require texts of different styles. Inspired by this, we define a new task, namely stylized data-to-text generation, whose aim is to generate coherent text for the given non-linguistic data according to a specific style. This task is non-trivial, due to three challenges: the logic of the generated text, unstructured style reference, and biased training samples. To address these challenges, we propose a novel stylized data-to-text generation model, named StyleD2T, comprising three components: logic planning-enhanced data embedding, mask-based style embedding, and unbiased stylized text generation. In the first component, we introduce a graph-guided logic planner for attribute organization to ensure the logic of generated text. In the second component, we devise feature-level mask-based style embedding to extract the essential style signal from the given unstructured style reference. In the last one, pseudo triplet augmentation is utilized to achieve unbiased text generation, and a multi-condition based confidence assignment function is designed to ensure the quality of pseudo samples. Extensive experiments on a newly collected dataset from Taobao have been conducted, and the results show the superiority of our model over existing methods.
Expository Text Generation: Imitate, Retrieve, Paraphrase
Authors: Nishant Balepur, Jie Huang, Kevin Chen-Chuan Chang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.03276
Pdf link: https://arxiv.org/pdf/2305.03276
Abstract Expository documents are vital resources for conveying complex information to readers. Despite their usefulness, writing expository documents by hand is a time-consuming and labor-intensive process that requires knowledge of the domain of interest, careful content planning, and the ability to synthesize information from multiple sources. To ease these burdens, we introduce the task of expository text generation, which seeks to automatically generate an accurate and informative expository document from a knowledge source. We solve our task by developing IRP, an iterative framework that overcomes the limitations of language models and separately tackles the steps of content planning, fact selection, and rephrasing. Through experiments on three diverse datasets, we demonstrate that IRP produces high-quality expository documents that accurately inform readers.
Keyword: machine translation

Investigating Lexical Sharing in Multilingual Machine Translation for Indian Languages
Authors: Sonal Sannigrahi, Rachel Bawden
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.03207
Pdf link: https://arxiv.org/pdf/2305.03207
Abstract Multilingual language models have shown impressive cross-lingual transfer ability across a diverse set of languages and tasks. To improve the cross-lingual ability of these models, some strategies include transliteration and finer-grained segmentation into characters as opposed to subwords. In this work, we investigate lexical sharing in multilingual machine translation (MT) from Hindi, Gujarati, Nepali into English. We explore the trade-offs that exist in translation performance between data sampling and vocabulary size, and we explore whether transliteration is useful in encouraging cross-script generalisation. We also verify how the different settings generalise to unseen languages (Marathi and Bengali). We find that transliteration does not give pronounced improvements and our analysis suggests that our multilingual MT models trained on original scripts seem to already be robust to cross-script differences even for relatively low-resource languages
Shared Latent Space by Both Languages in Non-Autoregressive Neural Machine Translation
Authors: DongNyeong Heo, Heeyoul Choi
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.03511
Pdf link: https://arxiv.org/pdf/2305.03511
Abstract Latent variable modeling in non-autoregressive neural machine translation (NAT) is a promising approach to mitigate the multimodality problem. In the previous works, they added an auxiliary model to estimate the posterior distribution of the latent variable conditioned on the source and target sentences. However, it causes several disadvantages, such as redundant information extraction in the latent variable, increasing parameters, and a tendency to ignore a part of the information from the inputs. In this paper, we propose a new latent variable modeling that is based on a dual reconstruction perspective and an advanced hierarchical latent modeling approach. Our proposed method, {\em LadderNMT}, shares a latent space across both languages so that it hypothetically alleviates or solves the above disadvantages. Experimental results quantitatively and qualitatively demonstrate that our proposed latent variable modeling learns an advantageous latent space and significantly improves translation quality in WMT translation tasks.
In-context Learning as Maintaining Coherency: A Study of On-the-fly Machine Translation Using Large Language Models
Authors: Suzanna Sia, Kevin Duh
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.03573
Pdf link: https://arxiv.org/pdf/2305.03573
Abstract The phenomena of in-context learning has typically been thought of as "learning from examples". In this work which focuses on Machine Translation, we present a perspective of in-context learning as the desired generation task maintaining coherency with its context, i.e., the prompt examples. We first investigate randomly sampled prompts across 4 domains, and find that translation performance improves when shown in-domain prompts. Next, we investigate coherency for the in-domain setting, which uses prompt examples from a moving window. We study this with respect to other factors that have previously been identified in the literature such as length, surface similarity and sentence embedding similarity. Our results across 3 models (GPTNeo2.7B, Bloom3B, XGLM2.9B), and three translation directions (\texttt{en}$\rightarrow${\texttt{pt, de, fr}}) suggest that the long-term coherency of the prompts and the test sentence is a good indicator of downstream translation performance. In doing so, we demonstrate the efficacy of In-context Machine Translation for on-the-fly adaptation.
Keyword: non-autoregressive

Shared Latent Space by Both Languages in Non-Autoregressive Neural Machine Translation
Authors: DongNyeong Heo, Heeyoul Choi
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.03511
Pdf link: https://arxiv.org/pdf/2305.03511
Abstract Latent variable modeling in non-autoregressive neural machine translation (NAT) is a promising approach to mitigate the multimodality problem. In the previous works, they added an auxiliary model to estimate the posterior distribution of the latent variable conditioned on the source and target sentences. However, it causes several disadvantages, such as redundant information extraction in the latent variable, increasing parameters, and a tendency to ignore a part of the information from the inputs. In this paper, we propose a new latent variable modeling that is based on a dual reconstruction perspective and an advanced hierarchical latent modeling approach. Our proposed method, {\em LadderNMT}, shares a latent space across both languages so that it hypothetically alleviates or solves the above disadvantages. Experimental results quantitatively and qualitatively demonstrate that our proposed latent variable modeling learns an advantageous latent space and significantly improves translation quality in WMT translation tasks.
Keyword: abstractive summarization

There is no result

Keyword: factual

Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework
Authors: Ruochen Zhao, Xingxuan Li, Shafiq Joty, Chengwei Qin, Lidong Bing
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.03268
Pdf link: https://arxiv.org/pdf/2305.03268
Abstract As large language models (LLMs) have become the norm in NLP, demonstrating good performance in generation and reasoning tasks, one of its most fatal disadvantages is the lack of factual correctness. Generating unfactual texts not only leads to lower performances but also degrades the trust and validity of their applications. Chain-of-Thought (CoT) prompting improves trust and model performance on complex reasoning tasks by generating interpretable reasoning chains, but still suffers from factuality concerns in knowledge-intensive tasks. In this paper, we propose the Verify-and-Edit framework for CoT prompting, which seeks to increase prediction factuality by post-editing reasoning chains according to external knowledge. Building on top of GPT-3, our framework lead to accuracy improvements in multiple open-domain question-answering tasks.
Think Rationally about What You See: Continuous Rationale Extraction for Relation Extraction
Authors: Xuming Hu, Zhaochen Hong, Chenwei Zhang, Irwin King, Philip S. Yu
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2305.03503
Pdf link: https://arxiv.org/pdf/2305.03503
Abstract Relation extraction (RE) aims to extract potential relations according to the context of two entities, thus, deriving rational contexts from sentences plays an important role. Previous works either focus on how to leverage the entity information (e.g., entity types, entity verbalization) to inference relations, but ignore context-focused content, or use counterfactual thinking to remove the model's bias of potential relations in entities, but the relation reasoning process will still be hindered by irrelevant content. Therefore, how to preserve relevant content and remove noisy segments from sentences is a crucial task. In addition, retained content needs to be fluent enough to maintain semantic coherence and interpretability. In this work, we propose a novel rationale extraction framework named RE2, which leverages two continuity and sparsity factors to obtain relevant and coherent rationales from sentences. To solve the problem that the gold rationales are not labeled, RE2 applies an optimizable binary mask to each token in the sentence, and adjust the rationales that need to be selected according to the relation label. Experiments on four datasets show that RE2 surpasses baselines.
Read it Twice: Towards Faithfully Interpretable Fact Verification by Revisiting Evidence
Authors: Xuming Hu, Zhaochen Hong, Zhijiang Guo, Lijie Wen, Philip S. Yu
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2305.03507
Pdf link: https://arxiv.org/pdf/2305.03507
Abstract Real-world fact verification task aims to verify the factuality of a claim by retrieving evidence from the source document. The quality of the retrieved evidence plays an important role in claim verification. Ideally, the retrieved evidence should be faithful (reflecting the model's decision-making process in claim verification) and plausible (convincing to humans), and can improve the accuracy of verification task. Although existing approaches leverage the similarity measure of semantic or surface form between claims and documents to retrieve evidence, they all rely on certain heuristics that prevent them from satisfying all three requirements. In light of this, we propose a fact verification model named ReRead to retrieve evidence and verify claim that: (1) Train the evidence retriever to obtain interpretable evidence (i.e., faithfulness and plausibility criteria); (2) Train the claim verifier to revisit the evidence retrieved by the optimized evidence retriever to improve the accuracy. The proposed system is able to achieve significant improvements upon best-reported models under different settings.
Keyword: knowledge distillation

There is no result

Keyword: Hallucination

There is no result

Keyword: evaluation

Modeling What-to-ask and How-to-ask for Answer-unaware Conversational Question Generation
Authors: Xuan Long Do, Bowei Zou, Shafiq Joty, Anh Tai Tran, Liangming Pan, Nancy F. Chen, Ai Ti Aw
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.03088
Pdf link: https://arxiv.org/pdf/2305.03088
Abstract Conversational Question Generation (CQG) is a critical task for machines to assist humans in fulfilling their information needs through conversations. The task is generally cast into two different settings: answer-aware and answer-unaware. While the former facilitates the models by exposing the expected answer, the latter is more realistic and receiving growing attentions recently. What-to-ask and how-to-ask are the two main challenges in the answer-unaware setting. To address the first challenge, existing methods mainly select sequential sentences in context as the rationales. We argue that the conversation generated using such naive heuristics may not be natural enough as in reality, the interlocutors often talk about the relevant contents that are not necessarily sequential in context. Additionally, previous methods decide the type of question to be generated (boolean/span-based) implicitly. Modeling the question type explicitly is crucial as the answer, which hints the models to generate a boolean or span-based question, is unavailable. To this end, we present SG-CQG, a two-stage CQG framework. For the what-to-ask stage, a sentence is selected as the rationale from a semantic graph that we construct, and extract the answer span from it. For the how-to-ask stage, a classifier determines the target answer type of the question via two explicit control signals before generating and filtering. In addition, we propose Conv-Distinct, a novel evaluation metric for CQG, to evaluate the diversity of the generated conversation from a context. Compared with the existing answer-unaware CQG models, the proposed SG-CQG achieves state-of-the-art performance.
Chain-of-Skills: A Configurable Model for Open-domain Question Answering
Authors: Kaixin Ma, Hao Cheng, Yu Zhang, Xiaodong Liu, Eric Nyberg, Jianfeng Gao
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.03130
Pdf link: https://arxiv.org/pdf/2305.03130
Abstract The retrieval model is an indispensable component for real-world knowledge-intensive tasks, e.g., open-domain question answering (ODQA). As separate retrieval skills are annotated for different datasets, recent work focuses on customized methods, limiting the model transferability and scalability. In this work, we propose a modular retriever where individual modules correspond to key skills that can be reused across datasets. Our approach supports flexible skill configurations based on the target domain to boost performance. To mitigate task interference, we design a novel modularization parameterization inspired by sparse Transformer. We demonstrate that our model can benefit from self-supervised pretraining on Wikipedia and fine-tuning using multiple ODQA datasets, both in a multi-task fashion. Our approach outperforms recent self-supervised retrievers in zero-shot evaluations and achieves state-of-the-art fine-tuned retrieval performance on NQ, HotpotQA and OTT-QA.
TransESC: Smoothing Emotional Support Conversation via Turn-Level State Transition
Authors: Weixiang Zhao, Yanyan Zhao, Shilong Wang, Bing Qin
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.03296
Pdf link: https://arxiv.org/pdf/2305.03296
Abstract Emotion Support Conversation (ESC) is an emerging and challenging task with the goal of reducing the emotional distress of people. Previous attempts fail to maintain smooth transitions between utterances in ESC because they ignore to grasp the fine-grained transition information at each dialogue turn. To solve this problem, we propose to take into account turn-level state \textbf{Trans}itions of \textbf{ESC} (\textbf{TransESC}) from three perspectives, including semantics transition, strategy transition and emotion transition, to drive the conversation in a smooth and natural way. Specifically, we construct the state transition graph with a two-step way, named transit-then-interact, to grasp such three types of turn-level transition information. Finally, they are injected into the transition-aware decoder to generate more engaging responses. Both automatic and human evaluations on the benchmark dataset demonstrate the superiority of TransESC to generate more smooth and effective supportive responses. Our source code is available at \url{https://github.com/circle-hit/TransESC}.
HiPool: Modeling Long Documents Using Graph Neural Networks
Authors: Irene Li, Aosong Feng, Dragomir Radev, Rex Ying
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.03319
Pdf link: https://arxiv.org/pdf/2305.03319
Abstract Encoding long sequences in Natural Language Processing (NLP) is a challenging problem. Though recent pretraining language models achieve satisfying performances in many NLP tasks, they are still restricted by a pre-defined maximum length, making them challenging to be extended to longer sequences. So some recent works utilize hierarchies to model long sequences. However, most of them apply sequential models for upper hierarchies, suffering from long dependency issues. In this paper, we alleviate these issues through a graph-based method. We first chunk the sequence with a fixed length to model the sentence-level information. We then leverage graphs to model intra- and cross-sentence correlations with a new attention mechanism. Additionally, due to limited standard benchmarks for long document classification (LDC), we propose a new challenging benchmark, totaling six datasets with up to 53k samples and 4034 average tokens' length. Evaluation shows our model surpasses competitive baselines by 2.6% in F1 score, and 4.8% on the longest sequence dataset. Our method is shown to outperform hierarchical sequential models with better performance and scalability, especially for longer sequences.
Simulating H.P. Lovecraft horror literature with the ChatGPT large language model
Authors: Eduardo C. Garrido-Merchán, José Luis Arroyo-Barrigüete, Roberto Gozalo-Brihuela
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.03429
Pdf link: https://arxiv.org/pdf/2305.03429
Abstract In this paper, we present a novel approach to simulating H.P. Lovecraft's horror literature using the ChatGPT large language model, specifically the GPT-4 architecture. Our study aims to generate text that emulates Lovecraft's unique writing style and themes, while also examining the effectiveness of prompt engineering techniques in guiding the model's output. To achieve this, we curated a prompt containing several specialized literature references and employed advanced prompt engineering methods. We conducted an empirical evaluation of the generated text by administering a survey to a sample of undergraduate students. Utilizing statistical hypothesis testing, we assessed the students ability to distinguish between genuine Lovecraft works and those generated by our model. Our findings demonstrate that the participants were unable to reliably differentiate between the two, indicating the effectiveness of the GPT-4 model and our prompt engineering techniques in emulating Lovecraft's literary style. In addition to presenting the GPT model's capabilities, this paper provides a comprehensive description of its underlying architecture and offers a comparative analysis with related work that simulates other notable authors and philosophers, such as Dennett. By exploring the potential of large language models in the context of literary emulation, our study contributes to the body of research on the applications and limitations of these models in various creative domains.
Building Multimodal AI Chatbots
Authors: Min Young Lee
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.03512
Pdf link: https://arxiv.org/pdf/2305.03512
Abstract This work aims to create a multimodal AI system that chats with humans and shares relevant photos. While earlier works were limited to dialogues about specific objects or scenes within images, recent works have incorporated images into open-domain dialogues. However, their response generators are unimodal, accepting text input but no image input, thus prone to generating responses contradictory to the images shared in the dialogue. Therefore, this work proposes a complete chatbot system using two multimodal deep learning models: an image retriever that understands texts and a response generator that understands images. The image retriever, implemented by ViT and BERT, selects the most relevant image given the dialogue history and a database of images. The response generator, implemented by ViT and GPT-2/DialoGPT, generates an appropriate response given the dialogue history and the most recently retrieved image. The two models are trained and evaluated on PhotoChat, an open-domain dialogue dataset in which a photo is shared in each session. In automatic evaluation, the proposed image retriever outperforms existing baselines VSE++ and SCAN with Recall@1/5/10 of 0.1/0.3/0.4 and MRR of 0.2 when ranking 1,000 images. The proposed response generator also surpasses the baseline Divter with PPL of 16.9, BLEU-1/2 of 0.13/0.03, and Distinct-1/2 of 0.97/0.86, showing a significant improvement in PPL by -42.8 and BLEU-1/2 by +0.07/0.02. In human evaluation with a Likert scale of 1-5, the complete multimodal chatbot system receives higher image-groundedness of 4.3 and engagingness of 4.3, along with competitive fluency of 4.1, coherence of 3.9, and humanness of 3.1, when compared to other chatbot variants. The source code is available at: https://github.com/minniie/multimodal_chat.git.
Can Large Language Models Transform Computational Social Science?
Authors: Caleb Ziems, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, Diyi Yang
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.03514
Pdf link: https://arxiv.org/pdf/2305.03514
Abstract Large Language Models (LLMs) like ChatGPT are capable of successfully performing many language processing tasks zero-shot (without the need for training data). If this capacity also applies to the coding of social phenomena like persuasiveness and political ideology, then LLMs could effectively transform Computational Social Science (CSS). This work provides a road map for using LLMs as CSS tools. Towards this end, we contribute a set of prompting best practices and an extensive evaluation pipeline to measure the zero-shot performance of 13 language models on 24 representative CSS benchmarks. On taxonomic labeling tasks (classification), LLMs fail to outperform the best fine-tuned models but still achieve fair levels of agreement with humans. On free-form coding tasks (generation), LLMs produce explanations that often exceed the quality of crowdworkers' gold references. We conclude that today's LLMs can radically augment the CSS research pipeline in two ways: (1) serving as zero-shot data annotators on human annotation teams, and (2) bootstrapping challenging creative generation tasks (e.g., explaining the hidden meaning behind text). In summary, LLMs can significantly reduce costs and increase efficiency of social science analysis in partnership with humans.
Few-shot Domain-Adaptive Visually-fused Event Detection from Text
Authors: Fatemeh Shiri, Farhad Moghimifar, Van Nguyen, Reza Haffari, Yuan-Fang Li
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.03517
Pdf link: https://arxiv.org/pdf/2305.03517
Abstract Incorporating auxiliary modalities such as images into event detection models has attracted increasing interest over the last few years. The complexity of natural language in describing situations has motivated researchers to leverage the related visual context to improve event detection performance. However, current approaches in this area suffer from data scarcity, where a large amount of labelled text-image pairs are required for model training. Furthermore, limited access to the visual context at inference time negatively impacts the performance of such models, which makes them practically ineffective in real-world scenarios. In this paper, we present a novel domain-adaptive visually-fused event detection approach that can be trained on a few labelled image-text paired data points. Specifically, we introduce a visual imaginator method that synthesises images from text in the absence of visual context. Moreover, the imaginator can be customised to a specific domain. In doing so, our model can leverage the capabilities of pre-trained vision-language models and can be trained in a few-shot setting. This also allows for effective inference where only single-modality data (i.e. text) is available. The experimental evaluation on the benchmark M2E2 dataset shows that our model outperforms existing state-of-the-art models, by up to 11 points.
Jointly Extracting Interventions, Outcomes, and Findings from RCT Reports with LLMs
Authors: Somin Wadhwa, Jay DeYoung, Benjamin Nye, Silvio Amir, Byron C. Wallace
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.03642
Pdf link: https://arxiv.org/pdf/2305.03642
Abstract Results from Randomized Controlled Trials (RCTs) establish the comparative effectiveness of interventions, and are in turn critical inputs for evidence-based care. However, results from RCTs are presented in (often unstructured) natural language articles describing the design, execution, and outcomes of trials; clinicians must manually extract findings pertaining to interventions and outcomes of interest from such articles. This onerous manual process has motivated work on (semi-)automating extraction of structured evidence from trial reports. In this work we propose and evaluate a text-to-text model built on instruction-tuned Large Language Models (LLMs) to jointly extract Interventions, Outcomes, and Comparators (ICO elements) from clinical abstracts, and infer the associated results reported. Manual (expert) and automated evaluations indicate that framing evidence extraction as a conditional generation task and fine-tuning LLMs for this purpose realizes considerable ($\sim$20 point absolute F1 score) gains over the previous SOTA. We perform ablations and error analyses to assess aspects that contribute to model performance, and to highlight potential directions for further improvements. We apply our model to a collection of published RCTs through mid-2022, and release a searchable database of structured findings (anonymously for now): bit.ly/joint-relations-extraction-mlhc

LuckyyySTA / arxiv-daily

New submissions for Mon, 8 May 23 #90

Keyword: text generation

Stylized Data-to-Text Generation: A Case Study in the E-Commerce Domain

Expository Text Generation: Imitate, Retrieve, Paraphrase

Keyword: machine translation

Investigating Lexical Sharing in Multilingual Machine Translation for Indian Languages

Shared Latent Space by Both Languages in Non-Autoregressive Neural Machine Translation

In-context Learning as Maintaining Coherency: A Study of On-the-fly Machine Translation Using Large Language Models

Keyword: non-autoregressive

Shared Latent Space by Both Languages in Non-Autoregressive Neural Machine Translation

Keyword: abstractive summarization

Keyword: factual

Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework

Think Rationally about What You See: Continuous Rationale Extraction for Relation Extraction

Read it Twice: Towards Faithfully Interpretable Fact Verification by Revisiting Evidence

Keyword: knowledge distillation

Keyword: Hallucination

Keyword: evaluation

Modeling What-to-ask and How-to-ask for Answer-unaware Conversational Question Generation

Chain-of-Skills: A Configurable Model for Open-domain Question Answering

TransESC: Smoothing Emotional Support Conversation via Turn-Level State Transition

HiPool: Modeling Long Documents Using Graph Neural Networks

Simulating H.P. Lovecraft horror literature with the ChatGPT large language model

Building Multimodal AI Chatbots

Can Large Language Models Transform Computational Social Science?

Few-shot Domain-Adaptive Visually-fused Event Detection from Text

Jointly Extracting Interventions, Outcomes, and Findings from RCT Reports with LLMs