SakanaAI / AI-Scientist

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
Apache License 2.0

The process has been stuck at the retrieval phase for about an hour. Is this normal? #116

Open Wuyuhang11 opened 2 months ago

Wuyuhang11 commented 2 months ago

(AI_Scientist) root@intern-studio-50102651:~/AI-Scientist# python launch_scientist.py --model "gpt-4o-2024-05-13" --experiment nanoGPT --num-ideas 1
Using GPUs: [0]
Using OpenAI API with model gpt-4o-2024-05-13.

Generating idea 1/1
Iteration 1/3
{'Name': 'mixture_of_experts', 'Title': 'Mixture of Experts in Transformers: Efficiently Scaling Model Capacity', 'Experiment': 'Integrate a mixture of experts (MoE) mechanism into the transformer blocks. Modify the Block class to include multiple experts and a gating network that selects which experts to use for each input. Compare the performance, training speed, and generalization capabilities of the MoE-GPT model with the baseline GPT model on the provided datasets.', 'Interestingness': 8, 'Feasibility': 5, 'Novelty': 7}
Iteration 2/3
{'Name': 'simplified_moe', 'Title': 'Simplified Mixture of Experts in Transformers: Efficient Dynamic Computation', 'Experiment': 'Modify the Block class to include multiple expert sub-layers within each transformer block. Implement a gating mechanism that selects one of these sub-layers for each input dynamically. Compare the performance, training speed, and generalization capabilities of the simplified MoE-GPT model with the baseline GPT model on the provided datasets.', 'Interestingness': 7, 'Feasibility': 6, 'Novelty': 6}
Iteration 3/3
{'Name': 'simplified_moe', 'Title': 'Simplified Mixture of Experts in Transformers: Efficient Dynamic Computation', 'Experiment': 'Modify the Block class to include multiple expert sub-layers within each transformer block. Implement a gating mechanism that selects one of these sub-layers for each input dynamically. Compare the performance, training speed, and generalization capabilities of the simplified MoE-GPT model with the baseline GPT model on the provided datasets.', 'Interestingness': 7, 'Feasibility': 6, 'Novelty': 6}
Idea generation converged after 3 iterations.

Checking novelty of idea 0: adaptive_block_size
Response Status Code: 200
Response Content: {"total": 6626, "offset": 0, "next": 10, "data": [{"paperId": "d4b99821ab8c1ee3271a72dc4163feb8d310c8a0", "title": "DBPS: Dynamic Block Size and Precision Scaling for Efficient DNN Training Supported by RISC-V ISA Extensions", "abstract": "Over the past decade, it has been found that deep neural networks (DNNs) perform better on visual perception and language understanding tasks as their size increases. However, this comes at the cost of high energy consumption and large memory requirement to tr
Response Status Code: 200
Response Content: {"total": 7531, "offset": 0, "next": 10, "data": [{"paperId": "5b2c04e082a56c0eb70ed62bc36148919f665e1c", "title": "SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention", "abstract": "Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additiona
Response Status Code: 200
Response Content: {"total": 204, "offset": 0, "next": 10, "data": [{"paperId": "eb9f044682d43f072a15f21822570024b31a7590", "title": "Dynamic Context Adaptation and Information Flow Control in Transformers: Introducing the Evaluator Adjuster Unit and Gated Residual Connections", "abstract": "Transformers have revolutionized various domains of artificial intelligence due to their unique ability to model long-range dependencies in data. However, they lack in nuanced, context-dependent modulation of features and info
Response Status Code: 200
Response Content: {"total": 350, "offset": 0, "next": 10, "data": [{"paperId": "76ad063a928deb97752de17256fd92b63515d4fc", "title": "Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation", "abstract": "Unsupervised domain adaptation (UDA) and domain generalization (DG) enable machine learning models trained on a source domain to perform well on unlabeled or even unseen target domains. As previous UDA&DG semantic segmentation methods are mostly based on out
Response Status Code: 200
Response Content: {"total": 787, "offset": 0, "next": 10, "data": [{"paperId": "de94361c09fa37567acb7c6674f1094828c61f19", "title": "A sustainable Bitcoin blockchain network through introducing dynamic block size adjustment using predictive analytics", "abstract": null, "venue": "Future generations computer systems", "year": 2023, "citationCount": 3, "citationStyles": {"bibtex": "@Article{Monem2023ASB,\n author = {Maruf Monem and Md Tamjid Hossain and Md. Golam Rabiul Alam and M. S. Munir and Md. Mahbubur Rahman
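For context, the hang most likely happens inside these Semantic Scholar requests. Below is a minimal sketch of the kind of paper-search query the novelty check makes, with an explicit timeout so a stalled connection fails fast instead of blocking for an hour. The endpoint is the public S2 Graph API; the wrapper function names here are hypothetical, not the repo's actual code:

```python
import json
import os
import urllib.parse
import urllib.request

# Public Semantic Scholar Graph API paper-search endpoint.
S2_API_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(query: str, limit: int = 10) -> str:
    """Assemble the search URL the novelty check would hit (sketch)."""
    params = urllib.parse.urlencode({
        "query": query,
        "limit": limit,
        "fields": "title,abstract,venue,year,citationCount",
    })
    return f"{S2_API_URL}?{params}"


def search_papers(query: str, limit: int = 10, timeout: float = 30.0):
    """Query S2 with a hard timeout so a dead connection raises instead of hanging."""
    req = urllib.request.Request(build_search_url(query, limit))
    api_key = os.environ.get("S2_API_KEY")
    if api_key:
        # An API key raises your rate limit; without one, requests may stall
        # or be throttled heavily.
        req.add_header("X-API-KEY", api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("data", [])
```

If requests stall, setting `S2_API_KEY` in your environment (or tightening the timeout as above) is a reasonable first thing to try.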

conglu1997 commented 2 months ago

I am guessing you either do not have an SS (Semantic Scholar) API key, or the requests timed out or lost their connection. You are welcome to execute ideas without checking for novelty!
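For anyone wanting to do this without deleting code: rather than commenting out the novelty check (which later steps may depend on), you can mark every generated idea as novel so the pipeline's subsequent filtering still finds the field it expects. This is a hedged sketch — the `"novel"` field name is an assumption based on how the filtering appears to work, not verified against the repo:

```python
def mark_all_novel(ideas: list[dict]) -> list[dict]:
    """Stand-in for the Semantic Scholar novelty check (hypothetical helper).

    Sets the flag that downstream filtering is assumed to read, so every
    idea proceeds to the experiment stage without any API calls.
    """
    for idea in ideas:
        idea["novel"] = True  # assumed field name; keeps all ideas
    return ideas
```

You would call this in place of the novelty-check call, leaving the rest of the pipeline untouched.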

leeJing77 commented 2 months ago

So where exactly should the code be commented out? When I comment it out, I get a lot of errors.