RayZhhh / funsearch

Implementation for "Mathematical discoveries from program search with large language models".
Apache License 2.0

Why is the score output of my program "None"? #6

Open LtColLiuPeiqiang opened 2 months ago

LtColLiuPeiqiang commented 2 months ago

I modified part of the code so that the program solves SAT problems. Here is my code:

specification = r'''
import numpy as np

@funsearch.run
def evaluate(instances: dict) -> float:
    """Evaluate the algorithm in solve_sat by the number of clauses it satisfies."""
    print(instances)
    num_of_vars=instances["num_of_vars"]
    clauses=instances["clauses"]
    solution=solve_sat(num_of_vars, clauses)
    score=0.0
    for i in clauses:
        clau=0
        for j in i:
            if solution[abs(j[1])]==j[0]:
                clau=1
        if clau==1:
            score=score+1
    return score

@funsearch.evolve
def solve_sat(num_of_vars: int , clauses: list)-> list:
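    """Return a sequence of length num_of_vars whose elements are 1 or 0, satisfying as many clauses of the 3-SAT (Boolean satisfiability) problem in clauses as possible.

    Args:
        num_of_vars: Number of variables in the SAT problem.
        clauses: The SAT problem to solve. Each sub-sequence is a clause; each element of a clause is a literal, represented as a two-element tuple whose first value indicates whether the literal takes the variable as-is or negated, and whose second value is the variable index (0 to num_of_vars-1).

    Return:
        A sequence of length num_of_vars consisting of True or False, giving the variable assignment found by the algorithm.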
    """
    solution=[]
    for i in range(num_of_vars):
        solution.append(True)
    return solution
'''
# import bin_packing_utils

# bin_packing_or3 = {'OR3': bin_packing_utils.datasets['OR3']}
import numpy as np
sat_data={'train':{}}

def generate_3sat_test_data(num_clauses, num_vars):
    test_data = []

    for _ in range(num_clauses):
        clause = []
        for _ in range(3):
            negation = np.random.choice([True, False])
            var = np.random.randint(num_vars)
            literal = (negation, var)
            clause.append(literal)
        test_data.append(clause)

    return test_data

# Generate 3-SAT test data with 100 clauses and 10 variables
num_clauses = 100
num_vars = 10

for i in range(100):
    sat_data['train']['c100v10_'+str(i)]={'num_of_vars' : num_vars,
                                         'clauses' : generate_3sat_test_data(num_clauses, num_vars)}
from implementation import funsearch
from implementation import config

# It should be noted that the if __name__ == '__main__' is required.
# Because the inner code uses multiprocess evaluation.
if __name__ == '__main__':
    class_config = config.ClassConfig(llm_class=LLMAPI, sandbox_class=Sandbox)
    config = config.Config(samples_per_prompt=4, evaluate_timeout_seconds=30)
    global_max_sample_num = 10  # if it is set to None, funsearch will execute an endless loop
    funsearch.main(
        specification=specification,
        inputs=sat_data,
        config=config,
        max_sample_nums=global_max_sample_num,
        class_config=class_config,
        log_dir='../logs/funsearch_llm_api'
    )

and here is the output:

/usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()
================= Evaluated Function =================
def solve_sat(num_of_vars: int, clauses: list) -> list:
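    """Return a sequence of length num_of_vars whose elements are 1 or 0, satisfying as many clauses of the 3-SAT (Boolean satisfiability) problem in clauses as possible.

    Args:
        num_of_vars: Number of variables in the SAT problem.
        clauses: The SAT problem to solve. Each sub-sequence is a clause; each element of a clause is a literal, represented as a two-element tuple whose first value indicates whether the literal takes the variable as-is or negated, and whose second value is the variable index (0 to num_of_vars-1).

    Return:
        A sequence of length num_of_vars consisting of True or False, giving the variable assignment found by the algorithm.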
    """
    solution=[]
    for i in range(num_of_vars):
        solution.append(True)
    return solution
------------------------------------------------------
Score        : None
Sample time  : None
Evaluate time: 30.075589179992676
Sample orders: None
======================================================

Note that my Score here is None

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-522a80989a5f> in <cell line: 6>()
      9     global_max_sample_num = 10  # if it is set to None, funsearch will execute an endless loop
     10     print(sat_data)
---> 11     funsearch.main(
     12         specification=specification,
     13         inputs=sat_data,

7 frames
/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     86                 return reduction(axis=axis, out=out, **passkwargs)
     87 
---> 88     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
     89 
     90 

ValueError: zero-size array to reduction operation maximum which has no identity

Also, I wonder how the Score is calculated, and why my program with the modified specification can't output a Score correctly.

RayZhhh commented 2 months ago

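Hello! Overall, my assessment is this: the evaluation of your initial evolve function failed, which results in a score of "None". The funsearch pipeline does not allow this, and it is also the cause of the error below: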

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-522a80989a5f> in <cell line: 6>()
      9     global_max_sample_num = 10  # if it is set to None, funsearch will execute an endless loop
     10     print(sat_data)
---> 11     funsearch.main(
     12         specification=specification,
     13         inputs=sat_data,

7 frames
/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     86                 return reduction(axis=axis, out=out, **passkwargs)
     87 
---> 88     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
     89 
     90 

ValueError: zero-size array to reduction operation maximum which has no identity
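
The link between a "None" score and this ValueError can be reproduced in isolation. The snippet below is only an illustration, not the funsearch implementation: it assumes that failed evaluations are dropped before a maximum is taken, and `best_score` is a hypothetical helper. NumPy raises this exact error when asked for the maximum of an empty array.

```python
import numpy as np


def best_score(scores):
    """Return the best valid score, or None if every evaluation failed.

    Hypothetical helper: it guards the empty case that np.max cannot handle.
    """
    valid = [s for s in scores if s is not None]
    if not valid:
        return None  # np.max on an empty sequence would raise ValueError
    return float(np.max(valid))


# Reproduce the error from the traceback: maximum of a zero-size array.
try:
    np.max(np.array([]))
    raised = False
    message = ""
except ValueError as err:
    raised = True
    message = str(err)

print(raised, message)
print(best_score([None]))            # only failed evaluations
print(best_score([None, 3.0, 1.0]))  # failed evaluations are ignored
```

If the initial function is the only one evaluated and it fails, there is nothing left to reduce over, which is consistent with the requirement that the initial function must evaluate successfully.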

I also noticed this output:

/usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()
================= Evaluated Function =================
def solve_sat(num_of_vars: int, clauses: list) -> list:
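    """Return a sequence of length num_of_vars whose elements are 1 or 0, satisfying as many clauses of the 3-SAT (Boolean satisfiability) problem in clauses as possible.

    Args:
        num_of_vars: Number of variables in the SAT problem.
        clauses: The SAT problem to solve. Each sub-sequence is a clause; each element of a clause is a literal, represented as a two-element tuple whose first value indicates whether the literal takes the variable as-is or negated, and whose second value is the variable index (0 to num_of_vars-1).

    Return:
        A sequence of length num_of_vars consisting of True or False, giving the variable assignment found by the algorithm.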
    """
    solution=[]
    for i in range(num_of_vars):
        solution.append(True)
    return solution
------------------------------------------------------
Score        : None
Sample time  : None
Evaluate time: 30.075589179992676
Sample orders: None
======================================================

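I noticed that your maximum evaluation time is set to 30 s and the evaluation of this function took 30 s, which means the evaluation failed because it exceeded the maximum evaluation time. I suggest increasing the evaluate_timeout_seconds parameter of config in the code below and trying again: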

if __name__ == '__main__':
    class_config = config.ClassConfig(llm_class=LLMAPI, sandbox_class=Sandbox)
    config = config.Config(samples_per_prompt=4, evaluate_timeout_seconds=30)
    global_max_sample_num = 10  # if it is set to None, funsearch will execute an endless loop
    funsearch.main(
        specification=specification,
        inputs=sat_data,
        config=config,
        max_sample_nums=global_max_sample_num,
        class_config=class_config,
        log_dir='../logs/funsearch_llm_api'
    )

LtColLiuPeiqiang commented 2 months ago

Thank you for your answer. After changing evaluate_timeout_seconds and reducing the size of the data, I still get the same problem:

/usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()
================= Evaluated Function =================
def solve_sat(num_of_vars: int, clauses: list) -> list:
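    """Return a sequence of length num_of_vars whose elements are 1 or 0, satisfying as many clauses of the 3-SAT (Boolean satisfiability) problem in clauses as possible.

    Args:
        num_of_vars: Number of variables in the SAT problem.
        clauses: The SAT problem to solve. Each sub-sequence is a clause; each element of a clause is a literal, represented as a two-element tuple whose first value indicates whether the literal takes the variable as-is or negated, and whose second value is the variable index (0 to num_of_vars-1).

    Return:
        A sequence of length num_of_vars consisting of True or False, giving the variable assignment found by the algorithm.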
    """
    solution=[]
    for i in range(num_of_vars):
        solution.append(True)
    return solution
------------------------------------------------------
Score        : None
Sample time  : None
Evaluate time: 100.13026237487793
Sample orders: None
======================================================

I am currently checking whether the code in my specification is at fault.

RayZhhh commented 2 months ago

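As you can see, the evaluation time for this code is 100 s. If the maximum evaluation time you set is 100 s, the evaluation still timed out. I think you could try:

  1. Debug the evaluation code in the specification: check whether the evolve function can be evaluated normally, and print the result.
  2. If step 1 works, try adding @numba.jit() to the evolve function and evaluate again.
  3. If step 2 fails, set the Sandbox's numba acceleration to False.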
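
Step 1 can be done outside funsearch entirely. The following is a minimal sketch, assuming the two functions from the specification are copied into a plain script with the `@funsearch.run` and `@funsearch.evolve` decorators removed; `make_instance` is a hypothetical helper that mirrors the data generator from the question using the standard library. It calls `evaluate` on a single instance and times it.

```python
import random
import time


def solve_sat(num_of_vars, clauses):
    # Initial evolve function from the specification: assign True to every variable.
    return [True] * num_of_vars


def evaluate(instance):
    # Count the clauses that contain at least one matching literal.
    solution = solve_sat(instance["num_of_vars"], instance["clauses"])
    score = 0.0
    for clause in instance["clauses"]:
        if any(solution[var] == value for value, var in clause):
            score += 1
    return score


def make_instance(num_clauses, num_vars, seed=0):
    # Hypothetical helper: random 3-literal clauses of (value, variable index) pairs.
    rng = random.Random(seed)
    clauses = [
        [(rng.choice([True, False]), rng.randrange(num_vars)) for _ in range(3)]
        for _ in range(num_clauses)
    ]
    return {"num_of_vars": num_vars, "clauses": clauses}


instance = make_instance(num_clauses=100, num_vars=10)
start = time.perf_counter()
score = evaluate(instance)
elapsed = time.perf_counter() - start
print(f"score={score}, elapsed={elapsed:.4f}s")
```

If this finishes almost instantly, the timeout is unlikely to come from the algorithm itself. It is then worth confirming what argument the pipeline actually passes to `evaluate` (a single instance, or the whole dictionary stored under a dataset key such as 'train'), since the specification indexes it with "num_of_vars" and "clauses" directly.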

LtColLiuPeiqiang commented 2 months ago

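OK. One thing I'm curious about, though: during normal operation the LLM may also return code that cannot finish within the time limit. Is there no exception handling for that?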

RayZhhh commented 2 months ago

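There is exception handling: if an evaluation times out, that function's score/fitness is set to None. However, in this pipeline the initial evolve function in the specification must be evaluated successfully; later functions are allowed to fail evaluation.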
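
The behaviour described here (a timed-out evaluation yields None instead of crashing the run) can be sketched as follows. This is not the repository's Sandbox class; `run_with_timeout` and `_worker` are hypothetical names, and the sketch assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing as mp
import queue as queue_module
import time


def _worker(func, arg, result_queue):
    # Run the candidate function and report (result, ok) to the parent.
    try:
        result_queue.put((func(arg), True))
    except Exception:
        result_queue.put((None, False))


def run_with_timeout(func, arg, timeout_seconds):
    """Return func(arg), or None if it raises or exceeds timeout_seconds."""
    ctx = mp.get_context("fork")  # assumption: POSIX only
    result_queue = ctx.Queue()
    process = ctx.Process(target=_worker, args=(func, arg, result_queue))
    process.start()
    try:
        result, ok = result_queue.get(timeout=timeout_seconds)
    except queue_module.Empty:
        process.terminate()  # the candidate ran too long: stop it
        result, ok = None, False
    process.join()
    return result if ok else None


def fast(x):
    return x * 2


def slow(x):
    time.sleep(5)  # stands in for generated code that never finishes in time
    return x


def broken(x):
    raise ValueError("bad candidate")


print(run_with_timeout(fast, 21, 2.0))
print(run_with_timeout(slow, 1, 0.2))
print(run_with_timeout(broken, 1, 2.0))
```

A caller can then discard candidates whose score is None, while treating a None score for the initial function as a fatal configuration error.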