Open LtColLiuPeiqiang opened 2 months ago
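Hello! Overall, my assessment is this: the initial evolve function you set fails evaluation, which results in a score of "None". This is not allowed in the funsearch pipeline, and it is also the cause of the error below: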
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-522a80989a5f> in <cell line: 6>()
9 global_max_sample_num = 10 # if it is set to None, funsearch will execute an endless loop
10 print(sat_data)
---> 11 funsearch.main(
12 specification=specification,
13 inputs=sat_data,
7 frames
/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
86 return reduction(axis=axis, out=out, **passkwargs)
87
---> 88 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
89
90
ValueError: zero-size array to reduction operation maximum which has no identity
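For illustration: this message is what NumPy raises when a maximum is taken over an empty array. Below is a minimal plain-Python sketch of that failure mode. It assumes that None scores are dropped before the best score is selected, which is a guess about funsearch's internals that this thread does not confirm. `best_score` is a hypothetical helper, not a funsearch function.

```python
from typing import List, Optional


def best_score(scores: List[Optional[float]]) -> float:
    """Return the best valid score, failing loudly if none exists."""
    # Drop failed evaluations (hypothetical convention: None means failure).
    valid = [s for s in scores if s is not None]
    if not valid:
        # Without this guard, a reduction over an empty collection fails
        # with a much less helpful message.
        raise ValueError(
            "no valid scores: the initial function must evaluate successfully"
        )
    return max(valid)


print(best_score([None, 2.0, 1.0]))
```

If the only function evaluated so far is the initial one and its score is None, the list of valid scores is empty, which matches the situation in the traceback.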
In addition, I noticed this output:
/usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()
================= Evaluated Function =================
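def solve_sat(num_of_vars: int, clauses: list) -> list:
    """Return a sequence of length num_of_vars whose elements are 1 or 0, satisfying as much as possible of the 3-SAT (Boolean satisfiability) problem in clauses.
    Args:
        num_of_vars: number of variables in the SAT problem
        clauses: the SAT problem to solve; each sub-sequence is a clause, and each element of a clause is a literal, represented as a two-element tuple whose first value indicates whether the literal takes the variable's original value or its negation, and whose second value is the variable index (0 to num_of_vars-1)
    Return:
        A sequence of True or False values with length equal to num_of_vars, giving the variable assignment found by the algorithm
    """
    solution=[]
    for i in range(num_of_vars):
        solution.append(True)
    return solution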
------------------------------------------------------
Score : None
Sample time : None
Evaluate time: 30.075589179992676
Sample orders: None
======================================================
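I noticed that your maximum evaluation time is set to 30 s, and the evaluation time of the current function is also 30 s. This means the current function failed evaluation because it exceeded the maximum evaluation time. I suggest you try increasing the evaluate_timeout_seconds parameter of config in the following code: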
if __name__ == '__main__':
    class_config = config.ClassConfig(llm_class=LLMAPI, sandbox_class=Sandbox)
    config = config.Config(samples_per_prompt=4, evaluate_timeout_seconds=30)
    global_max_sample_num = 10  # if it is set to None, funsearch will execute an endless loop
    funsearch.main(
        specification=specification,
        inputs=sat_data,
        config=config,
        max_sample_nums=global_max_sample_num,
        class_config=class_config,
        log_dir='../logs/funsearch_llm_api'
    )
Thank you for your reply. After changing evaluate_timeout_seconds and reducing the size of the data, I still get the same problem:
/usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()
================= Evaluated Function =================
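def solve_sat(num_of_vars: int, clauses: list) -> list:
    """Return a sequence of length num_of_vars whose elements are 1 or 0, satisfying as much as possible of the 3-SAT (Boolean satisfiability) problem in clauses.
    Args:
        num_of_vars: number of variables in the SAT problem
        clauses: the SAT problem to solve; each sub-sequence is a clause, and each element of a clause is a literal, represented as a two-element tuple whose first value indicates whether the literal takes the variable's original value or its negation, and whose second value is the variable index (0 to num_of_vars-1)
    Return:
        A sequence of True or False values with length equal to num_of_vars, giving the variable assignment found by the algorithm
    """
    solution=[]
    for i in range(num_of_vars):
        solution.append(True)
    return solution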
------------------------------------------------------
Score : None
Sample time : None
Evaluate time: 100.13026237487793
Sample orders: None
======================================================
I am currently checking whether the code in my specification is the problem.
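One way to run such a check is to call the initial function directly, outside funsearch, on a tiny instance and time it. The sketch below assumes the literal encoding described in the docstring, with True meaning "use the variable as is" and False meaning "negate it" (the thread does not state the exact encoding). `count_satisfied` is a hypothetical scorer, so the evaluation function in the real specification may differ.

```python
import time
from typing import List, Tuple

Literal = Tuple[bool, int]  # (assumed) True = variable as is, False = negated
Clause = List[Literal]


def solve_sat(num_of_vars: int, clauses: List[Clause]) -> List[bool]:
    """Initial function from the thread: assign True to every variable."""
    return [True] * num_of_vars


def count_satisfied(assignment: List[bool], clauses: List[Clause]) -> int:
    """Hypothetical score: number of clauses with at least one true literal."""
    satisfied = 0
    for clause in clauses:
        # A literal is true when the variable's value matches its polarity.
        if any(assignment[var] == polarity for polarity, var in clause):
            satisfied += 1
    return satisfied


# Tiny hand-made 3-SAT instance with 3 variables and 3 clauses.
clauses = [
    [(True, 0), (False, 1), (True, 2)],
    [(False, 0), (False, 1), (False, 2)],
    [(False, 0), (True, 1), (False, 2)],
]

start = time.perf_counter()
assignment = solve_sat(3, clauses)
score = count_satisfied(assignment, clauses)
elapsed = time.perf_counter() - start

print(f"score={score}, elapsed={elapsed:.6f}s")
```

If this direct call returns quickly with a valid score but the same function still times out inside funsearch, the slowdown is more likely in the surrounding evaluation code or in the sandbox than in the function itself.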
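As you can see, the evaluation time of this code is 100 s. If the maximum evaluation time you set is 100 s, then the evaluation of the current function still timed out. I think you could try: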
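OK, but I am curious about one thing: while the program is running normally, the LLM may also return code that cannot finish within the time limit. Is there no exception handling for this case?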
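There is exception handling: if an evaluation times out, the score/fitness of that function is set to None. However, in this pipeline the initial evolve function in the specification must be evaluated successfully. Functions generated later are allowed to fail evaluation.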
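The behaviour described above can be sketched as follows. This is not funsearch's actual sandbox code. It is a minimal illustration that uses a worker thread for portability, whereas a real sandbox would typically use a separate process so that runaway code can be killed. `evaluate_with_timeout` is a hypothetical name.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Optional


def evaluate_with_timeout(
    fn: Callable[[], float], timeout_seconds: float
) -> Optional[float]:
    """Return fn()'s score, or None if it times out or raises."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        return None  # evaluation exceeded the time limit
    except Exception:
        return None  # evaluation crashed
    finally:
        # Do not wait for the worker here; a thread cannot be force-stopped.
        pool.shutdown(wait=False)


def fast() -> float:
    return 3.0


def slow() -> float:
    time.sleep(0.3)  # stands in for code that runs past the limit
    return 1.0


print(evaluate_with_timeout(fast, timeout_seconds=1.0))
print(evaluate_with_timeout(slow, timeout_seconds=0.05))
```

Under this convention a None score for a sampled function is simply recorded as a failure, while a None score for the initial function leaves the search with no valid starting point.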
I modified part of the code so that this program solves SAT problems. Here is my code:
and here is the output:
Note that my Score here is None.
Also, I wonder how the Score is calculated, and why my program with the modified specification cannot output a Score correctly.