Closed fcityyyyy closed 9 months ago
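OK, I'll take a look first. Thank you very much for the reply.
Following the instructions, I fixed the permissions. Running npm run start_direct now brings up the main program,
and I can browse tasks,
but clicking to design a task produces the following errors: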
GET A MESSAGE: { type: 0, message: { id: 1 } }
set socket_start
(node:18384) UnhandledPromiseRejectionWarning: Error: spawn /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/chromedriver_linux64 EACCES
at /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/node_modules/selenium-webdriver/remote/index.js:260:24
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
(Use electron --trace-warnings ... to show where the warning was created)
(node:18384) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict
(see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:18384) PromiseRejectionHandledWarning: Promise rejection was handled asynchronously (rejection id: 1)
(node:18384) UnhandledPromiseRejectionWarning: Error: spawn /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/chromedriver_linux64 EACCES
at /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/node_modules/selenium-webdriver/remote/index.js:260:24
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
(node:18384) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict
(see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 3)
(node:18384) UnhandledPromiseRejectionWarning: Error: spawn /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/chromedriver_linux64 EACCES
at /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/node_modules/selenium-webdriver/remote/index.js:260:24
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
(node:18384) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict
(see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 4)
GET A MESSAGE: { type: 0, message: { id: 2 } }
set socket_flowchart
Could you please take another look at what is going wrong?
Running the Chrome browser on its own works fine.
Also, after npm run start_direct brings up the main program, the following errors appear in the background. I am not sure whether they matter:
[user1@cent11 ElectronJS]$ npm run start_direct
easy-spider@0.3.5 start_direct
electron .
Server has started. server_address: http://localhost:8074
x64
/mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/chromedriver_linux64
/mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/chrome
/mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/execute.sh
linux
A JavaScript error occurred in the main process
Uncaught Exception:
Error: EACCES: permission denied, open 'info.log'
[18384:1121/111727.823623:ERROR:bus.cc(399)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[18384:1121/111727.823658:ERROR:bus.cc(399)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[18384:1121/111727.846200:ERROR:bus.cc(399)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[18384:1121/111727.912438:ERROR:bus.cc(399)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
Thank you very, very much for all of the above.
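The error message UnhandledPromiseRejectionWarning: Error: spawn [...] EACCES usually points to two main problems:
Permission problem: EACCES ("permission denied") indicates that the chromedriver_linux64 binary does not have the execute permission set, or that the user running the Electron application lacks the required permissions.
Unhandled promise rejection: a promise in the code was rejected, and the rejection was not caught by a .catch handler or by a try/catch block inside an async function.
To resolve these problems, follow these steps:
Resolving the EACCES error
Ensure execute permission: Make sure the chromedriver_linux64 file is executable. You can set this by running the following command in a terminal:
chmod +x /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/chromedriver_linux64
Check ownership: Verify that the current user has permission to access the file. If not, use chown (with sudo if needed) to change the owner or otherwise grant the current user access.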
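The same permission check can be done from Node.js before the driver is spawned. This is only a minimal sketch, assuming the process owns the file (or runs with enough privileges to change its mode); `ensureExecutable` is a hypothetical helper name, not part of EasySpider.

```javascript
const fs = require("fs");

// Hypothetical helper: returns true if the file was already executable,
// false if the execute bits had to be added (the equivalent of chmod +x).
function ensureExecutable(filePath) {
  try {
    fs.accessSync(filePath, fs.constants.X_OK);
    return true;
  } catch (err) {
    // Anything other than EACCES (for example ENOENT) is a different problem.
    if (err.code !== "EACCES") throw err;
    const mode = fs.statSync(filePath).mode;
    fs.chmodSync(filePath, mode | 0o111);
    return false;
  }
}

// Usage, with the path taken from the log above:
// ensureExecutable("/mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/chromedriver_linux64");
```

If the path itself is wrong, the helper rethrows the error rather than hiding it, so a missing file is not mistaken for a permission problem.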
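Resolving the unhandled promise rejection
Check every promise in the code: Look for promises that could produce the UnhandledPromiseRejectionWarning. For each promise or asynchronous operation, make sure there is appropriate error handling, such as a .catch block or a surrounding try/catch.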
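someAsyncFunction()
  .then((result) => {
    // Handle the result
  })
  .catch((error) => {
    // Handle the error
    console.error(error);
  });
Or, inside an async function:
async function asyncCall() {
  try {
    let result = await someAsyncFunction();
    // Handle the result
  } catch (error) {
    // Handle the error
    console.error(error);
  }
}
Make sure every asynchronous task in the application is managed and has its errors caught in this way, so that none of them ends up as an unhandled promise rejection warning.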
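As a complement to per-call handling, Node.js also emits a process-level `unhandledRejection` event that can serve as a last-resort logger. The sketch below assumes plain Node.js; `runSafely` is a hypothetical helper used only for illustration.

```javascript
// Last-resort logger for rejections that no .catch() or try/catch picked up.
process.on("unhandledRejection", (reason) => {
  console.error("Unhandled rejection:", reason);
});

// Hypothetical helper: run an async task and turn a rejection
// into a logged message plus a null result.
async function runSafely(label, task) {
  try {
    return await task();
  } catch (error) {
    console.error(`${label} failed:`, error.message);
    return null;
  }
}
```

The global handler only makes failures visible; it does not fix the underlying cause, which in this thread is the EACCES permission error.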
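OK, thank you very much. I'll check against this.
Following the reply, I added execute permission to chromedriver_linux64 and that fixed it: the main program runs, and clicking to design a new task now works. Thank you very much.
Following the build instructions, I started building the execution-stage program. I ran pip3 install -r requirements.txt, and everything reported success. The first time I ran python3 easyspider_executestage.py --id [1], it said the lxml module could not be found. pip3 list confirmed that it was not installed in my environment, so I ran pip3 install lxml, after which pip3 list showed the library. Running python3 easyspider_executestage.py --id [1] again produced the following output: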
[user1@cent11 ExecuteStage]$ python3 easyspider_executestage.py --id [1]
Configurations:
+------------------+------+-----------------------+
| Key              | Type | Value                 |
+------------------+------+-----------------------+
| id               | list | [1]                   |
| saved_file_name  | str  |                       |
| user_data        | bool | False                 |
| config_folder    | str  |                       |
| config_file_name | str  | config.json           |
| read_type        | str  | remote                |
| headless         | bool | False                 |
| server_address   | str  | http://localhost:8074 |
| version          | str  | 0.3.5                 |
+------------------+------+-----------------------+
linux ('64bit', 'ELF') Finding chromedriver in EasySpider /mysofts/crawler/EasySpider-0.3.5-c/ExecuteStage/ElectronJS
Absolute_user_data_folder: D:\Documents\Projects\EasySpider\ElectronJS\user_data
<selenium.webdriver.chrome.options.Options object at 0x7fb099ac03a0>
id: 1
Save Name for task ID 1 is: 2023_11_22_20_57_20_236066
任务ID 1 的保存文件名为: 2023_11_22_20_57_20_236066
remote
Cannot automatically check new version, please use the following command to check whether a new version avaliable and upgrade by pip:
pip index versions commandline_config
pip install commandline --upgrade
Task Name: 中国知网
任务名称: 中国知网
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 71, in start
    self.process = subprocess.Popen(cmd, env=self.env,
  File "/usr/local/python3/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/python3/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '../ElectronJS/chrome_win64/chromedriver_win64.exe'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "easyspider_executestage.py", line 1395, in
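The error seems to say FileNotFoundError: [Errno 2] No such file or directory: '../ElectronJS/chrome_win64/chromedriver_win64.exe', that is, the file chromedriver_win64.exe was not found. But I am on Linux, so it should be looking for chromedriver_linux64 instead.
Did I run something incorrectly?
Please take another look. Thank you very much.
Just edit the line in the code that contains '../ElectronJS/chrome_win64/chromedriver_win64.exe' and change the path to your Linux chromedriver path.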
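Instead of hard-coding one platform's path, the choice can be made from the platform name. The execution-stage script itself is Python, so the following is only a JavaScript illustration of the idea, not the project's code; `driverPathFor` is a hypothetical helper, and the two relative paths are the ones that appear in the logs in this thread.

```javascript
const path = require("path");

// Hypothetical helper: pick the chromedriver location from the platform
// instead of hard-coding the Windows path.
function driverPathFor(platform, baseDir) {
  const relative = {
    win32: "chrome_win64/chromedriver_win64.exe",
    linux: "chrome_linux64/chromedriver_linux64",
  }[platform];
  if (!relative) {
    throw new Error(`No chromedriver path configured for ${platform}`);
  }
  return path.join(baseDir, relative);
}

console.log(driverPathFor("linux", "../ElectronJS"));
```

Failing loudly on an unknown platform avoids silently falling back to a Windows path, which is what produced the confusing FileNotFoundError above.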
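OK, thank you very much. I'll check against this.
Following your reply, I changed the chrome and chromedriver names and paths in easyspider_executestage.py.
Running python3 easyspider_executestage.py --id [1] now opens a browser window,
but the background output still reports a missing file:
[user1@cent11 ExecuteStage]$ python3 easyspider_executestage.py --id [1]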
Configurations:
+------------------+------+-----------------------+
| Key              | Type | Value                 |
+------------------+------+-----------------------+
| id               | list | [1]                   |
| saved_file_name  | str  |                       |
| user_data        | bool | False                 |
| config_folder    | str  |                       |
| config_file_name | str  | config.json           |
| read_type        | str  | remote                |
| headless         | bool | False                 |
| server_address   | str  | http://localhost:8074 |
| version          | str  | 0.3.5                 |
+------------------+------+-----------------------+
Cannot automatically check new version, please use the following command to check whether a new version avaliable and upgrade by pip:
pip index versions commandline_config
pip install commandline --upgrade
linux ('64bit', 'ELF') Finding chromedriver in EasySpider /mysofts/crawler/EasySpider-0.3.5-c/ExecuteStage/ElectronJS
Absolute_user_data_folder: D:\Documents\Projects\EasySpider\ElectronJS\user_data
<selenium.webdriver.chrome.options.Options object at 0x7f36023b13a0>
id: 1
Save Name for task ID 1 is: 2023_11_23_08_25_29_830135
任务ID 1 的保存文件名为: 2023_11_23_08_25_29_830135
remote
Task Name: 中国知网
任务名称: 中国知网
Traceback (most recent call last):
File "easyspider_executestage.py", line 1404, in
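I checked, and that directory really does not contain this js file, but I don't know where to find it. Could you take another look? Thank you very much.
I also wondered whether packaging it into the main program would get around the problem, so I ran generateExecutable_Linux64.sh as described in the build instructions. It reported the following errors:
[user1@cent11 ExecuteStage]$ ./generateExecutable_Linux64.sh
rm: cannot remove 'build': No such file or directory
rm: cannot remove 'dist': No such file or directory
./generateExecutable_Linux64.sh: line 3: pyinstaller: command not found
rm: cannot remove '../ElectronJS/chrome_linux64/easyspider_executestage': No such file or directory
cp: cannot stat 'dist/easyspider_executestage': No such file or directory
Please take a look at this as well. Thank you very much.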
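This file is in the ElectronJS folder; just copy it to the specified directory.
The packaging script below is for Ubuntu; the two systems cannot be mixed.
OK, I'll copy it and see.
Also, if the packaging script is meant for Ubuntu, how should it be modified for CentOS? The generateExecutable_Linux64.sh packaging script looks like this:
rm -r build
rm -r dist
pyinstaller -F --icon=favicon.ico easyspider_executestage.py
rm ../ElectronJS/chrome_linux64/easyspider_executestage
cp dist/easyspider_executestage ../ElectronJS/chrome_linux64/easyspider_executestage
Apart from the third line, these are all commands that delete or copy files, so I don't know where to start changing it. Could you give me some more guidance? Thank you very much.
There is no need to package it; being able to run it is enough. If you really must package it, the script can be used without changes.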
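After copying stealth.min.js into chrome_linux64, I can design and save tasks normally.
However, when I click to invoke a task,
it reports that execute.sh cannot be found.
Following the earlier instructions, I could not find this file in the ElectronJS directory either; I only found execute_macos.sh and execute.bat.
I tried adapting the execute_macos.sh file:
echo "Executing EasySpider on MacOS"
./easyspider_executestage $1 $2 $3 $4 $5 $6 $7 $8 $9
But the easyspider_executestage file does not exist either. According to the build instructions, it appears to be produced by compiling and packaging the execution stage.
I tried running the packaging command:
[user1@cent11 ExecuteStage]$ ./generateExecutable_Linux64.sh
rm: cannot remove 'build': No such file or directory
rm: cannot remove 'dist': No such file or directory
./generateExecutable_Linux64.sh: line 3: pyinstaller: command not found
rm: cannot remove '../ElectronJS/chrome_linux64/easyspider_executestage': No such file or directory
cp: cannot stat 'dist/easyspider_executestage': No such file or directory
It still reports the errors above, and I do in fact want to package this for deployment on a server.
Could you help me see where my problem lies? Thank you very much!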
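The "pyinstaller: command not found" line means the shell could not find a pyinstaller executable in any directory on PATH. A minimal sketch of that lookup follows, assuming a POSIX-style PATH; `findOnPath` is a hypothetical helper.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: returns the first executable file named `command`
// found in the directories of `pathEnv`, or null if there is none.
function findOnPath(command, pathEnv = process.env.PATH || "") {
  for (const dir of pathEnv.split(path.delimiter)) {
    if (!dir) continue;
    const candidate = path.join(dir, command);
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch {
      // Not present or not executable in this directory; keep looking.
    }
  }
  return null;
}

console.log(findOnPath("pyinstaller") ?? "pyinstaller is not on PATH");
```

If the lookup returns null, either PyInstaller is not installed for the Python in use, or its install directory is missing from PATH; calling it by absolute path in the script is one way around the second case.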
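OK, I'll try it. Thank you very much.
I searched for the two files and copied them into the corresponding directories as recommended, but that did not work. I then looked at execute.sh and found that the path to the executable was wrong, so I changed its content to:
./easyspider_executestage $1 $2 $3 $4 $5 $6 $7 $8 $9
Invoking the task still did not work: the main program did not respond, no browser window appeared, and no data was recorded.
So I suspected that the execution-stage program has to be compiled and packaged on CentOS itself. I went back to generateExecutable_Linux64.sh to track down the problem and found that pyinstaller could not be found. After I put the absolute path to pyinstaller in the script and resolved the complaint about Python 3's --enable-shared build option, packaging succeeded,
and easyspider_executestage from the dist directory was copied to chrome_linux64 automatically. I then ran the task again, which still did not work; I designed a new task and ran that, which did not work either.
I also tried running python3 easyspider_executestage.py --id [2] in the ExecuteStage directory, after changing the data file location in config.json, but that did not work either. The output is below, and no data file was generated in the directory.
[user1@cent11 ExecuteStage]$ python3 easyspider_executestage.py --id [2]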
Configurations:
+------------------+------+-----------------------+
| Key              | Type | Value                 |
+------------------+------+-----------------------+
| id               | list | [2]                   |
| saved_file_name  | str  |                       |
| user_data        | bool | False                 |
| config_folder    | str  |                       |
| config_file_name | str  | config.json           |
| read_type        | str  | remote                |
| headless         | bool | False                 |
| server_address   | str  | http://localhost:8074 |
| version          | str  | 0.3.5                 |
+------------------+------+-----------------------+
linux ('64bit', 'ELF') Finding chromedriver in EasySpider /mysofts/crawler/EasySpider-0.3.5-c/ExecuteStage/ElectronJS
Absolute_user_data_folder: /home/user1/crawler_data
<selenium.webdriver.chrome.options.Options object at 0x7f072c6863a0>
id: 2
Save Name for task ID 2 is: 2023_11_28_12_19_34_045771
任务ID 2 的保存文件名为: 2023_11_28_12_19_34_045771
remote
Cannot automatically check new version, please use the following command to check whether a new version avaliable and upgrade by pip:
pip index versions commandline_config
pip install commandline --upgrade
Traceback (most recent call last):
File "easyspider_executestage.py", line 1362, in
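I no longer know where to start in solving this. Please take another look. Thank you very much.
OK, I'll look and compare.
It was indeed my mistake with the task ID: my execution_instances folder only contains 0.json and 1.json.
After passing the correct value, python3 easyspider_executestage.py --id [0], it worked: the relevant data is scraped and is also visible in the console.
Running it from the command line with ./chrome_linux64/easyspider_executestage --id '[0]' --user_data 0 --server_address http://localhost:8074 --config_folder "/mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/" --headless 0 --read_type remote --config_file_name config.json --saved_file_name also scrapes the relevant data.
I'm very happy. Thank you very much for your guidance and help.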
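To avoid passing an ID that does not exist, the valid IDs can be listed from the file names before running. This is a minimal sketch, assuming each execution instance is stored as `<id>.json` as described above; `availableTaskIds` and the directory argument are hypothetical, and the actual location of execution_instances depends on the installation.

```javascript
const fs = require("fs");

// Hypothetical helper: lists the numeric IDs for which `<id>.json`
// exists in the given folder, in ascending order.
function availableTaskIds(instancesDir) {
  return fs
    .readdirSync(instancesDir)
    .map((name) => /^(\d+)\.json$/.exec(name))
    .filter(Boolean)
    .map((match) => Number(match[1]))
    .sort((a, b) => a - b);
}

// Usage (the folder path is an assumption):
// console.log(availableTaskIds("./execution_instances"));
```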
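The only remaining problem is that clicking the 【本地直接执行】 (local direct execution) button on the task page does nothing. The background output shows no errors, only the normal informational messages: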
GET A MESSAGE: { type: 5, message: { id: 2, user_data_folder: '', execute_type: 0 } } { id: 2, user_data_folder: '', execute_type: 0 }
GET A MESSAGE: { type: 5, message: { id: 2, user_data_folder: '', execute_type: 0 } } { id: 2, user_data_folder: '', execute_type: 0 } 0.json 1.json 2.json
GET A MESSAGE: { type: 5, message: { id: 3, user_data_folder: '', execute_type: 1 } } { id: 3, user_data_folder: '', execute_type: 1 }
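No data appears in the data directory either.
Is this related to my opening the program through X11 forwarding? Designing and saving tasks works normally, so I don't understand why running them fails. Please take another look. Thank you very much!
Local direct execution depends on the chrome_linux64/execute.sh file in the directory and has nothing to do with the task-design flow; at its core it is still a command-line call to a script. I have not tested it on CentOS. The core code is at lines 76-78 and 341-347 of main.js in the ElectronJS folder. You can debug it yourself, and if that does not work, just run it from the command line: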
driverPath = path.join(__dirname, "chrome_linux64/chromedriver_linux64");
chromeBinaryPath = path.join(__dirname, "chrome_linux64/chrome");
execute_path = path.join(__dirname, "chrome_linux64/execute.sh");
let spawn = require("child_process").spawn;
if (process.platform != "darwin" && msg.message.execute_type == 1 && msg.message.id != -1) {
    let child_process = spawn(execute_path, parameters);
    child_process.stdout.on('data', function (data) {
        console.log(data.toString());
    });
}
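The snippet above only listens to stdout, so a failure to start execute.sh (for example a missing file, or a missing execute permission like the EACCES seen earlier) would leave no trace in the console. For debugging, one possible approach is to also listen for the `error` event and stderr. This is a sketch, not the project's code; `spawnWithLogging` is a hypothetical wrapper.

```javascript
const { spawn } = require("child_process");

// Hypothetical wrapper: same spawn call as above, but failures are logged
// instead of disappearing silently.
function spawnWithLogging(executePath, parameters) {
  const child = spawn(executePath, parameters);
  child.on("error", (err) => {
    // Emitted when the process cannot be started, e.g. ENOENT or EACCES.
    console.error(`Failed to start ${executePath}: ${err.code}`);
  });
  child.stdout.on("data", (data) => console.log(data.toString()));
  child.stderr.on("data", (data) => console.error(data.toString()));
  child.on("close", (code) => {
    console.log(`${executePath} finished with code ${code}`);
  });
  return child;
}
```

With this in place, clicking the button should at least print why the script did not run, which narrows the problem down to the path, the permissions, or the script's own output.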
OK, understood. I'll try again. Thank you very much.
Following the build instructions in the source, I first built the main program, ElectronJS. On CentOS I downloaded and installed the latest Chrome; the command google-chrome-stable -version reports Google Chrome 119.0.6045.159.
As instructed, I also copied everything under /opt/google/chrome/ into ElectronJS and renamed it chrome_linux64.
I also downloaded the matching version of chromedriver_linux64 and put it in chrome_linux64.
The commands npm install and npm install @electron-forge/cli -g both completed successfully (I switched to the taobao registry; npm reported during installation that python3 was required, so I installed Python 3.8.15, after which the commands succeeded).
But the final step, npm run start_direct, always fails.
Running as root reports:
[1120/000559.944607:FATAL:electron_main_delegate.cc(294)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
/mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/node_modules/electron/dist/electron exited with signal SIGTRAP
After switching to a regular user, it reports:
[13824:1120/000541.120354:FATAL:setuid_sandbox_host.cc(158)] The SUID sandbox helper binary was found, but is not configured correctly. Rather than run without sandboxing I'm aborting now. You need to make sure that /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/node_modules/electron/dist/chrome-sandbox is owned by root and has mode 4755.
/mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/node_modules/electron/dist/electron exited with signal SIGTRAP
Could you help me see what is going wrong? Many thanks!
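The second message names two concrete conditions for chrome-sandbox: it must be owned by root and have mode 4755. A minimal sketch for checking both from a stat result follows; `sandboxHelperStatus` is a hypothetical helper, and it only reports the state without changing anything.

```javascript
// Hypothetical helper: checks the two conditions named in the error message,
// owner root (uid 0) and mode 4755, on an fs.Stats-like object.
function sandboxHelperStatus(stats) {
  // Keep only the permission, setuid/setgid and sticky bits.
  const mode = stats.mode & 0o7777;
  return {
    ownedByRoot: stats.uid === 0,
    modeOctal: mode.toString(8),
    modeOk: mode === 0o4755,
  };
}

// Usage, with the path taken from the error message above:
// const fs = require("fs");
// console.log(sandboxHelperStatus(fs.statSync(
//   "/mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/node_modules/electron/dist/chrome-sandbox")));
```

If either check fails, the message itself states the expected end state; changing the owner to root and setting mode 4755 both require root privileges.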