crawlab-team / crawlab

Distributed web crawler admin platform for spider management, regardless of language or framework.
https://www.crawlab.cn
BSD 3-Clause "New" or "Revised" License
11.37k stars 1.79k forks

Docker deployment: spider files disappear after the instance restarts #1251

Closed minjorx closed 1 year ago

minjorx commented 1 year ago

Bug description: Even with a data volume mounted at /data, spider files uploaded earlier disappear after the Docker instance is restarted or upgraded, and new files cannot be uploaded. The frontend API returns:

    {"status":"ok","message":"error","data":null,"error":"Post \"http://localhost:8888/fs/63b3c6f00510e5745ff82250/index.js\": dial tcp 127.0.0.1:8888: connect: connection refused"}

The server log shows:

    [GIN] 2023/01/11 - 10:13:34 | 500 | 1.507170904s | 117.187.197.13 | POST "/spiders/63b3c6f00510e5745ff82250/files/save"
    Get "http://localhost:8888/fs/63b3c6f00510e5745ff82250/": dial tcp 127.0.0.1:8888: connect: connection refused
    /go/pkg/mod/github.com/crawlab-team/go-trace@v0.1.1/trace.go:6 github.com/crawlab-team/go-trace.PrintError()
    /go/pkg/mod/github.com/crawlab-team/crawlab-core@v0.6.1-0.20221221050531-dd102678c9cc/controllers/utils_http.go:13 github.com/crawlab-team/crawlab-core/controllers.handleError()
    /go/pkg/mod/github.com/crawlab-team/crawlab-core@v0.6.1-0.20221221050531-dd102678c9cc/controllers/utils_http.go:23 github.com/crawlab-team/crawlab-core/controllers.HandleError()
    /go/pkg/mod/github.com/crawlab-team/crawlab-core@v0.6.1-0.20221221050531-dd102678c9cc/controllers/utils_http.go:47 github.com/crawlab-team/crawlab-core/controllers.HandleErrorInternalServerError()
    /go/pkg/mod/github.com/crawlab-team/crawlab-core@v0.6.1-0.20221221050531-dd102678c9cc/controllers/filer.go:91 github.com/crawlab-team/crawlab-core/controllers.(filerContext).do()
    /go/pkg/mod/github.com/gin-gonic/gin@v1.8.1/context.go:173 github.com/gin-gonic/gin.(Context).Next()
    /go/pkg/mod/github.com/crawlab-team/crawlab-core@v0.6.1-0.20221221050531-dd102678c9cc/middlewares/filer_auth.go:32 github.com/crawlab-team/crawlab-core/middlewares.FilerAuthorizationMiddleware.func1()
    /go/pkg/mod/github.com/gin-gonic/gin@v1.8.1/context.go:173 github.com/gin-gonic/gin.(Context).Next()
    /go/pkg/mod/github.com/crawlab-team/crawlab-core@v0.6.1-0.20221221050531-dd102678c9cc/middlewares/cors.go:17 github.com/crawlab-team/crawlab-core/middlewares.CORSMiddleware.func1()
    /go/pkg/mod/github.com/gin-gonic/gin@v1.8.1/context.go:173 github.com/gin-gonic/gin.(Context).Next()
    /go/pkg/mod/github.com/gin-gonic/gin@v1.8.1/recovery.go:101 github.com/gin-gonic/gin.CustomRecoveryWithWriter.func1()
    /go/pkg/mod/github.com/gin-gonic/gin@v1.8.1/context.go:173 github.com/gin-gonic/gin.(Context).Next()
    /go/pkg/mod/github.com/gin-gonic/gin@v1.8.1/logger.go:240 github.com/gin-gonic/gin.LoggerWithConfig.func1()
    /go/pkg/mod/github.com/gin-gonic/gin@v1.8.1/context.go:173 github.com/gin-gonic/gin.(Context).Next()
    /go/pkg/mod/github.com/gin-gonic/gin@v1.8.1/gin.go:616 github.com/gin-gonic/gin.(Engine).handleHTTPRequest()
    /go/pkg/mod/github.com/gin-gonic/gin@v1.8.1/gin.go:572 github.com/gin-gonic/gin.(Engine).ServeHTTP()
    /usr/local/go/src/net/http/server.go:2868 net/http.serverHandler.ServeHTTP()
    /usr/local/go/src/net/http/server.go:1933 net/http.(*conn).serve()
    /usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit()

Environment: Huawei Cloud CCE, or local Windows Docker Desktop

Deployment mode: single master node

Deployment command:

    docker run -d --name="crawlab_master" -p 8080:8080 \
      -e CRAWLAB_NODE_MASTER="Y" \
      -v //c/Users/it/Desktop/dockerdata/crawlab:/data \
      -e CRAWLAB_MONGO_HOST="***" \
      -e CRAWLAB_MONGO_PORT="27017" \
      crawlabteam/crawlab

Steps to reproduce:

  1. On Windows Docker Desktop, deploy a single master node with the command above
  2. Create a spider and upload script files
  3. Remove the deployed instance, then deploy again with the same configuration and command
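The reproduction steps above can be sketched as shell commands. The image name and mount path are taken from the report; the MongoDB host is a placeholder:

```shell
# 1. Deploy a single master node (Windows Docker Desktop path syntax)
docker run -d --name="crawlab_master" -p 8080:8080 \
  -e CRAWLAB_NODE_MASTER="Y" \
  -v //c/Users/it/Desktop/dockerdata/crawlab:/data \
  -e CRAWLAB_MONGO_HOST="<mongo-host>" \
  -e CRAWLAB_MONGO_PORT="27017" \
  crawlabteam/crawlab

# 2. Create a spider in the web UI at http://localhost:8080 and upload a script

# 3. Remove the instance, then redeploy with the exact same command
docker rm -f crawlab_master
# (rerun the docker run command from step 1)
```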

Expected behavior: When deployed via Docker, spider files should still exist after the instance is restarted or upgraded.

Screenshots: (images attached in the original issue)

tikazyq commented 1 year ago

Most likely you did not persist the data. Without persistence, any data saved inside the container is lost when it restarts.

In docker-compose.yml, add the following under each node service:

...
    volumes:
      - "/opt/crawlab/master:/data"  # persist Crawlab data
      - "/opt/.crawlab/master:/root/.crawlab"  # persist node metadata
...
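For reference, a minimal single-master docker-compose.yml with both mounts in place might look like the sketch below; the service name and MongoDB settings are illustrative assumptions, not taken from this thread:

```yaml
version: '3'
services:
  master:
    image: crawlabteam/crawlab
    environment:
      CRAWLAB_NODE_MASTER: "Y"
      CRAWLAB_MONGO_HOST: "mongo"   # assumed hostname of your MongoDB service
      CRAWLAB_MONGO_PORT: "27017"
    ports:
      - "8080:8080"
    volumes:
      - "/opt/crawlab/master:/data"  # persist Crawlab data
      - "/opt/.crawlab/master:/root/.crawlab"  # persist node metadata
```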
minjorx commented 1 year ago

Problem solved. Deployment command:

    docker run -d --name="crawlab_master" -p 8080:8080 \
      -e CRAWLAB_NODE_MASTER="Y" \
      -v //c/Users/it/Desktop/dockerdata/crawlab:/data \
      -e CRAWLAB_MONGO_HOST="" \
      -e CRAWLAB_MONGO_PORT="27017" \
      crawlabteam/crawlab

The volume was in fact mounted. The real cause is that for the first few minutes after the instance starts, the fs subsystem is not ready yet, even though the spider web page is already reachable, and that is when this error appears. Waiting a few minutes resolves it.
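Since the fs subsystem can lag behind the web UI for a few minutes after startup, one way to avoid uploading too early is to poll the server until it responds before interacting with it. This is only a sketch; the URL and timing below are assumptions, not anything documented in this thread:

```shell
# Poll the Crawlab web port until it answers, up to ~5 minutes
for i in $(seq 1 60); do
  if curl -fsS http://localhost:8080/ >/dev/null 2>&1; then
    echo "Crawlab is reachable; give the fs subsystem a little longer before uploading"
    break
  fi
  sleep 5
done
```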

zzb980881 commented 1 year ago

The file-persistence directory in the official docs is wrong; there are no files under /opt/crawlab/master. Uploaded files are stored under the /data directory, and that is the directory that needs to be mounted.

D0ggy commented 1 year ago

> The file-persistence directory in the official docs is wrong; there are no files under /opt/crawlab/master. Uploaded files are stored under the /data directory, and that is the directory that needs to be mounted.

@zzb980881

Refer to the documentation.

You may have misread the meaning of `volumes` in the docker-compose.yaml configuration: the path you mention, /opt/crawlab/master, is a path on the host.
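To spell out the syntax in question: in a `-v HOST:CONTAINER` flag or a compose `volumes` entry, the left side is the host path and the right side is the path inside the container. You can check what a running container actually mounts with `docker inspect` (the container name below is taken from the deployment command earlier in this thread):

```shell
# Print each mount as "host path -> container path"
docker inspect \
  -f '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' \
  crawlab_master
```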