HenryQW / Awesome-TTRSS

🐋 Awesome TTRSS, a powerful Dockerised all-in-one RSS solution.
http://ttrss.henry.wang
MIT License
2.43k stars 496 forks

Add PostgreSQL Chinese full-text search: zhparser/jieba/pgroonga #59

Open davidlauhn opened 5 years ago

davidlauhn commented 5 years ago

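PostgreSQL's built-in full-text search does not support Chinese, which makes TTRSS search unusable for Chinese text. Are there any plans to add something like zhparser, jieba, or pgroonga?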

HenryQW commented 5 years ago

There are no plans for this. It looks like it would require changing TTRSS's search logic. Does MySQL have the same problem?

HenryQW commented 5 years ago

I'd recommend doing full-text search in the reader app instead, for example Reeder.

davidlauhn commented 5 years ago

I haven't tried MySQL. I'll experiment with it on my own. Thanks.

jostyee commented 5 years ago

https://discourse.tt-rss.org/t/solved-search-in-chinese/2241/2 If I understand correctly, Tiny Tiny RSS already supports this: once pgroonga is set up, you can set the default language for global search?

davidlauhn commented 5 years ago

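> https://discourse.tt-rss.org/t/solved-search-in-chinese/2241/2 If I understand correctly, Tiny Tiny RSS already supports this: once pgroonga is set up, you can set the default language for global search?

I'm not skilled enough to figure out how to configure pgroonga, so I got it working with zhparser instead.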

HenryQW commented 5 years ago

@davidlauhn Could you share your solution? I'll see whether I can add it. Or a PR would be perfect!

HenryQW commented 5 years ago

@jostyee I couldn't figure out how DEFAULT_SEARCH_LANGUAGE is meant to be used. I tried the approach from that thread and it still doesn't work.

davidlauhn commented 5 years ago

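@HenryQW I'm neither a programmer nor a sysadmin. Everything below is copy/paste: I know that it works but not why, and it may not be entirely accurate. I can't take questions because I really don't understand it, sorry :-)

I modified two Docker images.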

docker-compose.yml

services:
  database.postgres:
    image: davidlauhn/postgres-11-with-zhparser:latest
    container_name: postgres
    environment:
      - PG_PASSWORD=password # please change the password
      - DB_EXTENSION=pg_trgm
    volumes:
      - ~/postgres/data/:/var/lib/postgresql/ # persist postgres data to ~/postgres/data/ on the host
    restart: always

  service.rss:
    image: davidlauhn/awesome-ttrss:latest
    container_name: ttrss
    ports:
      - 80:80
    environment:
      - SELF_URL_PATH=http://domain.name/ # please change to your own domain
      - DB_HOST=database.postgres
      - DB_PORT=5432
      - DB_NAME=ttrss
      - DB_USER=postgres
      - DB_PASS=password # please change the password
      - ENABLE_PLUGINS=auth_internal, fever # auth_internal is required. Plugins enabled here will be enabled for all users as system plugins
    stdin_open: true
    tty: true
    restart: always
    command: sh -c 'sh /wait-for.sh database.postgres:5432 -- php /configure-db.php && exec s6-svscan /etc/s6/'

  service.mercury: # set Mercury Parser API endpoint to `service.mercury:3000` on TTRSS plugin setting page
    image: wangqiru/mercury-parser-api:latest
    container_name: mercury
    expose:
      - 3000
    restart: always

  service.opencc: # set OpenCC API endpoint to `service.opencc:3000` on TTRSS plugin setting page
    image: wangqiru/opencc-api-server:latest
    container_name: opencc
    environment:
      NODE_ENV: production
    expose:
      - 3000
    restart: always

Then configure zhparser:

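    # open a shell inside the postgres container
    docker exec -it postgres /bin/sh
    # create the extension and a text search configuration named Chinese
    psql -U postgres -d ttrss -c 'CREATE EXTENSION zhparser'
    psql -U postgres -d ttrss -c 'CREATE TEXT SEARCH CONFIGURATION Chinese (PARSER = zhparser)'
    psql -U postgres -d ttrss -c 'ALTER TEXT SEARCH CONFIGURATION Chinese ADD MAPPING FOR n,v,a,i,e,l WITH simple'
    # rebuild the search index of existing entries with the new configuration
    psql -U postgres -d ttrss -c "UPDATE ttrss_entries SET tsvector_combined = to_tsvector('Chinese', content)"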

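Restart PostgreSQL, then change the TTRSS search language to Chinese.

Chinese search is usable, but word segmentation seems slightly off: zhparser splits long words into shorter ones for matching. zhparser's default configuration probably needs tuning, but my requirements are modest, so I'm making do with it.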
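To see how a text search configuration actually segments a piece of text, it helps to run `to_tsvector` on a sample string and test a query against it. The sketch below only prints the SQL; `tokenize_check_sql` is a hypothetical helper name, and the usage line assumes the container name `postgres`, database `ttrss`, and user `postgres` from the compose file above. The configuration is referred to as `chinese` because PostgreSQL folds unquoted names such as `Chinese` to lower case.

```shell
# Hypothetical helper: prints SQL that shows how a text search configuration
# tokenizes a sample string and whether a query matches it.
# Note: arguments are inserted as-is, so they must not contain single quotes.
tokenize_check_sql() {
  config="$1"  # text search configuration name, e.g. chinese
  sample="$2"  # sample document text
  query="$3"   # search phrase to test against the sample
  printf "SELECT to_tsvector('%s', '%s');\n" "$config" "$sample"
  printf "SELECT to_tsvector('%s', '%s') @@ plainto_tsquery('%s', '%s') AS matches;\n" \
    "$config" "$sample" "$config" "$query"
}

# Usage (assumed container, user, and database names):
# tokenize_check_sql chinese '中文全文搜索' '搜索' | docker exec -i postgres psql -U postgres -d ttrss
```

The first statement lists the lexemes the parser produced, which makes over-splitting of long words visible; the second reports whether the query would match the sample.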

jostyee commented 5 years ago

@davidlauhn Enabling zhparser doesn't have to be that much trouble. sameersbn/postgresql supports enabling extensions through an environment variable:

https://github.com/sameersbn/docker-postgresql#enabling-extensions

davidlauhn commented 5 years ago

@jostyee I didn't want to go to this much trouble either, but I don't know any better, so I just followed the instructions step by step :-)

HenryQW commented 5 years ago

@jostyee zhparser needs its dependencies installed first; it can't simply be enabled.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity in 14 days. It will be closed if no further activity occurs in 7 days. Thank you for your contributions.

HenryQW commented 4 years ago

I'll look into the feasibility when I have time. PRs are welcome!

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity in 14 days. It will be closed if no further activity occurs in 7 days. Thank you for your contributions.

hoilc commented 4 years ago

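I put together a quick modified version. Feel free to try it if you're interested.

postgres image: hoilc/postgres-chinese-textsearch:latest

ttrss image: hoilc/ttrss:latest, which requires the environment variable TEXTSEARCH_EXTENSION=pg_jieba,zhparser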

https://github.com/hoilc/Awesome-TTRSS/blob/master/docker-compose.yml

ptsa commented 4 years ago

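> I put together a quick modified version. Feel free to try it if you're interested.
>
> postgres image: hoilc/postgres-chinese-textsearch:latest
>
> ttrss image: hoilc/ttrss:latest, which requires the environment variable TEXTSEARCH_EXTENSION=pg_jieba,zhparser
>
> https://github.com/hoilc/Awesome-TTRSS/blob/master/docker-compose.yml

@HenryQW If this works well, it could be merged here. TTRSS's Chinese search really doesn't work.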

HenryQW commented 4 years ago

Could you open a PR? I've been too busy lately.

ptsa commented 4 years ago

@hoilc Please submit a PR. @HenryQW He also modified the postgresql image, so you would probably need to fork his postgresql repo: https://github.com/hoilc/postgres-chinese-textsearch

ptsa commented 4 years ago

@hoilc didn't submit a PR, so I copied his code and submitted one.

0rt commented 3 years ago

Has this change been merged into latest? I tried searching in Chinese and it still doesn't work.

ptsa commented 3 years ago

@0rt My submission didn't go through. I probably did it the wrong way.

appotry commented 1 year ago

I got an up-to-date build of postgres-chinese-textsearch working:

postgres-chinese-textsearch https://hub.docker.com/r/bloodstar/postgres-chinese-textsearch

version: "3"
services:
  service.rss:
    image: bloodstar/ttrss:latest
    container_name: ttrss
    ports:
      - 181:80
    environment:
      - SELF_URL_PATH=http://localhost:181/ # please change to your own domain
      - DB_HOST=database.postgres
      - DB_PORT=5432
      - DB_NAME=ttrss
      - DB_USER=postgres
      - DB_PASS=ttrss # please change the password
      - PUID=1000
      - PGID=1000
      - TEXTSEARCH_EXTENSION=pg_jieba # add support for Chinese full-text search (pg_jieba, zhparser, or both)
    volumes:
      - feed-icons:/var/www/feed-icons/
    networks:
      - public_access
      - service_only
      - database_only
    stdin_open: true
    tty: true
    restart: always

  service.mercury: # set Mercury Parser API endpoint to `service.mercury:3000` on TTRSS plugin setting page
    image: wangqiru/mercury-parser-api:latest
    container_name: mercury
    networks:
      - public_access
      - service_only
    restart: always

  service.opencc: # set OpenCC API endpoint to `service.opencc:3000` on TTRSS plugin setting page
    image: wangqiru/opencc-api-server:latest
    container_name: opencc
    environment:
      - NODE_ENV=production
    networks:
      - service_only
    restart: always

  # database.postgres:
  #   image: postgres:13-alpine
  #   container_name: postgres
  #   environment:
  #     - POSTGRES_PASSWORD=ttrss # feel free to change the password
  #   volumes:
  #     - ~/postgres/data/:/var/lib/postgresql/data # persist postgres data to ~/postgres/data/ on the host
  #   networks:
  #     - database_only
  #   restart: always

  database.postgres:
    image: bloodstar/postgres-chinese-textsearch:latest
    container_name: postgres
    environment:
      - POSTGRES_PASSWORD=ttrss # please change the password
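    volumes:
      - ~/postgres/data/:/var/lib/postgresql/data # persist postgres data to ~/postgres/data/ on the host
    networks:
      - database_only # without this the database joins only the default network, which service.rss is not on
    restart: always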

  # utility.watchtower:
  #   container_name: watchtower
  #   image: containrrr/watchtower:latest
  #   volumes:
  #     - /var/run/docker.sock:/var/run/docker.sock
  #   environment:
  #     - WATCHTOWER_CLEANUP=true
  #     - WATCHTOWER_POLL_INTERVAL=86400
  #   restart: always

volumes:
  feed-icons:

networks:
  public_access: # Provide the access for ttrss UI
  service_only: # Provide the communication network between services only
    internal: true
  database_only: # Provide the communication between ttrss and database only
    internal: true
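
After switching the database image, a quick sanity check is to ask PostgreSQL whether the extensions named in TEXTSEARCH_EXTENSION are available and installed. The sketch below builds that query against the standard `pg_available_extensions` view and only prints it; `extension_check_sql` is a hypothetical helper name, and the usage line assumes the container name `postgres`, user `postgres`, and database `ttrss` from the compose file above.

```shell
# Hypothetical helper: turns a comma-separated extension list (the same format
# as TEXTSEARCH_EXTENSION) into a query on the pg_available_extensions view.
extension_check_sql() {
  # "pg_jieba, zhparser" -> "'pg_jieba','zhparser'"
  names=$(printf '%s' "$1" | tr -d ' ' | sed "s/[^,][^,]*/'&'/g")
  printf "SELECT name, default_version, installed_version FROM pg_available_extensions WHERE name IN (%s);\n" "$names"
}

# Usage (assumed container, user, and database names):
# extension_check_sql "pg_jieba,zhparser" | docker exec -i postgres psql -U postgres -d ttrss
```

A row with an empty `installed_version` means the extension is shipped in the image but has not been created in that database; no row at all means the image does not provide it.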