aploium / zmirror

The next-gen reverse proxy for full site mirroring
http://zmirror.org
MIT License

zmirror


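An HTTP reverse proxy written in Python, designed to quickly, easily, automatically and completely mirror a website (such as Google), with built-in local file caching and CDN support.
For example, it can serve a Google mirror or a Chinese Wikipedia mirror that is accessible from within mainland China.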

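Several config files are included, for example: Google mirror (including Scholar, other services and Chinese Wikipedia), Twitter mirror, YouTube mirror, Instagram mirror and Facebook mirror.
For the full list, see zmirror's built-in mirror config files.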

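Please use this project only in compliance with your local laws and regulations.
I refuse to provide any technical support for commercial or illegal purposes.
This project was created solely to help researchers look up knowledge more easily; please do not spread it widely.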

If you want to read the code, please look at the v0.30-dev branch.
Please don't use the demo heavily... the demo server is about to explode - -|

Demo

Screenshot

[Screenshot: zmirror YouTube mirror, 1080P]
More screenshots are here: wiki-screenshots

One-click deployment script

https://github.com/aploium/zmirror-onekey
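The one-click deployment script is still unstable. If it keeps failing no matter what you try, please follow the manual tutorial.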

Built-in configs

The program ships with several (almost) out-of-the-box configs:

Google mirror (integrated with the Chinese Wikipedia mirror)

Twitter mirror

Instagram mirror

YouTube mirror

Facebook mirror

Other built-in mirror configs

Requirements, Installation and Usage

Dependencies

Required

Optional

Theoretically, any environment that can run Python 3.4+ can also run zmirror.
Nginx has not been officially tested, but it should work well.

However, due to my limited time, zmirror was only fully tested in:

Ubuntu14.04-x86_64 Apache2.4 wsgi python3.4
Ubuntu16.04-x86_64 Apache2.4 wsgi python3.5
windows10-x64 Apache2.4 wsgi python3.5-x64

Ubuntu14.04-x86_64 directly run (I mean, just execute python3 wsgi.py)
windows10-x64 directly run 
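Before installing, a quick pre-flight check can confirm the interpreter version and that the two packages installed in step 2 of the tutorial below (flask and requests) are importable. This is a hypothetical helper, not part of zmirror; it uses only the standard library and avoids f-strings so that it also runs on Python 3.4.

```python
import importlib.util
import sys

# Packages installed in step 2 of the tutorial.
REQUIRED_PACKAGES = ('flask', 'requests')
MIN_PYTHON = (3, 4)


def preflight(packages=REQUIRED_PACKAGES):
    """Return a dict mapping each check name to True (passed) or False."""
    report = {'python>=3.4': sys.version_info >= MIN_PYTHON}
    for name in packages:
        # find_spec() returns None when the top-level module cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == '__main__':
    for check, ok in sorted(preflight().items()):
        print('{:<12} {}'.format(check, 'OK' if ok else 'MISSING'))
```

A MISSING line for flask or requests means step 2 still needs to be run with the same python3 that will run wsgi.py.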

Installation and helloworld

This tutorial is mainly for a local demo test.
If you want to deploy zmirror to a server, please complete the local demo first.

  1. First, install Python 3
    Debian/Ubuntu: apt-get install python3
    Windows: go to Python's homepage and download Python 3.5 (or newer)

  2. Install or upgrade flask and requests: python3 -m pip install -U flask requests

  3. git clone https://github.com/aploium/zmirror

  4. Copy config_default.py to config.py

    Warning: you should NEVER EVER modify config_default.py itself.
    Please edit config.py instead of config_default.py, unless you are a developer.
    Settings in config.py override the default ones.

  5. Execute it: python3 wsgi.py

  6. Open your browser and go to http://127.0.0.1/. You will see an exact mirror of www.kernel.org, and you can click around and browse; everything under *.kernel.org is within the mirror.

  7. Next, see the Deploy section below.
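Step 4 can be scripted so that running it again never clobbers an already edited config.py. A minimal sketch, assuming it is run from the root of the cloned repository; init_config is a hypothetical helper, not something zmirror provides.

```python
import shutil
from pathlib import Path


def init_config(repo_dir='.'):
    """Create config.py from config_default.py (step 4).

    Returns True if config.py was created, False if it already existed.
    An existing config.py is never overwritten, since it holds your edits.
    """
    repo = Path(repo_dir)
    default_cfg = repo / 'config_default.py'
    user_cfg = repo / 'config.py'
    if user_cfg.exists():
        return False
    # str() keeps this working on Python 3.4/3.5, whose shutil
    # functions do not accept Path objects
    shutil.copyfile(str(default_cfg), str(user_cfg))
    return True


if __name__ == '__main__':
    print('created config.py' if init_config() else 'config.py already exists')
```

Copying (rather than editing in place) keeps config_default.py pristine, which matches the warning in step 4.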

Deploy

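Please use: the one-click deployment script

If you prefer to deploy manually, see the following tutorials:

  1. Deploying a zmirror mirror with HTTPS and HTTP/2 support
  2. Deploying multiple zmirror mirrors on one VPS

To deploy under Nginx, see here (thanks to @phuslu)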

Or, if you are familiar with Flask, see Flask's official deployment tutorial.

Upgrade

Features

  1. Complete mirroring, with several (almost) out-of-the-box configs
    Creates very complete mirrors of both old-fashioned websites (such as intranet sites) and huge modern websites.

  2. Mirror ANY website, highly compatible
    Highly compatible and general-purpose: it can mirror ANY website, not just Google/Wiki/Twitter/Instagram, with very complete functionality.
    It also adapts well to modern, logically complex, feature-rich websites.
    zmirror is still under development: the vast majority of features of all websites work out of the box, but some features of some websites are still incomplete and are being continuously improved.

  3. (MIME-based) local static file cache (especially useful when the bandwidth between the mirror server and the mirrored server is low or the latency is high)

  4. CDN support: popular static resources can be served via a CDN, dramatically improving access speed for users (especially when using a CDN inside China while the mirror server is overseas)

  5. Easy to configure and deploy, highly automatic
    Mirroring a website only requires adding its domain.

  6. Access control (IP, user-agent) and visitor verification (answering a question, or a custom verification function)

  7. Automatically rewrites links in JSON/JavaScript/HTML/CSS; even dynamically generated URLs are handled correctly

  8. Streaming content support (audio/video)
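Feature 7 is the heart of a mirroring proxy: links in the upstream response must point back at the mirror. The sketch below shows only the simplest case, rewriting absolute (http://, https://) and protocol-relative (//) URLs with a regular expression. It is not zmirror's implementation, which per the list above also handles JSON/JavaScript and dynamically generated URLs; rewrite_urls, mirror_host and the /extdomains/ path prefix for external domains are hypothetical names chosen for illustration.

```python
import re


def rewrite_urls(text, mirror_host, target_domain, external_domains=()):
    """Point absolute and protocol-relative URLs at the mirror.

    - URLs on target_domain become //mirror_host/...
    - URLs on an external domain become //mirror_host/extdomains/<domain>/...
      (the /extdomains/ prefix is an illustrative assumption)
    The original scheme (http:, https: or none) is preserved.
    """
    domains = [target_domain] + list(external_domains)
    alternation = '|'.join(re.escape(d) for d in domains)
    # (?![\w.-]) requires the whole host name to match, so
    # 'www.example.com.evil.net' is left untouched
    pattern = re.compile(r'(https?:)?//(' + alternation + r')(?![\w.-])')

    def replace(match):
        scheme = match.group(1) or ''  # empty for protocol-relative URLs
        domain = match.group(2)
        if domain == target_domain:
            return scheme + '//' + mirror_host
        return scheme + '//' + mirror_host + '/extdomains/' + domain

    return pattern.sub(replace, text)
```

Escaped forms such as https:\/\/ inside JSON, or host names assembled piece by piece in JavaScript, are not matched by this pattern; covering those cases is what separates a real rewriter from this sketch.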

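Issue Reports

Issues are very welcome; feel free to open an issue even just to chat with me.
For Apache (the tutorials deploy with Apache), the program's log is at /var/log/apache2/<your custom log file name>_error.log

(The following steps are optional)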

Report zmirror Internal Error

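When zmirror hits an internal error (the browser shows an Internal Error page), it saves a snapshot of its current state to <zmirror install directory>/error_dump/
The dump files there can be read with pickle.
If a matching dump file exists, please attach it to your issue.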
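To look inside the dumps before attaching them, they can be loaded with the standard pickle module. A minimal sketch: the file names and the structure of each snapshot are not documented here, so this just unpickles every file in the directory. The dumps may contain zmirror's own objects, so placing the script in the zmirror install directory (so those classes can be imported) is an assumption about what unpickling will need; load_error_dumps is a hypothetical helper.

```python
import pickle
import sys
from pathlib import Path


def load_error_dumps(dump_dir='error_dump'):
    """Unpickle every file in dump_dir; return {file name: object}.

    Files that fail to unpickle are reported on stderr and skipped.
    SECURITY: unpickling can run arbitrary code; only load dumps you trust.
    """
    dumps = {}
    for path in sorted(Path(dump_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            with path.open('rb') as f:
                dumps[path.name] = pickle.load(f)
        except Exception as exc:  # e.g. UnpicklingError, ImportError, EOFError
            print('skipped {}: {!r}'.format(path.name, exc), file=sys.stderr)
    return dumps


if __name__ == '__main__':
    for name, obj in load_error_dumps().items():
        print(name, type(obj).__name__)
```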

Mirror A Website

(This section needs to be rewritten; it is messy and a bit outdated.) Mirroring a website is very simple.

Just set target_domain to the site's domain, and external_domains to its external domains and subdomains, such as static resource domains (if it has any).
Save and run; the program will do the rest!
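For example, a config.py for a hypothetical site www.example.com that loads content from two other domains might contain just the following. The domain names are placeholders, and giving external_domains as a tuple of domain strings is an assumption; check config_default.py for the exact form your version expects. Anything not set here keeps its value from config_default.py.

```python
# config.py: hypothetical example, domain names are placeholders

# Main site to mirror
target_domain = 'www.example.com'

# Other domains the site pulls content from (static files, images, APIs...)
external_domains = (
    'static.example.com',
    'img.example-cdn.net',
)
```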

All detection and rewriting is completely AUTOMATIC.

Tip: you can find a website's external domains with your browser's developer tools, which log all network traffic for you.
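Most browsers' developer tools can export that traffic log as a HAR file (a JSON document with one entry per request under log.entries). A small hypothetical helper can turn the export into a candidate list for external_domains. The list will also contain third-party hosts such as analytics or ad servers, so review it by hand before copying anything into config.py.

```python
import json
from urllib.parse import urlsplit


def hosts_from_har(har, target_domain):
    """Return the sorted host names requested in a parsed HAR dict, minus target_domain."""
    target = target_domain.lower()
    hosts = set()
    for entry in har.get('log', {}).get('entries', []):
        url = entry.get('request', {}).get('url', '')
        # .hostname is lowercased, and None for data: URLs and the like
        host = urlsplit(url).hostname
        if host and host != target:
            hosts.add(host)
    return sorted(hosts)


def hosts_from_har_file(path, target_domain):
    # 'utf-8-sig' also accepts files that start with a byte-order mark
    with open(path, encoding='utf-8-sig') as f:
        return hosts_from_har(json.load(f), target_domain)
```

Usage: save the Network tab as www.example.com.har while browsing the site, then call hosts_from_har_file('www.example.com.har', 'www.example.com').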

Performance Enhancements

Local Cache

A local file cache (with 304 support) is implemented and enabled by default.
On a cache hit, the response headers include x-zmirror-cache: FileHit.
The local cache is deleted and cleaned up when the program exits.
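To check whether a particular resource is being served from the local cache, inspect that header. A minimal sketch using only the standard library; is_file_hit and check_cached are hypothetical helper names. The first request for a static resource may be what fills the cache, so request it twice and look at the second result.

```python
from urllib.request import urlopen

CACHE_HEADER = 'x-zmirror-cache'


def is_file_hit(headers):
    """True if headers (any mapping with .items()) report a local file cache hit."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive
        if name.lower() == CACHE_HEADER:
            return value.strip() == 'FileHit'
    return False


def check_cached(url):
    """Fetch url and report whether zmirror answered from its local file cache."""
    with urlopen(url) as resp:
        return is_file_hit(resp.headers)
```

With the local demo running, calling check_cached twice on the URL of a static file (a stylesheet or image) on http://127.0.0.1/ should show whether the second request was a FileHit.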

CDN Support

Please see Using Qiniu as the CDN for a zmirror mirror


Similar Projects

@zxq2233 youtube-php-mirroring
@greatfire website-mirror-by-proxy
@restran web-proxy
@isayme isayme/google
@zjuyxy google200
@cuber ngx_http_google_filter_module
@arnofeng ngx_google_deployment
@imlinhanchao ngx_proxy_wiki
@joymufeng play-google