cnpm / cnpmcore

Private NPM Registry for Enterprise
https://npmmirror.com
MIT License
607 stars 80 forks

feat: proxy mode #571

Open hezhengxu2018 opened 1 year ago

hezhengxu2018 commented 1 year ago

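366 When proxy mode is enabled and a dependency cannot be found, the registry returns the upstream registry's manifest directly and caches it on NFS. When a requested tgz file does not exist, it is fetched from the upstream registry and returned, and a sync task is created for that version. Cached manifest files are checked for updates every hour, so that a new version published upstream does not result in a 404 because the cache is stale.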
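The flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `ManifestCache`, `resolveManifest`, and the injected `fetchUpstream` function are hypothetical names, an in-memory `Map` stands in for the NFS-backed cache, and the upstream call is kept synchronous only to keep the sketch short.

```typescript
// Hypothetical in-memory stand-in for the NFS-backed manifest cache.
interface CachedManifest {
  fullname: string;
  body: string;      // serialized manifest JSON
  cachedAt: number;  // epoch milliseconds of the last refresh
}

const ONE_HOUR_MS = 60 * 60 * 1000;

class ManifestCache {
  private readonly entries = new Map<string, CachedManifest>();

  get(fullname: string): CachedManifest | undefined {
    return this.entries.get(fullname);
  }

  put(fullname: string, body: string, now: number): void {
    this.entries.set(fullname, { fullname, body, cachedAt: now });
  }

  // Names of entries whose last refresh is at least `intervalMs` old.
  // A scheduled worker could create one update task per returned name.
  findStale(now: number, intervalMs: number = ONE_HOUR_MS): string[] {
    const stale: string[] = [];
    this.entries.forEach((entry) => {
      if (now - entry.cachedAt >= intervalMs) stale.push(entry.fullname);
    });
    return stale;
  }
}

// Serve the cached manifest when present; otherwise ask the upstream
// registry (injected, hypothetical) and remember the answer.
function resolveManifest(
  cache: ManifestCache,
  fullname: string,
  fetchUpstream: (fullname: string) => string,
  now: number,
): { body: string; source: 'cache' | 'upstream' } {
  const hit = cache.get(fullname);
  if (hit) return { body: hit.body, source: 'cache' };
  const body = fetchUpstream(fullname);
  cache.put(fullname, body, now);
  return { body, source: 'upstream' };
}
```

The hourly check only identifies stale entries; refreshing them is left to whatever task mechanism the registry already uses.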

codecov[bot] commented 1 year ago

Codecov Report

Attention: Patch coverage is 97.43276% with 21 lines in your changes missing coverage. Please review.

Project coverage is 96.83%. Comparing base (bd49917) to head (1ce994c). Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| app/core/service/ProxyCacheService.ts | 96.44% | 9 Missing :warning: |
| app/port/schedule/SyncProxyCacheWorker.ts | 91.07% | 5 Missing :warning: |
| app/repository/ProxyCacheRepository.ts | 95.16% | 3 Missing :warning: |
| app/port/controller/ProxyCacheController.ts | 98.71% | 2 Missing :warning: |
| app/port/schedule/CheckProxyCacheUpdateWorker.ts | 96.22% | 2 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #571      +/-   ##
==========================================
+ Coverage   96.81%   96.83%   +0.02%
==========================================
  Files         181      188       +7
  Lines       18003    18799     +796
  Branches     2336     2466     +130
==========================================
+ Hits        17429    18204     +775
- Misses        574      595      +21
```

hezhengxu2018 commented 1 year ago

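Proxy mode currently has no way to configure an authentication header for the upstream registry. The authentication header should be stored in the database as a property of the registry and be editable through an API.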
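One possible shape for that proposal is sketched below. The `RegistryRecord` type, its `authToken` field, and `buildUpstreamHeaders` are all hypothetical and do not come from the PR; the sketch also assumes the upstream accepts a bearer token in the `Authorization` header, as npm-compatible registries commonly do.

```typescript
// Hypothetical registry record as it might be stored in the database.
interface RegistryRecord {
  name: string;
  host: string;
  authToken?: string; // assumed optional column; absent for public upstreams
}

// Build the headers for a request to the upstream registry.
// The Authorization header is only added when a token is configured.
function buildUpstreamHeaders(registry: RegistryRecord): Record<string, string> {
  const headers: Record<string, string> = { accept: 'application/json' };
  if (registry.authToken) {
    headers.authorization = `Bearer ${registry.authToken}`;
  }
  return headers;
}
```

Keeping the token on the registry record means it can be changed through the same API that edits other registry properties, without redeploying configuration.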

elrrrrrrr commented 10 months ago

@hezhengxu2018 🤩 Is this finished?

hezhengxu2018 commented 10 months ago

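Yes. Cached manifests are now refreshed on a schedule, but there is no API for managing the cache manually yet. Neither Verdaccio nor Nexus provides one, so let's confirm the basic functionality is solid first and add it afterwards.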

hezhengxu2018 commented 9 months ago

The cache management API has been added as well, so the feature is now fairly complete. Could you take a look? @elrrrrrrr @fengmk2

elrrrrrrr commented 9 months ago

🤩 There are quite a lot of changes. I'll go through them in detail tomorrow 🙏🏻

elrrrrrrr commented 9 months ago

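@hezhengxu2018

Is proxyMode meant to act as a proxy, with the local registry serving as a cache for acceleration? We need to confirm whether the manifest should follow the local registry or the upstream registry.

The current implementation returns the proxied result directly for manifest and tgz requests, and creates a sync task:

  1. Under proxyMode:
    1. Prefer returning the local registry's information
    2. If there is no version data locally, proxy the response from the upstream registry
  2. Accessing a tgz triggers a download, and only the single version of the package being accessed is synced

If a tgz download is hit directly in proxyMode, does the client need to wait for the asynchronous scheduled task to catch up before it can continue? Otherwise the manifest would return only a single version, and it would already be out of date by the time the client queries the package information.

Could we change proxyMode so that it always returns the upstream registry's information, to get better freshness?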

hezhengxu2018 commented 9 months ago

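Once proxyMode is enabled, every manifest returned is the upstream registry's manifest as of its last refresh; the manifest information in the database is not used. Only if the upstream registry becomes unusable and the mode is switched back to none will the proxy registry return manifests built from the versions it has already cached. While the asynchronous task has not finished syncing, the tgz is always served from the upstream registry through the reverse proxy, so users never have to wait for the task to complete. Only after the task has finished is the tgz read from object storage first.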

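The main purpose of a proxy registry is to cache results and speed up dependency installation for intranet users when access to the external network or the official npm registry is very slow or impossible, and to let intranet users keep using already-cached dependencies even when the external network is unreachable. For that reason, always returning the upstream registry's data may not work. If intranet users find that the proxy registry's cache needs updating or contains dirty data, they can use the /-/proxy-cache API to refresh or delete cache entries.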
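The source-selection rules in this reply can be summarized in a small sketch. The type and function names are illustrative rather than taken from the PR, and `SyncMode` is reduced here to the two values discussed in this comment.

```typescript
// Simplified to the two modes mentioned above; the real enum has more values.
type SyncMode = 'none' | 'proxy';
type ManifestSource = 'proxy-cache' | 'local-database';
type TarballSource = 'object-storage' | 'upstream-reverse-proxy';

// In proxy mode the cached copy of the upstream manifest is served.
// The database-backed manifest is only used after switching back to "none".
function pickManifestSource(mode: SyncMode): ManifestSource {
  return mode === 'proxy' ? 'proxy-cache' : 'local-database';
}

// A tarball is read from object storage only once the sync task for that
// exact version has finished; until then the request is reverse-proxied,
// so the client never waits on the task.
function pickTarballSource(
  syncedVersions: ReadonlySet<string>,
  version: string,
): TarballSource {
  return syncedVersions.has(version) ? 'object-storage' : 'upstream-reverse-proxy';
}
```

For example, with only `1.0.0` synced, a request for `1.0.1` is proxied upstream while `1.0.0` is served from object storage.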

hezhengxu2018 commented 9 months ago

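The manifest follows the local copy, because that is how Nexus does it. Proxy mode as a whole is a cache: if the network is poor and we keep using the upstream registry's index, the cache loses its purpose. That said, Nexus refreshes manifests quite frequently by default, once every 30 minutes, so my setting of one refresh per day feels a bit conservative.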

hezhengxu2018 commented 9 months ago

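I've changed some of it. There are a few points that I felt were not appropriate or that I wasn't sure how to change, so please take another look when you have time @elrrrrrrr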

elrrrrrrr commented 9 months ago

@hezhengxu2018 I'll take a closer look tomorrow. Happy New Year ヽ(≧◡≦)八(o^ ^o)ノ

hezhengxu2018 commented 8 months ago

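I just remembered that the search API is not proxied. After entering proxy mode, search results should be proxied as well; this was missed because the search API had not been implemented at the time. Let's fix it after the basic functionality has been merged.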

coderabbitai[bot] commented 6 months ago

Walkthrough

This update introduces a proxy caching mechanism to enhance package management, featuring the new ProxyCache entity and expanded functionalities across various services and controllers. New scheduled workers have been implemented for efficient synchronization and updates, while constants and enums have been adjusted to support these changes. Additionally, comprehensive testing improvements ensure robust handling of package manifests and versions, contributing to overall system reliability.

Changes

| File Path | Change Summary |
|---|---|
| app/common/adapter/NPMRegistry.ts | genAuthorizationHeader method changed from private to public. |
| app/common/constants.ts | Added constants PROXY_CACHE_DIR_NAME and ABBREVIATED_META_TYPE. Expanded SyncMode enum to include proxy. |
| app/common/enum/Task.ts | Added new enum value UpdateProxyCache to the TaskType enum. |
| app/core/entity/ProxyCache.ts | Introduced ProxyCache class representing a cache entity with methods for creation and updates. |
| app/core/entity/Task.ts | Added imports and new types related to proxy caching tasks. |
| app/core/service/ProxyCacheService.ts | Implemented functionality for managing proxy caching of package manifests and versions. |
| app/port/controller/ProxyCacheController.ts | Introduced new HTTP methods for managing proxy caches. |
| ...controller/package/DownloadPackageVersionTar.ts | Enhanced package version retrieval with new imports and method. |
| ...controller/package/ShowPackageController.ts | Utilized new constants and introduced ProxyCacheService. |
| ...controller/package/ShowPackageVersionController.ts | Introduced logic changes and new imports related to handling package versions. |
| app/port/schedule/CheckProxyCacheUpdateWorker.ts | Introduced a scheduled worker for updating proxy cache entries. |
| app/port/schedule/CheckRecentlyUpdatedPackages.ts | Added SyncMode.proxy to the notAllowUpdateModeList array. |
| app/port/schedule/SyncProxyCacheWorker.ts | Introduced a class for synchronizing proxy cache with injected services. |
| app/repository/ProxyCacheRepository.ts | Managed proxy cache entities with new methods and imports. |
| sql/3.47.0.sql | Introduced a new table proxy_caches for cached file information. |
| test/core/service/ProxyCacheService.test.ts | Added tests for ProxyCacheService. |
| ...controller/package/DownloadPackageVersionTarController.test.ts | Added a new test case for synchronization task creation. |
| ...controller/package/ShowPackageController.test.ts | Added import statement for SyncMode and a new test case. |
| ...controller/package/ShowPackageVersionController.test.ts | Added import statement for SyncMode and a new test case. |
| test/repository/ProxyCachePepository.test.ts | Added tests for ProxyCacheRepository. |
| test/schedule/CheckProxyCacheUpdateWorker.test.ts | Verified the creation of an update task by a repository. |
| test/schedule/SyncProxyCacheWorker.test.ts | Verified the functionality of a synchronization worker for proxy cache. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ProxyCacheController
    participant ProxyCacheService
    participant ProxyCacheRepository

    User->>ProxyCacheController: Request to get package version
    ProxyCacheController->>ProxyCacheService: Check cache
    ProxyCacheService->>ProxyCacheRepository: Retrieve from cache
    ProxyCacheRepository-->>ProxyCacheService: Return cached data
    ProxyCacheService-->>ProxyCacheController: Return data
    ProxyCacheController-->>User: Respond with package version

    Note right of User: If not in cache, fetch from registry and update cache

Poem

In a cache so clever, packages rest,
Proxy in place, they've passed the test.
Services hum in a rhythmic sway,
Syncing treasures, night and day.
Bunnies code with digital might,
Ensuring all is cached just right. 🌟


Recent review details

**Configuration used: CodeRabbit UI**
**Review profile: CHILL**

Commits

Files that changed from the base of the PR and between ce71b9f479281911d596b62a1b14221f7e2feb75 and 1ce994c54282523d2250c7a49c035ae440cccd05.

Files selected for processing (1)

* app/core/service/ProxyCacheService.ts (1 hunks)

Additional context used

Learnings (1)

app/core/service/ProxyCacheService.ts (1)

```
Learnt from: hezhengxu2018
PR: cnpm/cnpmcore#571
File: app/core/service/ProxyCacheService.ts:161-209
Timestamp: 2024-07-19T06:26:05.533Z
Learning: In the `ProxyCacheService` class, requests to the upstream repository already throw an error if they fail, making additional checks for the response status unnecessary.
```

Additional comments not posted (9)

app/core/service/ProxyCacheService.ts (9)

`21-23`: **LGTM!** The `isoNow` function is a simple utility that correctly returns the current date and time in ISO format.

---

`25-27`: **LGTM!** The `isPkgManifest` function correctly checks if the given `fileType` is a package manifest by comparing it against the relevant constants.

---

`54-59`: **LGTM!** The `getPackageVersionTarResponse` method correctly checks if the package is blocked based on the configuration and retrieves the response using the `getProxyResponse` method if it is not blocked.

---

`61-77`: **LGTM!** The `getPackageManifest` method correctly retrieves the package manifest, either from the cache or by fetching the rewritten manifest. It also handles storing the rewritten manifest and updating the proxy cache in a background task.

---

`79-102`: **LGTM!** The `getPackageVersionManifest` method correctly retrieves the package version manifest, either from the cache or by fetching the rewritten manifest. It handles resolving the version from dist tags if a valid semver version is not provided. It also handles storing the rewritten manifest and updating the proxy cache in a background task.

---

`104-110`: **LGTM!** The `removeProxyCache` method correctly removes the proxy cache entry from both the NFS and the repository based on the provided parameters.

---

`112-114`: **LGTM!** The `createTask` method correctly creates an update proxy cache task using the provided parameters and returns the created task.

---

`116-118`: **LGTM!** The `findExecuteTask` method correctly finds an executable update proxy cache task using the `taskService.findExecuteTask` method.

---

`120-152`: **LGTM!** The `executeTask` method correctly executes an update proxy cache task by retrieving the cached files, updating the manifest, and saving the changes. It handles errors during the execution and logs the appropriate messages. Upon successful execution, it updates the package etag and manifests in the cache.

hezhengxu2018 commented 5 months ago

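The upstream registry may respond with a 302 redirect to another address, and clients on the intranet cannot reach the address returned in that 302. Using the egg-js proxy plugin directly is therefore not suitable, which means the proxy plugin is not really needed anymore.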
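One way to keep intranet clients away from the redirect target is for the proxy itself to follow the redirect and return only the final response. The sketch below illustrates that idea under stated assumptions: `UpstreamResponse`, `followRedirects`, and the injected `request` function are hypothetical, and the request is kept synchronous for brevity, whereas a real implementation would be asynchronous and stream the body.

```typescript
// Hypothetical, simplified view of an upstream HTTP response.
interface UpstreamResponse {
  status: number;
  location?: string; // value of the Location header on 3xx responses
  body?: string;
}

// Follow 3xx responses on the server side so the client only ever sees
// the final response and never the redirect target.
function followRedirects(
  url: string,
  request: (url: string) => UpstreamResponse,
  maxHops: number = 5,
): UpstreamResponse {
  let current = url;
  for (let hop = 0; hop <= maxHops; hop++) {
    const res = request(current);
    if (res.status < 300 || res.status >= 400 || res.location === undefined) {
      return res; // not a redirect: this is the response to hand back
    }
    current = res.location;
  }
  throw new Error(`too many redirects starting from ${url}`);
}
```

The hop limit guards against redirect loops on the upstream side.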

hezhengxu2018 commented 5 months ago

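It looks like my earlier thinking was a bit off: in proxy mode, the contents of the local database can be ignored. If a dependency's manifest can be read from proxyCache, the cached copy is used; otherwise the upstream stream is returned. Because the tasks created in proxy mode are for specific versions, the local data is always incomplete, so this logic does not need to be mixed with the logic of the other modes.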

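hezhengxu2018 commented 3 months ago

  1. Optimized how reverse-proxy responses are returned. A tgz file is no longer written to local disk before being returned; the upstream response stream is used directly, which is noticeably faster. A manifest, however, cannot be returned as a stream, because the upstream registry address inside the file has to be rewritten to the application's address, and this may put pressure on the server for large JSON documents.
  2. Proxy mode requests are now sent with a dedicated request function that carries the proxy request headers, which reduces changes to the existing logic.
  3. Removed the TypeScript validation of the config. To use proxy mode properly, redirectNotFound must now be set to false manually, and there is no hint when editing the TypeScript config. That validation had made the config file's types somewhat complicated.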
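Points 1 and 3 can be illustrated with a rough sketch. None of the names below come from the PR: `rewriteTarballUrls` shows why the manifest has to be parsed in full before it is returned, and `assertProxyModeConfig` is one hypothetical way to replace the removed compile-time check with a runtime check. The sketch assumes the proxy sync mode is spelled `'proxy'` and that tarball URLs live under `versions[*].dist.tarball`, as in standard npm package metadata.

```typescript
// Hypothetical minimal manifest shape; real manifests carry many more fields.
interface MinimalManifest {
  name: string;
  versions?: Record<string, { dist?: { tarball?: string } }>;
}

// Point every dist.tarball that targets the upstream registry at this
// application instead. The whole JSON document is parsed and re-serialized,
// which is why the manifest cannot simply be piped through as a stream.
function rewriteTarballUrls(
  manifestJson: string,
  upstreamBase: string,
  localBase: string,
): string {
  const manifest = JSON.parse(manifestJson) as MinimalManifest;
  const versions = manifest.versions ?? {};
  for (const version of Object.keys(versions)) {
    const entry = versions[version];
    const dist = entry ? entry.dist : undefined;
    if (dist && dist.tarball && dist.tarball.indexOf(upstreamBase) === 0) {
      dist.tarball = localBase + dist.tarball.slice(upstreamBase.length);
    }
  }
  return JSON.stringify(manifest);
}

// Only the two config fields relevant to this check (field names assumed).
interface ProxyRelatedConfig {
  syncMode: string;
  redirectNotFound: boolean;
}

// Runtime stand-in for the removed TypeScript-level validation.
function assertProxyModeConfig(config: ProxyRelatedConfig): void {
  if (config.syncMode === 'proxy' && config.redirectNotFound) {
    throw new Error('proxy mode requires redirectNotFound to be false');
  }
}
```

A check like `assertProxyModeConfig` could run once at startup, so that a misconfiguration fails fast without complicating the config types.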