X-lab2017 / open-digger

Open source analysis tools
https://open-digger.cn
Apache License 2.0

[Bug] API still returns an error, will it be fixed? #1550

Closed binary5 closed 2 months ago

binary5 commented 2 months ago

Current Behavior

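For the same repository, some metrics work while others return an error, for example this one: https://oss.x-lab.info/open_digger/github/wa-lang/wa/change_requests_reviews.json Will this be fixed? Is there a comparable, stable API you would recommend?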

Expected Behavior

No response

Any Additional Comments?

No response

frank-zsy commented 2 months ago

If the repo's data has been exported but some files are missing, the data for those metrics is not available. I checked the repo manually and did not find any pull request review data in it.

Be aware that only review comments on a specific line of code are counted as review comments. Comments posted directly on the pull request are treated as issue comments, following GitHub's definition.

binary5 commented 2 months ago

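So if the underlying data for a metric is 0, the JSON file for that metric simply does not exist? In that case, is it enough to treat any metric that cannot be fetched as 0? This might be worth a sentence somewhere on the official website :)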

frank-zsy commented 2 months ago

Yes, we should add this to the documentation. OpenDigger generates millions of files for open source projects covering about 10 years, so setting 0 for every missing metric would waste a massive amount of storage. You can find the export code here: https://github.com/X-lab2017/open-digger/blob/ea9961243b2c5a5a9af8823a4f093ccc60ef79ad/src/cron/tasks/monthly_export.ts#L172 . We do not export a metric if the value returned by the SQL query is the default value of its data type.


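Yes, we should add a note about this to the documentation. The main reason is that OpenDigger generates metric files for over a million repositories for everyone to use, spanning as long as 10 years. If the missing positions were set to 0, our storage cost would rise sharply. After weighing the trade-offs, we chose to keep only the non-zero parts when generating metrics, which reduces storage and network transfer but requires more handling on the user's side. The filtering logic is in the code: https://github.com/X-lab2017/open-digger/blob/ea9961243b2c5a5a9af8823a4f093ccc60ef79ad/src/cron/tasks/monthly_export.ts#L172 . There we set a default value for each field type, and a value is not exported if the query returns that default.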
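
For readers consuming these files, the "missing means zero" handling described in this thread could look roughly like the sketch below. It is not part of OpenDigger. It assumes that a metric file that was never exported is served as HTTP 404 and that each metric file is a JSON object keyed by month (for example "2023-01"); neither detail is confirmed in this thread. The names `fetch_metric`, `value_for`, and `BASE_URL` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.error
import urllib.request

# Base path taken from the URL quoted in the issue.
BASE_URL = "https://oss.x-lab.info/open_digger/github"


def fetch_metric(repo, metric, opener=urllib.request.urlopen):
    """Return the metric as a dict, or {} when the file was not exported.

    Assumption: a metric with no data is simply absent and the server
    answers with HTTP 404. Any other HTTP error is re-raised.
    """
    url = f"{BASE_URL}/{repo}/{metric}.json"
    try:
        with opener(url) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return {}
        raise


def value_for(series, month, default=0):
    """Look up one month, treating a non-exported value as the default (0)."""
    return series.get(month, default)
```

A call such as `value_for(fetch_metric("wa-lang/wa", "change_requests_reviews"), "2023-01")` would then yield 0 when either the whole file or that month's entry is absent. The `opener` parameter only exists so the function can be exercised without network access.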