TencentBlueKing / bk-iam-saas

BK-IAM is a centralized permission management service provided by Tencent BlueKing, based on ABAC

[Backend] Re-review the backend Redis cache #1559

Open zhu327 opened 2 years ago

zhu327 commented 2 years ago
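  1. Re-review the cache configuration of the version currently running in production. The problem occurs when fetching the subject-group relationship, in the auth/query APIs.
  2. Re-review the authorization path of the new version for Redis cache problems.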

Consider using Redis as backup storage, so that the service can still hold up for some time if the DB goes down.

wklken commented 2 years ago

TODO: upgrade Redis into a second storage layer and ensure data consistency

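  1. Can all Redis cache TTLs be made longer?
  2. Do not use defer; if the operation fails, do not clear the cache.
  3. If a cache deletion fails, a compensation mechanism is needed.
  4. Operations all happen during the day. Assuming an expiration time of 7 days, set the TTL to 7 days + 12 hours, so that entries cached during the day expire at night.
  5. If a deletion fails, retry; if the retry also fails, delete later through a queue or a similar mechanism.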
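Item 4 can be checked with a little arithmetic. The sketch below assumes the 7-day base expiry and the 12-hour shift mentioned in the list; the names `cacheTTL` and `expiryHour` are hypothetical and do not come from the bk-iam code base.

```go
package main

import (
	"fmt"
	"time"
)

// Values taken from the discussion: a 7-day base expiry plus a 12-hour shift.
const (
	baseExpiry = 7 * 24 * time.Hour
	nightShift = 12 * time.Hour
	cacheTTL   = baseExpiry + nightShift
)

// expiryHour reports the hour of day (0-23) at which an entry written at
// writeHour o'clock would expire. The calendar date used here is arbitrary.
func expiryHour(writeHour int) int {
	writtenAt := time.Date(2023, 1, 2, writeHour, 0, 0, 0, time.UTC)
	return writtenAt.Add(cacheTTL).Hour()
}

func main() {
	for _, h := range []int{9, 14, 18} {
		fmt.Printf("written at %02d:00 -> expires at %02d:00\n", h, expiryHour(h))
	}
}
```

With these values an entry written at 14:00 expires at 02:00 seven and a half days later, so expirations caused by daytime writes land outside working hours.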
wklken commented 2 years ago

Problem: at present, a Redis outage still affects the service.

system error[request_id=eecd595e3dd243eba0613dd2503a99a2]: [Handler:Query] systemID=`demo`, body=`{baseRequest:{System:demo Subject:{Type:user ID:user005}} Resources:[] Action:{ID:access_developer_center}}` 
 [PDP:Query] queryAndPartialEvalConditions fail%!(EXTRA types.Action={access_developer_center 0xc0007140e0})
 [PDP:queryAndPartialEvalConditions] GetEffectAuthTypeGroupPKs systemID=`demo`, subject=`{Type:user ID:user005 Attribute:0xc0007140d8}`, action=`{ID:access_developer_center Attribute:0xc0007140e0}` fail
 [PRP:getEffectSubjectPKs] ListSubjectEffectGroups deptPKs=`[]` fail
 [Cache:ListSystemSubjectEffectGroups] batchGetSystemSubjectGroups systemID=`demo`, pks=`[5]` fail
 [Cache:batchGetSystemSubjectGroups] SubjectGroupCache.BatchGet keys=`[{SystemID:demo SubjectPK:5}]` fail
 [Raw:Error] EOF

When this error occurs, the lookup should fall back to a DB query.
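A minimal sketch of that fallback, assuming a cache-aside read path: any cache error (a miss, `EOF`, or a refused connection) falls through to the database, and a failed write-back is only logged. The interfaces `groupCache` and `groupStore`, the function `listSubjectGroups`, and the test doubles are hypothetical and are not the actual bk-iam types.

```go
package main

import (
	"errors"
	"fmt"
)

// groupCache is a hypothetical stand-in for the Redis-backed cache.
type groupCache interface {
	Get(subjectPK int64) ([]int64, error)
	Set(subjectPK int64, groupPKs []int64) error
}

// groupStore is a hypothetical stand-in for the database layer.
type groupStore interface {
	ListGroupPKs(subjectPK int64) ([]int64, error)
}

// listSubjectGroups reads from the cache first. Any cache error, whether a
// miss or a Redis failure, falls through to the database instead of failing
// the request.
func listSubjectGroups(c groupCache, db groupStore, subjectPK int64) ([]int64, error) {
	if groupPKs, err := c.Get(subjectPK); err == nil {
		return groupPKs, nil
	}
	groupPKs, err := db.ListGroupPKs(subjectPK)
	if err != nil {
		return nil, fmt.Errorf("list groups from db: %w", err)
	}
	// Best-effort write-back: a cache failure is logged, never returned.
	if setErr := c.Set(subjectPK, groupPKs); setErr != nil {
		fmt.Printf("warn: cache write-back failed: %v\n", setErr)
	}
	return groupPKs, nil
}

// brokenCache simulates a Redis instance that is down.
type brokenCache struct{}

func (brokenCache) Get(int64) ([]int64, error) { return nil, errors.New("EOF") }
func (brokenCache) Set(int64, []int64) error   { return errors.New("EOF") }

// mapStore is an in-memory database double.
type mapStore map[int64][]int64

func (m mapStore) ListGroupPKs(pk int64) ([]int64, error) { return m[pk], nil }

func main() {
	db := mapStore{5: {2105}}
	groupPKs, err := listSubjectGroups(brokenCache{}, db, 5)
	fmt.Println(groupPKs, err)
}
```

The trade-off is that every request hits the database while Redis is down, so the database has to be able to absorb that load for the duration of the outage.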

wklken commented 2 years ago

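If the service is expected to keep running normally while Redis is down, then it also must not fail to start in that situation (the authorization service has to stay available).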

wklken commented 2 years ago
system error[request_id=7a771361630c49f4be34756063757631]: [Handler:Auth] systemID=`demo`, body=`{baseRequest:{System:demo Subject:{Type:user ID:user105}} Resources:[{System:demo Type:app ID:002 Attribute:map[]}] Action:{ID:view_app}}` 
 [PDP:Eval] GetEffectAuthTypeGroupPKs systemID=`demo`, subject=`{Type:user ID:user105 Attribute:0xc00052eb60}`, action=`{ID:view_app Attribute:0xc00052eb68}` fail 
 [GroupRedisLayer:Retrieve] batchGetGroupAuthType fail groupPKs=`[2105]` 
 [Raw:Error] dial tcp 127.0.0.1:6379: connect: connection refused
system error[request_id=21ca36ecce4c4f56826b83fec95d21e9]: [Handler:Auth] systemID=`demo`, body=`{baseRequest:{System:demo Subject:{Type:user ID:user105}} Resources:[{System:demo Type:app ID:002 Attribute:map[]}] Action:{ID:view_app}}` 
 [PDP:Eval] GetEffectAuthTypeGroupPKs systemID=`demo`, subject=`{Type:user ID:user105 Attribute:0xc0000ca908}`, action=`{ID:view_app Attribute:0xc0000ca910}` fail 
 [GroupRedisLayer:Retrieve] batchSetGroupAuthTypeCache fail missGroupAuthTypes=`[{GroupPK:2105 AuthType:2}]` 
 [Raw:Error] EOF
system error[request_id=039dc2fb87e64746bc44a20e19dcc496]: [Handler:Auth] systemID=`demo`, body=`{baseRequest:{System:demo Subject:{Type:user ID:user105}} Resources:[{System:demo Type:app ID:002 Attribute:map[]}] Action:{ID:view_app}}` 
 [PDP:Eval] rbacEval systemID=`demo`, actionID=`%!d(string=view_app)`, resources=`[{System:demo Type:app ID:002 Attribute:map[]}]`, groupPKs=`[2105]` fail 
 [PDP:rbacEval] GetResourceActionAuthorizedGroupPKs fail, system=`demo` action=`{ID:view_app Attribute:0xc00059e268}` resource=`{System:demo Type:app ID:002 TypePK:1}` 
 [Raw:Error] EOF
wklken commented 2 years ago

What should happen when a cache deletion fails? Is there a mechanism that can guarantee data consistency?
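One possible answer, following the retry-then-queue idea raised earlier in the thread, is sketched below. It is an assumption about how such a mechanism could look, not the project's implementation: `deleter`, `deleteWithRetry`, and `flakyCache` are hypothetical names, and a buffered channel stands in for whatever queue a real deployment would use.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// deleter is a hypothetical stand-in for the cache client.
type deleter interface {
	Delete(key string) error
}

// deleteWithRetry tries to delete key up to attempts times. If every attempt
// fails, the key is handed to the pending queue for delayed deletion.
// It returns deleted=true on success, deleted=false with a nil error when the
// key was queued, and an error when the key could not even be queued.
func deleteWithRetry(c deleter, key string, attempts int, backoff time.Duration, pending chan<- string) (bool, error) {
	var err error
	for i := 0; i < attempts; i++ {
		if err = c.Delete(key); err == nil {
			return true, nil
		}
		if i < attempts-1 {
			time.Sleep(backoff)
		}
	}
	select {
	case pending <- key:
		return false, nil
	default:
		return false, fmt.Errorf("delete %q failed and the pending queue is full: %w", key, err)
	}
}

// flakyCache fails its first `failures` Delete calls, then succeeds.
type flakyCache struct {
	failures int
	calls    int
}

func (f *flakyCache) Delete(string) error {
	f.calls++
	if f.calls <= f.failures {
		return errors.New("EOF")
	}
	return nil
}

func main() {
	pending := make(chan string, 16)

	deleted, err := deleteWithRetry(&flakyCache{failures: 1}, "demo:5", 3, time.Millisecond, pending)
	fmt.Println(deleted, err, len(pending)) // deleted on the second attempt

	deleted, err = deleteWithRetry(&flakyCache{failures: 10}, "demo:5", 3, time.Millisecond, pending)
	fmt.Println(deleted, err, len(pending)) // all retries failed, key queued
}
```

A separate worker would drain `pending` and retry the deletions later. This narrows the inconsistency window but does not by itself guarantee consistency, for example if the process restarts before the queue is drained.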

zhu327 commented 1 year ago

Solve the first problem first:

  1. When Redis is down, fall back to MySQL and keep serving normally.