ben7th opened 9 years ago
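Over a given time range, find the ten users with the most ChatLine records of type chat, along with each user's count of chat-type ChatLine records (the start and end times are given in time_str_type format).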
result = ChatLine.username_all_chat_stat(time_str_type, start_time_str, end_time_str)
# Example of the result structure
# time_str_type = 'month' (can also be week, day, hour, or minute)
# start_time_str = '2012-09'
# end_time_str = '2012-12'
{
# key: username
# value: that user's count of chat-type ChatLine records
"张三" => 3,
"李四" => 2,
"王五" => 5,
"赵六" => 6,
"赵七" => 7,
"赵八" => 8,
"赵九" => 9,
"赵二" => 2,
"赵一" => 1,
"赵拾" => 10
}
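A minimal plain-Ruby sketch of the aggregation behind `username_all_chat_stat`, assuming chat lines are available as in-memory records with `username`, `kind`, and `created_at` fields (hypothetical names; the real ChatLine columns may differ) and that the time strings have already been turned into a half-open `[from, to)` range of `Time` values. In the actual model this would more likely be a `GROUP BY username` query than Ruby-side counting; `Array#tally` needs Ruby 2.7 or later.

```ruby
# Hypothetical stand-in for a ChatLine row; field names are assumptions.
ChatLineRecord = Struct.new(:username, :kind, :created_at)

# Counts chat-type lines per username inside [from, to) and keeps the
# `limit` users with the highest counts (ties broken by username).
def username_all_chat_stat(records, from, to, limit: 10)
  records
    .select { |r| r.kind == 'chat' && r.created_at >= from && r.created_at < to }
    .map(&:username)
    .tally
    .sort_by { |username, count| [-count, username] }
    .first(limit)
    .to_h
end

# Usage: three users in October 2012 plus one non-chat line that is ignored.
t = Time.utc(2012, 10, 15)
lines = { '张三' => 3, '李四' => 2, '王五' => 5 }.flat_map do |name, n|
  Array.new(n) { ChatLineRecord.new(name, 'chat', t) }
end
lines << ChatLineRecord.new('张三', 'system', t)

stat = username_all_chat_stat(lines, Time.utc(2012, 9, 1), Time.utc(2013, 1, 1))
# stat is ordered 王五 (5), 张三 (3), 李四 (2)
```

Returning the hash already sorted by count means callers can read the ranking straight from the key order instead of re-sorting.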
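Over a given time range, and for each sub-period within it (the unit of a sub-period is determined by time_str_type; for example, with month each sub-period is one calendar month), find the ten users with the most chat-type ChatLine records, along with each user's count of chat-type ChatLine records (the start and end times are given in time_str_type format).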
result = ChatLine.username_section_chat_stat(time_str_type, start_time_str, end_time_str)
# Example of the result structure
# time_str_type = 'month' (can also be week, day, hour, or minute)
# start_time_str = '2012-09'
# end_time_str = '2012-12'
{
"2012-09" => {
# key: username
# value: that user's count of chat-type ChatLine records
"张三" => 3,
"李四" => 2,
"王五" => 5,
"赵六" => 6,
"赵七" => 7,
"赵八" => 8,
"赵九" => 9,
"赵二" => 2,
"赵一" => 1,
"赵拾" => 10
},
"2012-10" => {
"张三" => 3,
"李四" => 2,
"王五" => 5,
"赵六" => 6,
"赵七" => 7,
"赵八" => 8,
"赵九" => 9,
"赵二" => 2,
"赵一" => 1,
"赵拾" => 10
},
"2012-11" => {
"张三" => 3,
"李四" => 2,
"王五" => 5,
"赵六" => 6,
"赵七" => 7,
"赵八" => 8,
"赵九" => 9,
"赵二" => 2,
"赵一" => 1,
"赵拾" => 10
},
"2012-12" => {
"张三" => 3,
"李四" => 2,
"王五" => 5,
"赵六" => 6,
"赵七" => 7,
"赵八" => 8,
"赵九" => 9,
"赵二" => 2,
"赵一" => 1,
"赵拾" => 10
}
}
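The per-period variant can reuse the same counting, grouped by a period key derived from `created_at`. This plain-Ruby sketch assumes the same hypothetical record fields as an in-memory illustration; only the `'month'` key format (`'2012-09'`) is confirmed by the examples above, and the other `strftime` patterns (including ISO weeks for `'week'`) are assumptions.

```ruby
ChatLineRecord = Struct.new(:username, :kind, :created_at)

# Period-key formats per time_str_type. Only 'month' matches the documented
# example; the others are assumed.
SECTION_FORMATS = {
  'month'  => '%Y-%m',
  'week'   => '%G-W%V',          # ISO 8601 week-based year and week number
  'day'    => '%Y-%m-%d',
  'hour'   => '%Y-%m-%d %H',
  'minute' => '%Y-%m-%d %H:%M'
}.freeze

# Top `limit` usernames by count for one group of records.
def top_counts(records, limit)
  records.map(&:username).tally
         .sort_by { |username, count| [-count, username] }
         .first(limit).to_h
end

# Buckets chat-type lines in [from, to) by period key, then ranks each bucket.
def username_section_chat_stat(records, time_str_type, from, to, limit: 10)
  format = SECTION_FORMATS.fetch(time_str_type)
  records
    .select { |r| r.kind == 'chat' && r.created_at >= from && r.created_at < to }
    .group_by { |r| r.created_at.strftime(format) }
    .sort_by { |period, _rows| period } # zero-padded keys sort chronologically
    .to_h { |period, rows| [period, top_counts(rows, limit)] }
end
```

Periods with no chat lines are simply absent from this result; if the API should list every period in the range, as the four-month example does, the empty periods would need to be filled in separately.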
The methods must also work when chained after a scope, for example:
ChatLine.by_room_id(room_id).username_all_chat_stat(time_str_type, start_time_str, end_time_str)
ChatLine.by_room_id(room_id).username_section_chat_stat(time_str_type, start_time_str, end_time_str)
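In Rails, a class method defined on the model can be called on a relation such as `ChatLine.by_room_id(room_id)`, and Rails runs it within that relation's scope, so a class method built from `where`, `group`, and `count` picks up the room filter automatically. The plain-Ruby class below is only an analogue of that behaviour, not ActiveRecord itself: `ChatLineRelation` and `chat_counts` are hypothetical names, and this `by_room_id` is an illustrative reimplementation, used to show scopes returning a new relation while stat methods aggregate whatever rows the relation currently holds.

```ruby
ChatLineRecord = Struct.new(:room_id, :username, :kind)

# Analogue of an ActiveRecord::Relation: scopes return new relations, and
# every stat method works on the rows that earlier scopes have narrowed.
class ChatLineRelation
  def initialize(rows)
    @rows = rows
  end

  # Scope: returns a new relation instead of mutating this one.
  def by_room_id(room_id)
    ChatLineRelation.new(@rows.select { |r| r.room_id == room_id })
  end

  # Stat method: top `limit` usernames by chat-type line count.
  def chat_counts(limit: 10)
    @rows.select { |r| r.kind == 'chat' }
         .map(&:username).tally
         .sort_by { |username, count| [-count, username] }
         .first(limit).to_h
  end
end

all_lines = ChatLineRelation.new([
  ChatLineRecord.new(1, '张三', 'chat'),
  ChatLineRecord.new(1, '张三', 'chat'),
  ChatLineRecord.new(2, '李四', 'chat'),
  ChatLineRecord.new(1, '王五', 'system')
])
room_1 = all_lines.by_room_id(1).chat_counts # only room 1's chat lines are counted
```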
http api
url
/api/chat_lines/username_chat_stat
method
get
params
room_id
# Granularity for the statistics: month | week | day | hour | minute
by
# Start time, in the same format the model methods accept
start
# End time, in the same format the model methods accept
end
response
{
:by => 'month',
:data => {
:all => {
"张三" => 3,
"李四" => 2,
"王五" => 5,
"赵六" => 6,
"赵七" => 7,
"赵八" => 8,
"赵九" => 9,
"赵二" => 2,
"赵一" => 1,
"赵拾" => 10
},
:section => {
"2012-09" => {
"张三" => 3,
"李四" => 2,
"王五" => 5,
"赵六" => 6,
"赵七" => 7,
"赵八" => 8,
"赵九" => 9,
"赵二" => 2,
"赵一" => 1,
"赵拾" => 10
},
"2012-10" => {
"张三" => 3,
"李四" => 2,
"王五" => 5,
"赵六" => 6,
"赵七" => 7,
"赵八" => 8,
"赵九" => 9,
"赵二" => 2,
"赵一" => 1,
"赵拾" => 10
},
"2012-11" => {
"张三" => 3,
"李四" => 2,
"王五" => 5,
"赵六" => 6,
"赵七" => 7,
"赵八" => 8,
"赵九" => 9,
"赵二" => 2,
"赵一" => 1,
"赵拾" => 10
},
"2012-12" => {
"张三" => 3,
"李四" => 2,
"王五" => 5,
"赵六" => 6,
"赵七" => 7,
"赵八" => 8,
"赵九" => 9,
"赵二" => 2,
"赵一" => 1,
"赵拾" => 10
}
}
}
}
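The `start`/`end` parameters (like the model methods' `start_time_str`/`end_time_str`) have to be turned into a time range before querying. A minimal sketch, assuming UTC and the string shapes `'2012-09'`, `'2012-09-05'`, `'2012-09-05 13'`, and `'2012-09-05 13:07'` for month, day, hour, and minute (only the month shape appears in this issue). `'week'` is deliberately rejected because its string format is not specified here. `end` is treated as inclusive, so `end = '2012-12'` covers all of December, which matches the four sections in the example response. The helper names are hypothetical.

```ruby
# Number of numeric fields expected per granularity (assumed shapes).
FIELD_COUNTS = { 'month' => 2, 'day' => 3, 'hour' => 4, 'minute' => 5 }.freeze

# Parses e.g. '2012-09' (month) or '2012-09-05 13' (hour) into a UTC Time.
def parse_period(by, str)
  n = FIELD_COUNTS.fetch(by) { raise ArgumentError, "unsupported by=#{by}" }
  parts = str.scan(/\d+/).map(&:to_i)
  raise ArgumentError, "expected #{n} fields in #{str.inspect}" unless parts.size == n

  Time.utc(*parts)
end

# Start of the period after t, so the end period is included in full.
def next_period(by, t)
  case by
  when 'month'
    t.month == 12 ? Time.utc(t.year + 1, 1) : Time.utc(t.year, t.month + 1)
  when 'day'    then t + 86_400
  when 'hour'   then t + 3_600
  when 'minute' then t + 60
  end
end

# Half-open [from, to) range covering start..end inclusive.
def time_range(by, start_str, end_str)
  from = parse_period(by, start_str)
  to   = next_period(by, parse_period(by, end_str))
  raise ArgumentError, 'start must not be after end' if from >= to

  [from, to]
end

from, to = time_range('month', '2012-09', '2012-12')
# from is 2012-09-01 00:00 UTC, to is 2013-01-01 00:00 UTC
```

Using a half-open range avoids off-by-one issues at period boundaries: a line created exactly at `2013-01-01 00:00` falls outside the range instead of being counted in December.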
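As mentioned in https://github.com/ben7th/yuwan_counter/issues/1:
The "number of distinct speaking users" is defined as follows: if, within the specified time range, the only messages were three from 张三, two from 李四, and five from 王五, then the distinct speaking users are 张三, 李四, and 王五, so the number is three. This can be represented as a data structure mapping each username to that user's message count.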
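The definition can be checked against the quoted example: tallying the messages by speaker gives the per-user counts, and the number of keys in that tally is the number of distinct speaking users (`Array#tally` requires Ruby 2.7 or later).

```ruby
# 张三 sent 3 messages, 李四 2, and 王五 5 (the example from the linked issue).
speakers = ['张三'] * 3 + ['李四'] * 2 + ['王五'] * 5

per_user = speakers.tally       # username => message count
distinct_users = per_user.size  # 3
```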
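To implement this statistic: given a start time, an end time, and a time granularity, return for each time sub-period the number of distinct speaking users, together with each user's message count in that sub-period (only the TOP 10 are needed).
Addendum
The returned data must also include, for the whole span from the start time to the end time, the number of distinct speaking users and each user's message count over that entire span.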
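Combining the per-period requirement with the addendum, the point to get right is that the distinct-user count must be taken from the full per-user tally, before it is cut down to the TOP 10. A plain-Ruby sketch, assuming messages expose `username` and `created_at` and that the period key comes from a `strftime` pattern; the result keys `user_count` and `top`, and the method names, are hypothetical, since the issue does not fix field names for this part of the response.

```ruby
ChatMessage = Struct.new(:username, :created_at)

# Distinct speakers are counted from the full tally, then the tally is
# truncated to the top `limit` users.
def summarize(messages, limit)
  counts = messages.map(&:username).tally
  {
    user_count: counts.size,
    top: counts.sort_by { |username, count| [-count, username] }.first(limit).to_h
  }
end

# One summary per period plus one for the whole start..end span.
def chat_stat_with_user_counts(messages, period_format, limit: 10)
  section = messages
    .group_by { |m| m.created_at.strftime(period_format) }
    .sort_by { |period, _msgs| period }
    .to_h { |period, msgs| [period, summarize(msgs, limit)] }
  { all: summarize(messages, limit), section: section }
end

msgs = [
  ChatMessage.new('a', Time.utc(2012, 9, 3)),
  ChatMessage.new('a', Time.utc(2012, 9, 4)),
  ChatMessage.new('b', Time.utc(2012, 9, 5)),
  ChatMessage.new('a', Time.utc(2012, 10, 1)),
  ChatMessage.new('c', Time.utc(2012, 10, 2))
]
stat = chat_stat_with_user_counts(msgs, '%Y-%m', limit: 1)
# stat[:all] reports 3 distinct users even though its :top holds only 'a'
```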
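Design challenges
Counting messages per user may be expensive. Although only the TOP 10 is needed, querying the counts of all users and querying the counts of the TOP 10 users will probably perform about the same, because the database must count every user's messages before it can tell which ten are highest. An index on username may speed up this part of the query.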
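Why TOP 10 and all users likely cost about the same is visible in the shape of the aggregate query. A sketch of that query as a Ruby heredoc, assuming a table `chat_lines` with columns `room_id`, `kind`, `username`, and `created_at` (all hypothetical names) and `?` placeholders for bound values: `GROUP BY` has to count every matching row for every user before `ORDER BY` can rank the groups, and `LIMIT` only trims the rows returned. Whether an index on `username` (or one covering the `WHERE` columns) actually helps depends on the database and the data, so it is worth comparing the database's `EXPLAIN` output before and after adding it.

```ruby
# Builds the aggregate query behind a TOP-N chat ranking. Table and column
# names are hypothetical; adjust them to the real schema.
def top_chatters_sql(limit: 10)
  raise ArgumentError, 'limit must be a positive Integer' unless limit.is_a?(Integer) && limit.positive?

  <<~SQL
    SELECT username, COUNT(*) AS chat_count
    FROM chat_lines
    WHERE room_id = ?
      AND kind = 'chat'
      AND created_at >= ? AND created_at < ?
    -- GROUP BY must count every matching row for every username...
    GROUP BY username
    -- ...before ORDER BY can rank them; LIMIT only trims the final output.
    ORDER BY chat_count DESC, username
    LIMIT #{limit}
  SQL
end
```

The `limit` value is validated as a positive integer before being interpolated, so it cannot inject SQL; the other values stay as bound placeholders.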