ben7th / yuwan_counter_service

Backend storage service for Douyu live-stream statistics

Time-bucketed statistics on the number of distinct chatting users #3

Open ben7th opened 9 years ago

ben7th commented 9 years ago

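As mentioned in https://github.com/ben7th/yuwan_counter/issues/1:

The request is for time-bucketed statistics on the number of distinct users who chatted (for example, how many different users chatted this week).

The "distinct users who chatted" are defined as follows: suppose that within a given time period exactly three people spoke, with 张三 sending three messages, 李四 two, and 王五 five. The distinct users who chatted are then 张三, 李四, and 王五, three people in total. This can be represented as the data structure: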

{
  "张三" => 3,
  "李四" => 2,
  "王五" => 5
}

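To implement this statistic, the query takes a start time, an end time, and a time granularity. For each time bucket it returns the number of distinct users who chatted and the message count of each user within that bucket (only the TOP 10 are needed).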

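Addendum

The returned data must also include, for the whole span from the start time to the end time, the number of distinct users who chatted and each user's message count over that span.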
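The requirement above can be sketched in plain Ruby over an in-memory list. This is only an illustration under stated assumptions, not the service's implementation: the `Message` struct, the `chat_stat` and `top_users` helpers, the `:distinct_users`/`:top` keys, and every bucket format other than the month form (`"2012-09"`) are hypothetical. The `week` granularity is left out because the thread does not say how a week is written.

```ruby
# Hypothetical in-memory stand-in for stored chat records.
Message = Struct.new(:username, :time)

# Assumed strftime patterns for bucket keys; only the month form is attested in this thread.
BUCKET_FORMATS = {
  'month'  => '%Y-%m',
  'day'    => '%Y-%m-%d',
  'hour'   => '%Y-%m-%d %H',
  'minute' => '%Y-%m-%d %H:%M'
}.freeze

# Count messages per user, then keep the `limit` most active users.
def top_users(messages, limit = 10)
  counts = Hash.new(0)
  messages.each { |m| counts[m.username] += 1 }
  counts.max_by(limit) { |_name, count| count }.to_h
end

# Distinct-user count plus top users for one set of messages.
def summarize(messages, limit)
  { distinct_users: messages.map(&:username).uniq.size, top: top_users(messages, limit) }
end

# Statistics for the half-open span [from, to): one summary per bucket and one overall.
def chat_stat(messages, granularity, from, to, limit = 10)
  format = BUCKET_FORMATS.fetch(granularity)
  in_range = messages.select { |m| m.time >= from && m.time < to }
  sections = in_range.group_by { |m| m.time.strftime(format) }
                     .transform_values { |msgs| summarize(msgs, limit) }
  { all: summarize(in_range, limit), sections: sections }
end
```

Note that `top_users` still counts every user before ranking, so limiting the result to ten entries does not avoid the full count.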

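Design challenges

Querying the message count of every user may be expensive. Although only the TOP 10 are needed, querying the counts for all users and querying them for just the TOP 10 users probably differ little in performance. Adding an index on username might speed up this part of the query.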

fushang318 commented 9 years ago

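Within a time range, find the ten users with the most ChatLine records of type chat, together with each user's number of chat-type ChatLine records (the start and end times are given in the format implied by time_str_type)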

  result = ChatLine.username_all_chat_stat(time_str_type, start_time_str, end_time_str)

  # Example of the result structure
  # time_str_type = 'month' (may also be week, day, hour, or minute)
  # start_time_str = '2012-09'
  # end_time_str = '2012-12'
  {
      # key: username
      # value: the user's number of chat-type ChatLine records
      "张三" => 3,
      "李四" => 2,
      "王五" => 5,
      "赵六" => 6,
      "赵七" => 7,
      "赵八" => 8,
      "赵九" => 9,
      "赵二" => 2,
      "赵一" => 1,
      "赵拾" => 10
  }
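One way to turn `start_time_str` and `end_time_str` into a queryable time range is sketched below. It is an illustration only: `period_start`, `next_period`, and `query_range` are hypothetical helpers, and the day, hour, and minute string formats (`'2012-09-30'`, `'2012-09-30 08'`, `'2012-09-30 08:15'`) are assumptions, since the thread only shows the month form. The sketch treats the end period as inclusive, which matches the per-section example in this thread where an `end_time_str` of `'2012-12'` yields a `"2012-12"` section. The `week` type is omitted because its string format is not specified.

```ruby
# Seconds per fixed-length period; months vary in length and are handled separately.
PERIOD_SECONDS = { 'day' => 86_400, 'hour' => 3_600, 'minute' => 60 }.freeze

# "2012-09" -> 2012-09-01 00:00:00 UTC; missing components default to the period start.
def period_start(time_str)
  Time.utc(*time_str.scan(/\d+/).map(&:to_i))
end

# Start of the period that follows the one beginning at `time`.
def next_period(time, time_str_type)
  if time_str_type == 'month'
    time.month == 12 ? Time.utc(time.year + 1, 1) : Time.utc(time.year, time.month + 1)
  else
    time + PERIOD_SECONDS.fetch(time_str_type)
  end
end

# Half-open range [start, end) that covers both boundary periods in full.
def query_range(time_str_type, start_time_str, end_time_str)
  first = period_start(start_time_str)
  last  = next_period(period_start(end_time_str), time_str_type)
  first...last
end
```

A half-open range avoids having to guess the last representable instant of the final period.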

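Within a time range, for each time section (the unit of a section is determined by the time_str_type parameter; with month, for example, each section is one month), find the ten users with the most ChatLine records of type chat, together with each user's number of chat-type ChatLine records (the start and end times are given in the format implied by time_str_type)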

  result = ChatLine.username_section_chat_stat(time_str_type, start_time_str, end_time_str)

  # Example of the result structure
  # time_str_type = 'month' (may also be week, day, hour, or minute)
  # start_time_str = '2012-09'
  # end_time_str = '2012-12'
  {
    "2012-09" => {
      # key: username
      # value: the user's number of chat-type ChatLine records
      "张三" => 3,
      "李四" => 2,
      "王五" => 5,
      "赵六" => 6,
      "赵七" => 7,
      "赵八" => 8,
      "赵九" => 9,
      "赵二" => 2,
      "赵一" => 1,
      "赵拾" => 10
    },
    "2012-10" => {
      "张三" => 3,
      "李四" => 2,
      "王五" => 5,
      "赵六" => 6,
      "赵七" => 7,
      "赵八" => 8,
      "赵九" => 9,
      "赵二" => 2,
      "赵一" => 1,
      "赵拾" => 10
    },
    "2012-11" => {
      "张三" => 3,
      "李四" => 2,
      "王五" => 5,
      "赵六" => 6,
      "赵七" => 7,
      "赵八" => 8,
      "赵九" => 9,
      "赵二" => 2,
      "赵一" => 1,
      "赵拾" => 10
    },
    "2012-12" => {
      "张三" => 3,
      "李四" => 2,
      "王五" => 5,
      "赵六" => 6,
      "赵七" => 7,
      "赵八" => 8,
      "赵九" => 9,
      "赵二" => 2,
      "赵一" => 1,
      "赵拾" => 10
    }
  }

The implemented methods must be chainable after a scope, for example:

ChatLine.by_room_id(room_id).username_all_chat_stat(time_str_type, start_time_str, end_time_str)
ChatLine.by_room_id(room_id).username_section_chat_stat(time_str_type, start_time_str, end_time_str)

HTTP API

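url
  /api/chat_lines/username_chat_stat
method
  get
params
  room_id

  # Time unit to aggregate by: month | week | day | hour | minute
  by

  # Start time, in the same format as the argument accepted by the model methods
  start

  # End time, in the same format as the argument accepted by the model methods
  end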

response
  {
    :by => 'month',
    :data => {
      :all => {
        "张三" => 3,
        "李四" => 2,
        "王五" => 5,
        "赵六" => 6,
        "赵七" => 7,
        "赵八" => 8,
        "赵九" => 9,
        "赵二" => 2,
        "赵一" => 1,
        "赵拾" => 10
      },
      :section => {
        "2012-09" => {
          "张三" => 3,
          "李四" => 2,
          "王五" => 5,
          "赵六" => 6,
          "赵七" => 7,
          "赵八" => 8,
          "赵九" => 9,
          "赵二" => 2,
          "赵一" => 1,
          "赵拾" => 10
        },
        "2012-10" => {
          "张三" => 3,
          "李四" => 2,
          "王五" => 5,
          "赵六" => 6,
          "赵七" => 7,
          "赵八" => 8,
          "赵九" => 9,
          "赵二" => 2,
          "赵一" => 1,
          "赵拾" => 10
        },
        "2012-11" => {
          "张三" => 3,
          "李四" => 2,
          "王五" => 5,
          "赵六" => 6,
          "赵七" => 7,
          "赵八" => 8,
          "赵九" => 9,
          "赵二" => 2,
          "赵一" => 1,
          "赵拾" => 10
        },
        "2012-12" => {
          "张三" => 3,
          "李四" => 2,
          "王五" => 5,
          "赵六" => 6,
          "赵七" => 7,
          "赵八" => 8,
          "赵九" => 9,
          "赵二" => 2,
          "赵一" => 1,
          "赵拾" => 10
        }
      }
    }
  }
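A client call against this endpoint might be sketched as follows. The base URL `http://localhost:3000`, the room id `1`, and the helper names are hypothetical. The sketch also assumes the response is serialized as JSON on the wire, which the thread does not state; the response above is shown only as a Ruby hash.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the GET URL from the documented params: room_id, by, start, end.
def username_chat_stat_uri(base_url, room_id:, by:, start_time:, end_time:)
  uri = URI.join(base_url, '/api/chat_lines/username_chat_stat')
  uri.query = URI.encode_www_form(room_id: room_id, by: by, start: start_time, end: end_time)
  uri
end

# Perform the request and parse the body, assuming a JSON response (not confirmed by the thread).
def fetch_username_chat_stat(uri)
  response = Net::HTTP.get_response(uri)
  JSON.parse(response.body)
end

# Hypothetical host and room id, for illustration only.
uri = username_chat_stat_uri('http://localhost:3000',
                             room_id: 1, by: 'month',
                             start_time: '2012-09', end_time: '2012-12')
puts uri
```

Under the JSON assumption, `fetch_username_chat_stat(uri)` would return a hash whose `"data"` entry holds the `"all"` and `"section"` results.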