codemonger-io / codemonger

Source code of https://codemonger.io
https://codemonger.io/
MIT License

Access analysis (infrastructure) #30

Closed kikuomax closed 2 years ago

kikuomax commented 2 years ago

I would like to analyze CloudFront logs to understand the traffic to my site. How about trying Amazon Redshift Serverless?

kikuomax commented 2 years ago

I think we need specific goals for the access analysis.

kikuomax commented 2 years ago

The columns of the fact table for access logs.

  1. datetime: TIMESTAMP
    • date + time
  2. seq_num: INT
    • To retain the order of entries at the same timestamp in the original log file.
  3. edge_location: INT (references the edge_location dimension table)
    • x-edge-location
  4. sc_bytes: BIGINT
    • sc-bytes
  5. cs_method: VARCHAR
    • cs-method
  6. page: INT (references the page dimension table)
    • cs-uri-stem
  7. status: SMALLINT
    • sc-status
  8. referer: BIGINT DISTKEY (references the referer dimension table)
    • cs(Referer)
  9. user_agent: BIGINT (references the user_agent dimension table)
    • cs(User-Agent)
  10. cs_protocol: VARCHAR
    • cs-protocol
  11. cs_bytes: BIGINT
    • cs-bytes
  12. time_taken: FLOAT4
    • time-taken
  13. edge_response_result_type: INT (references the result_type dimension table)
    • x-edge-response-result-type
  14. time_to_first_byte: FLOAT4
    • time-to-first-byte

SORTKEY: datetime, seq_num
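
To illustrate how a CloudFront standard log file could map onto these columns, here is a minimal Python sketch. It assumes the tab-separated standard log format whose `#Fields:` header names each column, and it assumes `seq_num` is a simple per-file counter, which is one way (not necessarily the intended one) to keep the original order of entries that share a timestamp. The function name `parse_cloudfront_log` is hypothetical, and the dimension columns still hold raw strings here rather than IDs.

```python
from datetime import datetime


def parse_cloudfront_log(text):
    """Yield one dict per request line of a CloudFront standard log.

    Column positions are taken from the "#Fields:" header line, so the
    sketch does not hard-code the order of the fields.
    """
    fields = []
    seq_num = 0  # assumption: a per-file counter preserving the line order
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other header lines
        raw = dict(zip(fields, line.split("\t")))
        seq_num += 1
        yield {
            "datetime": datetime.strptime(
                raw["date"] + " " + raw["time"], "%Y-%m-%d %H:%M:%S"
            ),
            "seq_num": seq_num,
            "edge_location": raw["x-edge-location"],
            "sc_bytes": int(raw["sc-bytes"]),
            "cs_method": raw["cs-method"],
            "page": raw["cs-uri-stem"],
            "status": int(raw["sc-status"]),
            "referer": raw["cs(Referer)"],
            "user_agent": raw["cs(User-Agent)"],
            "cs_protocol": raw["cs-protocol"],
            "cs_bytes": int(raw["cs-bytes"]),
            "time_taken": float(raw["time-taken"]),
            "edge_response_result_type": raw["x-edge-response-result-type"],
            "time_to_first_byte": float(raw["time-to-first-byte"]),
        }
```

Because `datetime` alone cannot distinguish two requests logged in the same second, sorting by `(datetime, seq_num)` reproduces the order of the original file.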

kikuomax commented 2 years ago

The columns of the edge_location dimension table.

  1. id: INT
  2. code: VARCHAR SORTKEY UNIQUE
    • x-edge-location

kikuomax commented 2 years ago

The columns of the page dimension table.

  1. id: INT
  2. path: VARCHAR(2048) SORTKEY UNIQUE
    • cs-uri-stem

kikuomax commented 2 years ago

The columns of the referer dimension table.

  1. id: BIGINT
  2. url: VARCHAR(2048) SORTKEY UNIQUE
    • cs(Referer)

kikuomax commented 2 years ago

The columns of the user_agent dimension table.

  1. id: BIGINT
  2. user_agent: VARCHAR(2048) SORTKEY UNIQUE
    • cs(User-Agent)

kikuomax commented 2 years ago

The columns of the result_type dimension table.

  1. id: INT
  2. result_type: VARCHAR SORTKEY UNIQUE
    • x-edge-response-result-type
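
The five dimension tables all follow the same pattern: a surrogate `id` plus one unique value column. As a rough sketch of how a loader might swap raw log values for those IDs, the Python below uses an in-memory dictionary as a stand-in for each dimension table. The names `Dimension`, `get_or_create`, and `to_fact_row` are hypothetical, and sequential IDs starting at 1 are an assumption. Doing the deduplication in the loader matters because Amazon Redshift treats `UNIQUE` constraints as informational and does not enforce them.

```python
class Dimension:
    """In-memory stand-in for one dimension table (hypothetical helper)."""

    def __init__(self):
        self._ids = {}  # value -> surrogate id

    def get_or_create(self, value):
        """Return the id of value, assigning the next id if it is new."""
        if value not in self._ids:
            self._ids[value] = len(self._ids) + 1
        return self._ids[value]

    def rows(self):
        """Return (id, value) pairs ordered by id, ready to be loaded."""
        return sorted((id_, value) for value, id_ in self._ids.items())


def to_fact_row(record, dims):
    """Copy record, replacing each dimension column with its surrogate id.

    dims maps a fact-table column name to the Dimension that backs it.
    """
    row = dict(record)
    for column, dim in dims.items():
        row[column] = dim.get_or_create(record[column])
    return row
```

For example, with `dims = {"page": Dimension(), "edge_location": Dimension()}`, calling `to_fact_row` on each parsed record yields fact rows whose `page` and `edge_location` hold integer IDs, while `dims["page"].rows()` lists the rows to insert into the page dimension table.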

kikuomax commented 2 years ago

I have decided to limit this issue to developing the basic infrastructure. I will develop the analysis tools on top of that infrastructure in a separate issue.