Open gm3dmo opened 5 months ago
Does the conversion to JSON work? The line counts of the source log and the JSON output match:
[Bundle #135189] 135189 $ wc -l system-logs/split-logs-syslog.1/gitrpcd.*
290161 system-logs/split-logs-syslog.1/gitrpcd.json
290161 system-logs/split-logs-syslog.1/gitrpcd.log
580322 total
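The same parity check can be scripted. The following is a minimal sketch, not part of the original tooling: `count_lines` and `check_conversion` are hypothetical helper names, and it assumes the converted file holds one JSON object per line. Besides comparing counts, it also confirms that every converted line parses as JSON.

```python
import json


def count_lines(path):
    """Count lines in a text file (a final line without a newline still counts,
    which differs slightly from `wc -l`)."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def check_conversion(log_path, json_path):
    """Return (log_lines, json_lines, bad_json_lines) for a log/JSON pair.

    Hypothetical helper: assumes one JSON object per line in json_path.
    """
    json_lines = 0
    bad_json_lines = 0
    with open(json_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            json_lines += 1
            try:
                json.loads(line)
            except json.JSONDecodeError:
                bad_json_lines += 1
    return count_lines(log_path), json_lines, bad_json_lines
```

A conversion looks healthy when the first two numbers are equal and the third is zero, for example when called on `system-logs/split-logs-syslog.1/gitrpcd.log` and `system-logs/split-logs-syslog.1/gitrpcd.json`.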
[Bundle #135189] 135189 $ head -1 system-logs/split-logs-syslog.1/gitrpcd.log ; tail -1 system-logs/split-logs-syslog.1/gitrpcd.log
Jun 6 15:59:23 github-circle-electron-com-primary gitrpcd[4289]: time=2024-06-06T15:59:23.796887689Z level=INFO msg=request app=gitrpcd sha=19857c54c7a0c0d822992c234066f7df356db7ed host=github-circle-electron-com-primary method=CONNECT url=//git-upload-pack:80
Jun 6 16:59:51 github-circle-electron-com-primary gitrpcd[4289]: time=2024-06-06T16:59:51.562751566Z level=INFO msg="command exited" app=gitrpcd sha=19857c54c7a0c0d822992c234066f7df356db7ed host=github-circle-electron-com-primary request_id=f83c9a875f8aaee1a9277ec8a322a06c component=githttpdaemon method=CONNECT request_url=//git-upload-pack:80 user_agent=babeld/f3ea3d34 command=git-upload-pack path=/0/nw/05/ed/e5/46448/49702.git at=finish elapsed=2.290181 exit=0
select min(time), max(time) from gitrpcd;
┌────────────────────────────────┬────────────────────────────────┐
│ min("time") │ max("time") │
│ varchar │ varchar │
├────────────────────────────────┼────────────────────────────────┤
│ 2024-06-06T15:59:23.796887689Z │ 2024-06-06T17:05:59.523893746Z │
└────────────────────────────────┴────────────────────────────────┘
$ grep "2024-06-06T17:05:59.523893746Z" system-logs/split-logs-syslog/gitrpcd.log
Jun 6 17:05:59 github-schneider-electric-com-primary gitrpcd[4289]: time=2024-06-06T17:05:59.523893746Z level=INFO msg=request app=gitrpcd sha=19857c54c7a0c0d822992c234066f7df356db7ed host=github-schneider-electric-com-primary service=trees component=twirp twirp_status=200 twirp_method=CompareTrees twirp_service=TreesAPI twirp_package=github.spokes.trees.v1 request_id=f99be8e8e7188ace9d9877e858fca920 user_agent="github-enterpriseworker/6995b8978891dac1dc583f8baae2753f7ccd93ce spokesd/88fbad0cdc52afbac03a3edb70a913cf07bd22f6" remote_addr=127.0.0.1:56394 http_version=HTTP/1.1 spec=33396/36060 request_duration=3.953501
select count(*) from gitrpcd;
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 310521 │
└──────────────┘
wc -l system-logs/split-logs-syslog/gitrpcd.log system-logs/split-logs-syslog.1/gitrpcd.log
20360 system-logs/split-logs-syslog/gitrpcd.log
290161 system-logs/split-logs-syslog.1/gitrpcd.log
310521 total
Convert a bundle to JSONL and load it into DuckDB for superfast querying
The main benefit of this approach is that ingestion doesn't get wrecked by log format changes. It simply ingests everything.
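To illustrate why format changes don't break ingestion, here is a minimal sketch of a line-to-JSONL conversion. It is not the converter used for this bundle. The names `logfmt_to_dict` and `convert` are hypothetical, and it assumes each line is a syslog prefix ending in `": "` followed by `key=value` pairs, as in the gitrpcd samples above. Every pair it finds becomes a JSON field, so new or removed keys need no schema change.

```python
import json
import re

# key=value, where the value is either double-quoted (may contain spaces)
# or a run of non-whitespace characters.
PAIR_RE = re.compile(r'([^\s=]+)=("(?:[^"\\]|\\.)*"|\S*)')


def logfmt_to_dict(line):
    """Parse one log line into a dict of strings.

    Assumption: anything before the first ": " is the syslog prefix
    (timestamp, host, process[pid]) and is dropped.
    """
    text = line.rstrip("\n")
    _prefix, sep, payload = text.partition(": ")
    if not sep:
        # No syslog prefix found; treat the whole line as key=value pairs.
        payload = text
    record = {}
    for key, value in PAIR_RE.findall(payload):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            # Strip the surrounding quotes and unescape embedded quotes.
            value = value[1:-1].replace('\\"', '"')
        record[key] = value
    return record


def convert(lines):
    """Yield one JSON document per input line (JSONL)."""
    for line in lines:
        yield json.dumps(logfmt_to_dict(line))
```

All values are kept as strings, which is consistent with `time` showing up as `varchar` in the DuckDB output above. Any casting can then happen at query time.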
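It seems that the log files in a bundle now cover only a very short timespan: in the gitrpcd example above, the current and rotated log files together span just over an hour (15:59:23 to 17:05:59).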
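The span can be computed directly from the `min(time)` and `max(time)` values returned by the query above. This is a small sketch with hypothetical helper names (`parse_ts`, `timespan`). It truncates the nanosecond timestamps to microseconds because Python's `datetime` stores no finer resolution.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a timestamp such as 2024-06-06T15:59:23.796887689Z (UTC).

    The fractional part is truncated to 6 digits (microseconds).
    """
    base, _, frac = value.rstrip("Z").partition(".")
    micros = (frac + "000000")[:6]
    parsed = datetime.strptime(f"{base}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def timespan(first, last):
    """Return the timedelta between two timestamp strings."""
    return parse_ts(last) - parse_ts(first)


span = timespan("2024-06-06T15:59:23.796887689Z",
                "2024-06-06T17:05:59.523893746Z")
print(span)  # 1:06:35.727006
```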
Setup:
Queries
3.12
3.11
gitrpcd
gitrpcd queries
authzd
babeld
babeld queries
Unicorn
Hookshot
Exceptions