PrivacyLx / privacylx-issue-tracker

PrivacyLx Issue Tracker repository

Monitoring of snowflake servers #101

Open francisco-core opened 1 year ago

francisco-core commented 1 year ago

Track our usage of servers:

https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40121

francisco-core commented 1 year ago

I looked a bit into this and here are my findings:

We need three components:

  1. Snowflake logging + exposing it to Prometheus (to extract data from Snowflake and present it to Prometheus)
  2. a Prometheus + Grafana instance (to visualize the data)
  3. authentication in all places, to avoid exposing granular metrics

1. Snowflake logging

Upstream initiatives (1, 2) to have Snowflake expose data directly to Prometheus appear to have stalled.

However, I found this work, which gets around that limitation by logging Snowflake data to a file, then reading from that file and exposing the metrics to Prometheus.
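
As a rough illustration of that file-based pattern, here is a minimal sketch that tails a proxy log and exposes a counter for Prometheus to scrape. It assumes the `prometheus_client` Python package; the log path, port, and the exact wording of the connection-count log line are assumptions for the example (the parsing the linked work actually uses is shown in the next comment).

```
# Illustrative sketch only: tail a snowflake proxy log and expose an aggregate
# counter over HTTP for Prometheus to scrape.
import re
import time
from prometheus_client import start_http_server, Counter

LOG_PATH = "./docker_snowflake.log"  # hypothetical log location

# Exposed by the client library as snowflake_relayed_connections_total
relayed_connections = Counter(
    "snowflake_relayed_connections",
    "Connections relayed by this snowflake proxy (parsed from its log)",
)

def follow(path):
    """Yield new lines appended to the log file, like `tail -f`."""
    with open(path, "r") as f:
        f.seek(0, 2)  # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(1)
                continue
            yield line

if __name__ == "__main__":
    start_http_server(9100)  # serves /metrics on port 9100 (arbitrary choice)
    for line in follow(LOG_PATH):
        # The exact log wording is an assumption and would need to be checked
        # against a real snowflake proxy log.
        match = re.search(r"there were (\d+) connections", line)
        if match:
            relayed_connections.inc(int(match.group(1)))
```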

francisco-core commented 1 year ago

Investigation Results

snowflake.py

```
# This script uses the following dependencies:
# pip install nums-from-string
# pip install datetime
#
# To run this script type:
# python main.py
#
# The default log file is ./docker_snowflake.log
#
# Example:
# python main.py snow.log
#
# Written by Allstreamer_
# Licensed under MIT
#
# Enhanced by MariusHerget
# Further enhanced and modified by mrdrache333
import sys
import re
from datetime import datetime, timedelta
from http.server import HTTPServer, BaseHTTPRequestHandler  # unused in this snippet


def nums_from_string(string):
    return [int(num) for num in re.findall(r"\d+", string)]


def readFile():
    # Read in the log file as lines
    with open(logfile_path, "r") as file:
        lines_all = file.readlines()
    return lines_all


# Fallback for lines that do not start with a timestamp
def catchTimestampException(rowSubString, timestampFormat):
    try:
        return datetime.strptime(rowSubString, timestampFormat)
    except Exception:
        return datetime.strptime("1970/01/01 00:00:00", "%Y/%m/%d %H:%M:%S")


# Filter the log lines based on a time delta in hours
def filterLinesBasedOnTimeDelta(log_lines, hours):
    now = datetime.now()
    length_timestamp_format = len(datetime.strftime(now, timestamp_format))
    return filter(
        lambda row: now - timedelta(hours=hours)
        <= catchTimestampException(row[0:length_timestamp_format], timestamp_format)
        <= now,
        log_lines,
    )


# Convert traffic information (in B, KB, MB, or GB) to bytes and add it up
def get_byte_count(log_lines):
    byte_count = 0
    for row in log_lines:
        symbols = row.split(" ")
        # Map units to their byte conversion values
        units = {
            "B": 1,
            "KB": 1024,
            "MB": 1024 * 1024,
            "GB": 1024 * 1024 * 1024,
        }
        # Use the dictionary to get the byte conversion value for the current unit
        byte_count += int(symbols[1]) * units[symbols[2]]
    return byte_count


# Filter important lines from the log
# Extract the number of connections, uploaded traffic in GB and downloaded traffic in GB
def getDataFromLines(lines):
    # Keep only the important lines (traffic information)
    lines = [row.strip() for row in lines if "In the" in row]
    lines = [row.split(",", 1)[1] for row in lines]
    # Drop all traffic log lines that did not have any connection
    lines = [row for row in lines if nums_from_string(row)[0] != 0]
    # Extract the number of connections as a sum
    connections = sum([nums_from_string(row)[0] for row in lines])
    # Extract upload and download data
    lines = [row.split("Relayed")[1] for row in lines]
    upload = [row.split(",")[0].strip() for row in lines]
    download = [row.split(",")[1].strip()[:-1] for row in lines]
    # Convert upload/download data to GB
    upload_gb = get_byte_count(upload) / 1024 / 1024 / 1024
    download_gb = get_byte_count(download) / 1024 / 1024 / 1024
    # Return the information as a dictionary for better structure
    return {'connections': connections, 'upload_gb': upload_gb, 'download_gb': download_gb}


def main():
    # Read the file
    lines_all = readFile()
    # Get the statistics for various time windows
    # e.g. all time  => getDataFromLines(lines_all)
    # e.g. last 24h  => getDataFromLines(filterLinesBasedOnTimeDelta(lines_all, 24))
    # e.g. last week => getDataFromLines(filterLinesBasedOnTimeDelta(lines_all, 24 * 7))
    stats = {
        'All time': getDataFromLines(lines_all),
        'Last 24h': getDataFromLines(filterLinesBasedOnTimeDelta(lines_all, 24)),
        'Last Week': getDataFromLines(filterLinesBasedOnTimeDelta(lines_all, 24 * 7)),
    }
    # Print all the results in the Prometheus metric format
    for time in stats:
        stat = stats[time]
        print(
            f"snowflake_served_people{{time=\"{time}\"}} {stat['connections']}\n" +
            f"snowflake_upload_gb{{time=\"{time}\"}} {round(stat['upload_gb'], 4)}\n" +
            f"snowflake_download_gb{{time=\"{time}\"}} {round(stat['download_gb'], 4)}"
        )


# Format of the timestamps at the beginning of the log
# e.g. "2022/01/01 16:50:30 " => "%Y/%m/%d %H:%M:%S"
timestamp_format = "%Y/%m/%d %H:%M:%S"
# Log file path from the arguments (default: ./docker_snowflake.log)
logfile_path = sys.argv[1] if len(sys.argv) > 1 else "./docker_snowflake.log"

main()
```
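
Note that the script above only prints the metrics to stdout (the `http.server` imports are unused in this snippet), so something still has to serve them to Prometheus, ideally behind authentication as noted in component 3. A minimal sketch of one way to do that follows; the port, credentials, and the idea of shelling out to `snowflake.py` are assumptions for the example, not part of the original script.

```
# Illustrative sketch only: serve the Prometheus text output behind HTTP basic auth.
import base64
import subprocess
from http.server import HTTPServer, BaseHTTPRequestHandler

USERNAME, PASSWORD = "metrics", "change-me"  # placeholder credentials
EXPECTED = "Basic " + base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reject requests without the expected basic-auth header
        if self.headers.get("Authorization") != EXPECTED:
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="metrics"')
            self.end_headers()
            return
        # Re-run the log parser and return its Prometheus-format output
        output = subprocess.run(
            ["python", "snowflake.py", "./docker_snowflake.log"],
            capture_output=True, text=True,
        ).stdout
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(output.encode())


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 9200), MetricsHandler).serve_forever()
```

Prometheus would then scrape this endpoint with matching `basic_auth` credentials in its scrape config, and Grafana only needs access to Prometheus, not to the proxy logs themselves.
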
anadahz commented 1 year ago

Notes