bvt123 / SCH

GNU General Public License v3.0
golang runner instead of bash #4

Open bvt123 opened 10 months ago

bvt123 commented 10 months ago

The current MVP implementation of the scheduler is written in bash and has stability and dependency problems (it requires clickhouse-client), especially when started in a containerized environment.

bash scripts are here - https://github.com/bvt123/SCH/tree/main/bash

Let's rewrite them in Go as a small, compact program that builds to a single binary.

  1. driver - https://clickhouse.com/docs/en/integrations/go

  2. main loop

while true ; do
  clickhouse-client -q "select * from SCH.LagLive" -f TSVRaw | \
  while IFS=$'\t' read -r topic host sql ts version; do
       # quote every argument so an empty field cannot shift the positions
       process "$topic" "$sql" "$version" "$host"
  done
  sleep 1
done

The loop fetches jobs with an SQL query and executes the received SQL code asynchronously. A semaphore keyed by a tag (the topic) prevents simultaneous execution of SQL code for the same "topic". The ClickHouse host (cluster node) to connect to is received together with the query.
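A minimal Go sketch of the per-topic semaphore, not a finished runner. The names `job`, `topicGuard` and `dispatch` are hypothetical, and the hard-coded job list stands in for one poll of SCH.LagLive; a real runner would fetch the rows through the driver and repeat the poll every second. A busy topic is skipped rather than queued, on the assumption that the next poll will offer the job again.

```go
package main

import (
	"fmt"
	"sync"
)

// job mirrors the fields the bash loop passes to process (ts is read but unused there).
type job struct {
	topic, host, sql, version string
}

// topicGuard allows at most one running job per topic.
type topicGuard struct {
	mu      sync.Mutex
	running map[string]bool
}

func newTopicGuard() *topicGuard {
	return &topicGuard{running: make(map[string]bool)}
}

// tryAcquire reports whether the caller may run a job for topic.
// It never blocks: a busy topic is simply refused.
func (g *topicGuard) tryAcquire(topic string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.running[topic] {
		return false
	}
	g.running[topic] = true
	return true
}

// release marks topic as idle again.
func (g *topicGuard) release(topic string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.running, topic)
}

// dispatch starts one goroutine per job whose topic is idle
// and returns how many jobs were started.
func dispatch(g *topicGuard, wg *sync.WaitGroup, jobs []job, run func(job)) int {
	started := 0
	for _, j := range jobs {
		if !g.tryAcquire(j.topic) {
			continue // another job for this topic is still running
		}
		started++
		wg.Add(1)
		go func(j job) {
			defer wg.Done()
			defer g.release(j.topic)
			run(j)
		}(j)
	}
	return started
}

func main() {
	g := newTopicGuard()
	var wg sync.WaitGroup
	// Stand-in for one poll of SCH.LagLive (sample values, not real data).
	jobs := []job{
		{topic: "a", host: "node1", sql: "select 1", version: "1"},
		{topic: "a", host: "node1", sql: "select 2", version: "1"},
		{topic: "b", host: "node2", sql: "select 3", version: "1"},
	}
	// gate keeps started jobs "running" until dispatch has seen every row,
	// so the second job for topic a is reliably skipped in this demo.
	gate := make(chan struct{})
	n := dispatch(g, &wg, jobs, func(j job) {
		<-gate
		fmt.Println("run", j.topic, "on", j.host)
	})
	close(gate)
	wg.Wait()
	fmt.Println("started", n)
}
```

A map guarded by a mutex is used instead of one channel per topic because topics arrive dynamically with the query result and are not known in advance.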

  3. run query
echo "$query" | clickhouse-client -h "$4" -n -f TSV --param_topic="${1}_p" --log_comment="$HID:$2"
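The `-n` flag lets clickhouse-client run several statements from one input. Assuming the Go driver executes one statement per call, the runner would have to split the received SQL itself. The sketch below shows one possible splitter; `splitStatements` is a hypothetical helper, and it deliberately handles only single-quoted string literals (no comments, no quoted identifiers).

```go
package main

import (
	"fmt"
	"strings"
)

// splitStatements splits a script on semicolons that lie outside
// single-quoted string literals and drops empty statements.
// Simplification: SQL comments and quoted identifiers are not handled.
func splitStatements(script string) []string {
	var out []string
	var cur strings.Builder
	inQuote := false
	for i := 0; i < len(script); i++ {
		c := script[i]
		switch {
		case inQuote && c == '\\' && i+1 < len(script):
			// Copy the backslash and the escaped byte without interpreting it.
			cur.WriteByte(c)
			i++
			cur.WriteByte(script[i])
		case c == '\'':
			inQuote = !inQuote
			cur.WriteByte(c)
		case c == ';' && !inQuote:
			if s := strings.TrimSpace(cur.String()); s != "" {
				out = append(out, s)
			}
			cur.Reset()
		default:
			cur.WriteByte(c)
		}
	}
	// Keep a trailing statement that has no terminating semicolon.
	if s := strings.TrimSpace(cur.String()); s != "" {
		out = append(out, s)
	}
	return out
}

func main() {
	// Table t is a made-up example.
	script := "insert into t values('a;b'); select 1;"
	for i, s := range splitStatements(script) {
		fmt.Printf("%d: %s\n", i, s)
	}
}
```

Each resulting statement could then be sent to the host received with the job, with the topic parameter and log comment attached per call.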
  4. logs & error handling

The result of the SQL code is a string or an error message.
Normal output should be sent to the text log. Errors should be processed and stored in the text log and in the Offsets and Logs tables:

        printf '%(%Y-%m-%d %H:%M:%S)T\tWARN\t'"$1-$2"'\t'"$err"'\n' >> $LOG
        printf "insert into ETL.ErrLog(topic,err) values(\'$1\',\'$err\')" | $CLC 2>> $LOG
        printf "insert into SCH.Offsets select topic,last,rows,next,processor,\'$err\',hostid from SCH.Offsets where topic=\'$1\'" |
                $CLC 2>> $LOG
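The bash version pastes `$err` straight into the SQL text, so an error message containing a single quote breaks the insert. A Go version could pass the values as query arguments instead. The sketch below is written under these assumptions: the chosen driver is used through `database/sql` and accepts `?` placeholders; `execer`, `dryRun`, `logLine` and `recordError` are hypothetical names; the sample error text is made up.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"time"
)

// execer is the subset of *sql.DB used here, so the sketch runs without a server.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// dryRun records statements instead of sending them to ClickHouse.
type dryRun struct{ stmts []string }

func (d *dryRun) ExecContext(_ context.Context, q string, args ...any) (sql.Result, error) {
	d.stmts = append(d.stmts, fmt.Sprintf("%s %v", q, args))
	return nil, nil
}

// logLine formats one tab-separated text-log record, like the bash printf.
func logLine(ts time.Time, level, tag, msg string) string {
	return fmt.Sprintf("%s\t%s\t%s\t%s\n", ts.Format("2006-01-02 15:04:05"), level, tag, msg)
}

// recordError stores a failed job in ETL.ErrLog and appends a SCH.Offsets row
// that copies the topic's current state with the error text filled in.
func recordError(ctx context.Context, db execer, topic, errText string) error {
	if _, err := db.ExecContext(ctx,
		"insert into ETL.ErrLog(topic,err) values(?,?)", topic, errText); err != nil {
		return fmt.Errorf("ErrLog insert: %w", err)
	}
	if _, err := db.ExecContext(ctx,
		"insert into SCH.Offsets select topic,last,rows,next,processor,?,hostid from SCH.Offsets where topic=?",
		errText, topic); err != nil {
		return fmt.Errorf("Offsets insert: %w", err)
	}
	return nil
}

func main() {
	d := &dryRun{}
	topic, errText := "demo", "sample error with a 'quote'"
	// The real runner would append this line to the log file instead of stdout.
	fmt.Print(logLine(time.Now(), "WARN", topic, errText))
	if err := recordError(context.Background(), d, topic, errText); err != nil {
		fmt.Println("recordError failed:", err)
	}
	for _, s := range d.stmts {
		fmt.Println(s)
	}
}
```

Swapping `dryRun` for a real `*sql.DB` opened with the ClickHouse driver would be the only change needed at the call site, since `*sql.DB` already provides `ExecContext`.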