The current MVP implementation of the scheduler is written in Bash and has stability and dependency problems (it requires clickhouse-client to be installed), especially when started in a containerized environment.
# Poll for pending jobs once per second and process each TSV row.
while true ; do
  clickhouse-client -q "select * from SCH.LagLive" -f TSVRaw | \
  while IFS=$'\t' read -r topic host sql ts version; do
    process "$topic" "$sql" "$version" "$host"
  done
  sleep 1
done
It fetches jobs with an SQL query and executes the received SQL code asynchronously, with a semaphore keyed by a tag (topic) that prevents simultaneous execution of SQL for the same topic. The ClickHouse host (cluster node) to connect to is received together with the query.
The result of the SQL code is a string or an error message.
Normal output should be sent to the text log.
In case of errors, they should be processed and stored in the text log and in the Offsets & Logs tables:
printf '%(%Y-%m-%d %H:%M:%S)T\tWARN\t%s-%s\t%s\n' -1 "$1" "$2" "$err" >> "$LOG"
printf "insert into ETL.ErrLog(topic,err) values('%s','%s')" "$1" "$err" | $CLC 2>> "$LOG"
printf "insert into SCH.Offsets select topic,last,rows,next,processor,'%s',hostid from SCH.Offsets where topic='%s'" "$err" "$1" |
$CLC 2>> "$LOG"
The Bash scripts are here: https://github.com/bvt123/SCH/tree/main/bash
Let's rewrite them in Go, aiming for small, compact code and a single binary.
Go driver: https://clickhouse.com/docs/en/integrations/go
main loop
The Go main loop should behave like the Bash version described above: fetch jobs with an SQL query, execute each job's SQL asynchronously with a per-topic semaphore, connect to the ClickHouse host received with the query, send normal output to the text log, and process errors into the text log and the Offsets & Logs tables.