Psy-Fer / buttery-eel

The buttery eel - a slow5 guppy/dorado basecaller wrapper
MIT License

Buttery-eel using a different guppy config file than requested #16

Closed: Timothy-Amos closed this issue 2 months ago

Timothy-Amos commented 1 year ago

The issue presented as Buttery-eel reporting that it was doing sup basecalling when I had asked for hac basecalling:

timamo@brenner-fpga:PSXPX230069:$ qstat -j 4204295 | grep -e job_args -e script_file -e cwd
cwd:                        /share/ScratchGeneral/NanoporeClientData/tmp/PSXPX230069/PSXPX230069
job_args:                   PSXPX230069_reads.blow5,dna_r9.4.1_450bps_hac_prom.cfg,9,PSXPX230069_fail_hac.fastq.gz,PSXPX230069_pass_hac.fastq.gz
script_file:                /directflow/KCCGGenometechTemp/projects/timamo/scripts/development/buttery_eel_basecall.sge.sh
timamo@brenner-fpga:PSXPX230069:$ pwd
/share/ScratchGeneral/NanoporeClientData/tmp/PSXPX230069
timamo@brenner-fpga:PSXPX230069:$ grep config PSXPX230069/butter_PSXPX230069.out -m2
 Namespace(input='PSXPX230069_reads.blow5', output='./reads.fastq', guppy_bin=PosixPath('/share/ClusterShare/software/contrib/timamo/ont-guppy-6.5.7/bin'), config='dna_r9.4.1_450bps_sup.cfg', guppy_batchsize=4000, call_mods=False, qscore=10, slow5_threads=4, slow5_batchsize=4000, quiet=False, moves_out=False, do_read_splitting=True, min_score_read_splitting=50.0, detect_adapter=False, min_score_adapter=60.0, trim_adapters=False, detect_mid_strand_adapter=False, log='buttery_guppy_logs', seq_sum=True)
PyGuppyClient(address='ipc:///share/ScratchGeneral/NanoporeClientData/tmp/PSXPX230069/PSXPX230069/5887', config='dna_r9.4.1_450bps_sup', align_ref=None, bed_file=None, barcodes=None, status.connected, )
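The mismatch above was only spotted by grepping the log by hand. Because the problem is silent, a post-run check can help. The sketch below is a hypothetical helper, `check_eel_config`, and is not part of buttery-eel. It assumes the log contains `config='...'` fields in the same format as the Namespace and PyGuppyClient lines shown above. It extracts every such value, strips any `.cfg` suffix, and compares the result with the config that was passed to the job.

```shell
# check_eel_config: hypothetical post-run check (not part of buttery-eel).
# Usage: check_eel_config <requested config> <buttery-eel log file>
# Assumes the log contains config='...' fields as in the Namespace and
# PyGuppyClient lines of the log. Prints one MISMATCH line per differing field.
# Exit status: 0 = all fields match, 1 = mismatch, 2 = no config field found.
check_eel_config() {
  expected=${1%.cfg}    # compare model names without the .cfg suffix
  # Pull out config='...' values, keep the text between the quotes, then compare.
  grep -o "config='[^']*'" "$2" | cut -d"'" -f2 | awk -v want="$expected" '
    BEGIN { n = 0; bad = 0 }
    {
      sub(/\.cfg$/, "")
      n++
      if ($0 != want) {
        printf "MISMATCH: requested %s, log reports %s\n", want, $0
        bad = 1
      }
    }
    END {
      if (n == 0) {
        print "no config field found in log"
        exit 2
      }
      exit bad
    }'
}
```

Run against the log above with the requested `dna_r9.4.1_450bps_hac_prom.cfg`, it would print MISMATCH lines reporting `dna_r9.4.1_450bps_sup` and return 1. The job script could then log or fail accordingly.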

My script appeared to be mapping the arguments correctly:

timamo@brenner-fpga:PSXPX230069:$ grep -e MODEL -e MIN_QSCORE -e CALLMODS -e else -e fi$ -e RAW_INPUT -e OUTPUT /directflow/KCCGGenometechTemp/projects/timamo/scripts/development/buttery_eel_basecall.sge.sh | grep -v '^#' | head -n-2
RAW_INPUT=$1               # raw slow5 files for 1D basecalling with
MODEL=$2                   # basecalling model
MIN_QSCORE=$3              # minimum average quality score for reads in pass reads file
OUTPUT=./reads.fastq
if [ -z "$MODEL" ]; then echo "MODEL required. eg.: dna_r9.4.1_450bps_hac_prom.cfg"; exit 1; fi
if [ -z "$RAW_INPUT" ]; then echo "Raw input for basecalling not set"; exit 1; fi
if [[ ${MODEL} == *modbases* ]]; then
    CALLMODS="--call_mods"
    OUTPUT="reads.sam"
else
    CALLMODS=""
fi
/usr/bin/time -v buttery-eel -i ${RAW_INPUT} -o ${OUTPUT} --guppy_bin /share/ClusterShare/software/contrib/timamo/ont-guppy-6.5.7/bin/ --port 5887 --config ${MODEL} -x cuda:all --chunk_size 1500 --max_queued_reads 1000 -q $MIN_QSCORE --do_read_splitting --seq_sum $CALLMODS || log "$RUN_ID Could not execute eel command"

James said the issue was caused by my having "another buttery-eel running with diff args". He explained that "if they are on the same port, and the same machine, using ipc, the buttery-eel client will connect to the existing server with those models loaded", and advised: "please use --use_tcp and use different ports each time".
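One way to follow the "different ports each time" advice on an SGE cluster is to derive the port from the scheduler's job ID. This is a hypothetical sketch, not something suggested in the thread: `port_for_job` is an illustrative name, and it assumes the job script can read SGE's `JOB_ID` environment variable. It does not check that the port is actually free, and two job IDs that differ by a multiple of 60000 map to the same port.

```shell
# port_for_job: hypothetical helper that maps a numeric job ID into the
# range [5000, 65000). It does NOT check whether the port is already in use.
port_for_job() {
  echo $(( 5000 + $1 % 60000 ))
}

# Example use inside the job script (JOB_ID is set by SGE):
#   PORT=$(port_for_job "$JOB_ID")
#   buttery-eel ... --use_tcp --port "$PORT"
```

For the job in this issue (4204295) it returns 9295.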

Hasindu shared this function from his bashrc, which automatically finds a free port and can also be used in a script:

eel(){
#from https://unix.stackexchange.com/questions/55913/whats-the-easiest-way-to-find-an-unused-local-port
PORT=$(netstat -aln | awk '
  $6 == "LISTEN" {
    if ($4 ~ "[.:][0-9]+$") {
      split($4, a, /[:.]/);
      port = a[length(a)];
      p[port] = 1
    }
  }
  END {
    for (i = 5000; i < 65000 && p[i]; i++){};
    if (i == 65000) {exit 1};
    print i
  }
  ') && source ~/hasindu2008.git/buttery-eel/venv3-multi-guppy-6.5.7-python3.8/bin/activate && buttery-eel -g /install/ont-guppy-6.5.7/bin/ "$@" --port ${PORT} --use_tcp && deactivate
}
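The awk part of Hasindu's function can be split out so it can be tested on its own. The sketch below is a hypothetical variant (`first_free_port` is an illustrative name). It reads `netstat -aln`-style text on stdin. Instead of calling `length()` on an array, which POSIX awk does not specify, it uses the field count returned by `split()`. It also tests membership with `in`, so the search loop does not create empty array entries.

```shell
# first_free_port: hypothetical variant of the awk logic above.
# Reads `netstat -aln`-style output on stdin and prints the lowest port in
# [5000, 65000) that no socket is LISTENing on; exits 1 if none is free.
first_free_port() {
  awk '
    # Column 6 is the socket state; column 4 is the local address:port.
    $6 == "LISTEN" && $4 ~ /[.:][0-9]+$/ {
      n = split($4, a, /[:.]/)    # the last piece is the port number
      busy[a[n]] = 1
    }
    END {
      for (i = 5000; i < 65000 && (i in busy); i++) {}
      if (i == 65000) exit 1
      print i
    }'
}
```

In a job script this would be used as `PORT=$(netstat -aln | first_free_port) || exit 1` followed by `buttery-eel ... --port "$PORT" --use_tcp`. Between choosing the port and the server binding it, another process could still take it, so jobs launched at the same moment on the same node can still collide.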

Hasindu also said:

"We should better put this fact in the readme and HIGHLIGHT it, as this can be a common mistake by users and is not too good, as it is a silent issue"

Psy-Fer commented 2 months ago

I fixed this already with --port auto
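With that fix, the job script's invocation no longer needs the hard-coded `--port 5887`. The sketch below assumes a buttery-eel release that accepts `--port auto` (check `buttery-eel --help` for your version). The thread does not say whether `--use_tcp` is still required alongside it. All other flags are copied from the original script, and `run_eel` is a hypothetical wrapper name.

```shell
# run_eel: hypothetical wrapper around the original job script's command,
# with the fixed --port 5887 replaced by --port auto.
# Expects RAW_INPUT, OUTPUT, MODEL, MIN_QSCORE and CALLMODS to be set as in
# the job script.
run_eel() {
  # $CALLMODS is deliberately unquoted so an empty value adds no argument.
  /usr/bin/time -v buttery-eel -i "$RAW_INPUT" -o "$OUTPUT" \
    --guppy_bin /share/ClusterShare/software/contrib/timamo/ont-guppy-6.5.7/bin/ \
    --port auto \
    --config "$MODEL" -x cuda:all --chunk_size 1500 --max_queued_reads 1000 \
    -q "$MIN_QSCORE" --do_read_splitting --seq_sum $CALLMODS
}
```

The script's existing error handling can stay as it was: `run_eel || log "$RUN_ID Could not execute eel command"`, where `log` is the function defined elsewhere in the original script (not shown in the excerpt).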