giesselmann / STRique

Nanopore raw signal repeat detection pipeline
MIT License

the strique can't stop #21

Closed renzilin closed 4 years ago

renzilin commented 4 years ago

Dear authors, I'm using STRique for repeat quantification, but the program never finishes. The terminal stays at the output shown in the screenshot.

I checked CPU usage with the Ubuntu system monitor, and it shows one CPU core running at 100%.

I have run into this problem several times. Please give me some advice.

Here is the script I used:

#!/bin/bash

#conda activate strique

STRIQUE_PATH="/media/amax/disk1/shared/tools/STRique"

## fofn file

FAST5_DIR=$1
FOFN_PATH=$2
BAM_FILE=$3
OUTPUT=$4

python3 $STRIQUE_PATH/scripts/STRique.py index --recursive $FAST5_DIR --out_prefix $FAST5_DIR > $FOFN_PATH

## repeat quantification

samtools view -F 2308 $BAM_FILE | python3 $STRIQUE_PATH/scripts/STRique.py count --t 12 $FOFN_PATH $STRIQUE_PATH/models/r9_4_450bps.model strique.config > $OUTPUT
renzilin commented 4 years ago

If the screenshot can't be shown, here are the contents of the terminal.

/home/amax/anaconda3/envs/repeat_strique/lib/python3.6/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/home/amax/anaconda3/envs/repeat_strique/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3373: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/home/amax/anaconda3/envs/repeat_strique/lib/python3.6/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/home/amax/anaconda3/envs/repeat_strique/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3373: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/home/amax/anaconda3/envs/repeat_strique/lib/python3.6/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/home/amax/anaconda3/envs/repeat_strique/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3373: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/home/amax/anaconda3/envs/repeat_strique/lib/python3.6/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

The fofn file is generated successfully, so I think the problem occurs in the count step. My STRique config file looks like this:

chr begin   end name    repeat  prefix  suffix
chr1    230905363   230905426   D1S1656 CTAT    GTTTAGCAGCTGTAAGCGCCTGGTCTTTGTTTATTTTTAATTTCCTTTCTTTCCCAATTCTCCTTCAGTCCTGTGTTAGTCAGGATTCTTCAGAGAAATAGAATCACTAGGGAACCAAATATATATACATACAATTAAACACACACACAC  CTACATCACACAGTTGACCCTTGAGCAACACAGGCTTGAACTTATATGGGGATTTTCTTCCATCTCTACCACCCCTGAGACAGCAAGACCAACTCCTCCTCCTCCTTCTCAGCCTACTCAACATGAAGATAATAAGGATGAAGACCTTTA
giesselmann commented 4 years ago

Hey, the warnings in the screenshot are okay; they happen when we can't find the repeat boundary.

Can you tell me how many reads you have in the .bam? The counting is not very fast; if there are thousands of reads, it will take a long time on 12 cores. To check whether STRique completes, try something like:

samtools view -F 2308 $BAM_FILE | head -100 | python3 $STRIQUE_PATH/scripts/STRique.py count --t 12 $FOFN_PATH $STRIQUE_PATH/models/r9_4_450bps.model strique.config > $OUTPUT
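Once the 100-read test finishes, its wall-clock time gives a rough idea of how long the full run might take. The sketch below is only an illustration: `estimate_hours` is a hypothetical helper (not part of STRique), and it assumes runtime scales linearly with the number of reads, which will be off if some reads take much longer than others.

```shell
#!/bin/bash
# Hypothetical helper, not part of STRique: extrapolate the runtime of a
# full run from a timed subset, assuming runtime is linear in read count.
# usage: estimate_hours TOTAL_READS SAMPLE_READS SAMPLE_SECONDS
estimate_hours() {
    awk -v total="$1" -v n="$2" -v secs="$3" \
        'BEGIN { printf "%.1f\n", (total / n) * secs / 3600 }'
}

# One way to measure SAMPLE_SECONDS in bash (SECONDS is a bash builtin):
#   start=$SECONDS
#   ... run the 100-read test command here ...
#   elapsed=$((SECONDS - start))
#   estimate_hours "$(samtools view -c -F 2308 "$BAM_FILE")" 100 "$elapsed"
```

For example, if 100 reads took 90 seconds and the file held 10000 reads (made-up numbers), `estimate_hours 10000 100 90` prints `2.5`.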

renzilin commented 4 years ago

Hi, I used the command:

samtools view -F 2308 barcode16.bam | wc -l

It shows there are 293447 reads.

I think that I need to use samtools to remove the reads that don't cover the repeat region.

Thank you for your advice!
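One possible way to do this filtering is to derive a BED file from the STRique config and pass it to `samtools view -L`. The sketch below is an assumption-laden illustration: `config_to_bed` is a hypothetical helper, it assumes the config has one header row followed by whitespace-separated `chr begin end name ...` columns, and it copies the coordinates as they are (the coordinate convention of the config is not confirmed here, so a flank is added to be forgiving).

```shell
#!/bin/bash
# Hypothetical helper, not part of STRique: turn a STRique config into a
# BED file, optionally widening each locus by FLANK bases on both sides.
# Assumes columns: chr begin end name ... with a single header row.
# usage: config_to_bed CONFIG [FLANK]
config_to_bed() {
    awk -v flank="${2:-0}" 'BEGIN { OFS = "\t" }
        NR == 1 { next }                 # skip the header row
        NF >= 4 {
            start = $2 - flank
            if (start < 0) start = 0     # BED starts cannot be negative
            print $1, start, $3 + flank, $4
        }' "$1"
}

# Example use (file names are placeholders):
#   config_to_bed strique.config 100 > str-loci.bed
#   samtools view -F 2308 -L str-loci.bed "$BAM_FILE" | ...
```

Note that `-L` keeps reads that overlap a region, which is not the same as reads that span the whole repeat, so some reads passed on may still fail to yield a count.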


From: Pay Gießelmann notifications@github.com Sent: Saturday, August 22, 2020 4:37 PM To: giesselmann/STRique STRique@noreply.github.com Cc: Zilin zilin.ren@outlook.com; Author author@noreply.github.com Subject: Re: [giesselmann/STRique] the strique can't stop (#21)


renzilin commented 4 years ago

Dear author,

The problem of STRique not stopping has happened again. I ran STRique count on a BAM file containing 11,509 reads.

> samtools view -F 2308 -L str-loci.bed barcode01.bam chr2 | wc -l 
> 11509

Then I ran STRique count with this command:

samtools view -F 2308 -L $BED_FILE $BAM_FILE chr2 | python3 $STRIQUE_PATH/scripts/STRique.py count --t 10 $FOFN_PATH $STRIQUE_PATH/models/r9_4_450bps.model strique.config --out $OUTPUT

Two hours later, the output file has 11,367 lines according to wc -l $OUTPUT, and the terminal still looks like the screenshot in my first comment, with no new messages. Given these two things, I think STRique has already finished counting, but I don't know why the command never exits and returns to the prompt.

Please let me know if I did something wrong! Thank you so much!

Best, Zilin

giesselmann commented 4 years ago

Hey, can you try on a smaller sample first? It looks like most of the reads (11367) finish very quickly and a small proportion take longer. The runtime grows with the repeat length; if there are some very long repeats, they will block a single process for a long time. Could you please try something like this:

samtools view -s 0.01 -F 2308 -L $BED_FILE $BAM_FILE chr2 | python3 $STRIQUE_PATH/scripts/STRique.py count --t 10 $FOFN_PATH $STRIQUE_PATH/models/r9_4_450bps.model strique.config --out $OUTPUT

This uses a random 1% sample of the input reads.
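To see which reads are still outstanding while a run appears stuck, the read IDs sent to STRique can be compared with the IDs already written to the output. The sketch below is a guess at how that could look: `pending_reads` is a hypothetical helper, and it assumes the STRique output is tab-separated with a single header line and the read ID in the first column.

```shell
#!/bin/bash
# Hypothetical helper, not part of STRique: list read IDs that were sent
# to the counter but do not yet appear in its output.
# Assumes OUTPUT_TSV is tab-separated, has one header line, and carries
# the read ID in column 1.
# usage: pending_reads INPUT_IDS OUTPUT_TSV
pending_reads() {
    ids_sorted=$(mktemp)
    done_sorted=$(mktemp)
    sort -u "$1" > "$ids_sorted"
    tail -n +2 "$2" | cut -f1 | sort -u > "$done_sorted"
    # comm -23 keeps lines that occur only in the first sorted file
    comm -23 "$ids_sorted" "$done_sorted"
    rm -f "$ids_sorted" "$done_sorted"
}

# Example use (file names are placeholders; column 1 of SAM is the read name):
#   samtools view -F 2308 -L "$BED_FILE" "$BAM_FILE" chr2 | cut -f1 > input_ids.txt
#   pending_reads input_ids.txt "$OUTPUT"
```

If the list is short, those reads are candidates for the long repeats that keep a single process busy, and they could be inspected or excluded separately.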