giesselmann / STRique

Nanopore raw signal repeat detection pipeline
MIT License

ValueError: RepeatCounter: Target with name (21.13723577) already defined #25

Closed ligiamateiu closed 3 years ago

ligiamateiu commented 3 years ago

Hey, this error keeps recurring. Is there something I should be worried about?

Best, ligia

05.02.2021 15:59:42 [PID 988702] [WARNING] Factory: Unexpected error in Worker, proceeding wiht remaining reads.
Traceback (most recent call last):
  File "STRique.py", line 757, in worker
    input = worker_callable(**input)
  File "STRique.py", line 683, in detect
    self.__init_hmm__()
  File "STRique.py", line 644, in __init_hmm__
    self.repeatCounter.add_target(target_name, repeat, prefix, suffix)
  File "STRique.py", line 579, in add_target
    raise ValueError("RepeatCounter: Target with name " + str(target_name) + " already defined.")
ValueError: RepeatCounter: Target with name 21.13723577 already defined.

giesselmann commented 3 years ago

Hey, yes, it indicates duplicated lines in your repeat config. I wonder how a target name can be a real number; can you post the config?

Pay

ligiamateiu commented 3 years ago

yes, the "name" column is a real number, a unique position value. Ligia

giesselmann commented 3 years ago

Okay, then the warning means you have at least two rows with identical names in your config. Only the first one will be parsed and reported in the output. If that is intended, you can safely ignore the warning.

Pay
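One way to check a config for repeated names before running STRique is sketched below. It is a minimal, hedged example and not part of STRique: it assumes the config is a tab-separated file with a header row containing a "name" column (the column mentioned in this thread), and the helper `duplicated_names` and the sample rows are hypothetical. Names are compared as plain strings, so a numeric-looking name such as 21.13723577 is handled like any other.

```python
import csv
import io
from collections import Counter


def duplicated_names(handle, column="name"):
    """Return {name: count} for every value of `column` that occurs more than once.

    Assumes a tab-separated file with a header row (an assumption about the
    config layout, not something taken from the STRique source).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter(row[column] for row in reader)
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical config content, for illustration only.
example = io.StringIO(
    "chr\tbegin\tend\tname\trepeat\tprefix\tsuffix\n"
    "chr21\t100\t120\t21.13723577\tCAG\tACGT\tTGCA\n"
    "chr21\t100\t120\t21.13723577\tCAG\tACGT\tTGCA\n"
    "chr4\t200\t230\tchr4_200_230_GA\tGA\tACGT\tTGCA\n"
)
print(duplicated_names(example))  # {'21.13723577': 2}
```

For a real file, the same function can be called on an open handle, e.g. `with open("config.tsv", newline="") as fh: print(duplicated_names(fh))`. An empty dictionary means no name occurs twice.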

ligiamateiu commented 3 years ago

ok, thanks!! Ligia

wjyzidane commented 3 years ago

Hi Pay, I think I have the same error, but:

1) I checked the "name" column in my config.tsv and there is no duplication.
2) The output is empty. It is not empty if I run with only the first 10 lines of config.tsv (taken with head).

Do you have any hints? Thanks!

The error is as below:

16.04.2021 21:31:33 [PID 11265] [WARNING] Factory: Unexpected error in Worker, proceeding wiht remaining reads.
Traceback (most recent call last):
  File "/home/unix/jwu/software/STRique_install/STRique/scripts/STRique.py", line 757, in worker
    input = worker_callable(**input)
  File "/home/unix/jwu/software/STRique_install/STRique/scripts/STRique.py", line 683, in detect
    self.__init_hmm__()
  File "/home/unix/jwu/software/STRique_install/STRique/scripts/STRique.py", line 644, in __init_hmm__
    self.repeatCounter.add_target(target_name, repeat, prefix, suffix)
  File "/home/unix/jwu/software/STRique_install/STRique/scripts/STRique.py", line 579, in add_target
    raise ValueError("RepeatCounter: Target with name " + str(target_name) + " already defined.")
ValueError: RepeatCounter: Target with name chr4_134079696_134079726_GA already defined.

giesselmann commented 3 years ago

Hi, can you share the full config file? By mail to giesselmann[at]molgen.mpg.de, if you don't want to make it public.

Pay

wjyzidane commented 3 years ago

Here is my config file. Thank you!

test_config.txt

giesselmann commented 3 years ago

Hi, I can't reproduce the error. I had to filter out a couple of lines containing N's (for these you can't build an HMM, and you would receive an error later):

cat test_config.txt | grep -v N > test_config_no_N.txt

and ran your config on the STRique test data:

cat src/STRique/data/c9orf72.sam | python3 src/STRique/scripts/STRique.py count src/STRique/data/reads.fofn src/STRique/models/r9_4_450bps.model test_config_no_N.txt

The output is empty, which makes sense since the c9orf72 repeat is not in your list. The entry 'chr4_134079696_134079726_GA' is unique and not causing any issues. Can you double-check the call with the file you sent me?

Also, two notes about the config. First, you have a couple of entries with length-1 repeats (homopolymers). These won't yield accurate quantification, since their signal is expected to be constant and to differ only in length. Second, the init takes ages: each worker process of STRique needs to build ~3.4k counting HMMs. If possible, I would filter to fewer repeats of particular interest.

Pay
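The filtering described above (dropping entries with N's and homopolymer repeats) could be sketched in Python as follows. This is an illustration under stated assumptions, not STRique code: it assumes a tab-separated config with a header row and columns named "repeat", "prefix" and "suffix", and the function `filter_config` and the sample rows are hypothetical. Unlike `grep -v N`, which drops any line containing an N anywhere, this version only inspects the sequence columns.

```python
import csv
import io


def filter_config(in_handle, out_handle, min_repeat_len=2):
    """Copy config rows to out_handle, skipping rows that contain an N in
    repeat/prefix/suffix or whose repeat is shorter than min_repeat_len.

    Returns (kept, dropped). Column names are assumptions about the config.
    """
    reader = csv.DictReader(in_handle, delimiter="\t")
    writer = csv.DictWriter(out_handle, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    kept = dropped = 0
    for row in reader:
        seqs = (row["repeat"], row["prefix"], row["suffix"])
        has_n = any("N" in s.upper() for s in seqs)
        too_short = len(row["repeat"]) < min_repeat_len
        if has_n or too_short:
            dropped += 1
            continue
        writer.writerow(row)
        kept += 1
    return kept, dropped


# Hypothetical rows, for illustration only.
example = io.StringIO(
    "chr\tbegin\tend\tname\trepeat\tprefix\tsuffix\n"
    "chr1\t10\t20\ta\tA\tACGT\tTGCA\n"    # homopolymer repeat: dropped
    "chr1\t30\t40\tb\tGA\tACNT\tTGCA\n"   # N in prefix: dropped
    "chr1\t50\t60\tc\tCAG\tACGT\tTGCA\n"  # kept
)
out = io.StringIO()
print(filter_config(example, out))  # (1, 2)
```

With the default `min_repeat_len=2` only length-1 repeats are removed; passing `min_repeat_len=3` would also remove dinucleotide repeats such as GA.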

wjyzidane commented 3 years ago

Thank you.

When I run it directly, it still reports the same error. When I remove the entries containing "N" and the repeats with length <=2 (including chr4_134079696_134079726_GA), it starts running without error, but it takes so long that I killed it.

So I think either the entries containing N or the repeats with length <=2 cause this problem.