ay-lab / dcHiC

dcHiC: Differential compartment analysis for Hi-C datasets
MIT License

"IndexError: list index out of range" while using --pcatype fithic #64

Open Lucas446 opened 1 year ago

Lucas446 commented 1 year ago


I am running into an error during the dcHiC fithic step:

Rscript /Users/tlucas/dcHiC/dchicf.r --file input.St10_St14.txt --pcatype fithic --dirovwt T --diffdir catSt10_vs_catSt14 --maxd 10e6 --fithicpath '/Users/tlucas/.pyenv/shims/fithic' --pythonpath '/Users/tlucas/.pyenv/shims/python3'
Finding significant loops from intra sample  St14 St10  replicates
[1] "folder exists"
Fithic file already exists for  NB_St14_20Kb , skipping
[1] "folder exists"
Fithic file already exists for  NB_St10_20Kb , skipping
fithic -i DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St14_20Kb_fithic/interactions.txt.gz -f DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St14_20Kb_fithic/fragments.txt.gz -t /Users/tlucas/dcHiC/RESULTS/biases/NB_St14_20Kb.biases.gz -U 10000000 -o DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St14_20Kb_fithic/fithic_result -r 20000 

Reading fragments file from: DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St14_20Kb_fithic/fragments.txt.gz
Reading interactions file from: DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St14_20Kb_fithic/interactions.txt.gz
Output path being used from DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St14_20Kb_fithic/fithic_result
Fixed size option detected... Fast version of FitHiC will be used
Resolution is 20.0 kb
Reading bias file from: /Users/tlucas/dcHiC/RESULTS/biases/NB_St14_20Kb.biases.gz
The number of spline passes is 1
The number of bins is 100
The number of reads required to consider an interaction is 1
The name of the library for outputted files will be FitHiC
Upper Distance threshold is 10000000
Lower Distance threshold is 0
Only intra-chromosomal regions will be analyzed
Lower bound of bias values is 0.5
Upper bound of bias values is 2
All arguments processed. Running FitHiC now...

Reading the contact counts file to generate bins...
Interactions file read. Time took 2.4002461433410645
Fragments file read. Time took 0.014471769332885742
Traceback (most recent call last):
  File "/Users/tlucas/.pyenv/versions/3.7.3/bin/fithic", line 11, in <module>
    load_entry_point('fithic==2.0.8', 'console_scripts', 'fithic')()
  File "/Users/tlucas/.pyenv/versions/3.7.3/lib/python3.7/site-packages/fithic/fithic.py", line 327, in main
    biasDic = read_biases(biasFile)
  File "/Users/tlucas/.pyenv/versions/3.7.3/lib/python3.7/site-packages/fithic/fithic.py", line 808, in read_biases
    chrom=words[0]; midPoint=int(words[1]); bias=float(words[2])
IndexError: list index out of range
fithic -i DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St10_20Kb_fithic/interactions.txt.gz -f DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St10_20Kb_fithic/fragments.txt.gz -t /Users/tlucas/dcHiC/RESULTS/biases/NB_St10_20Kb.biases.gz -U 10000000 -o DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St10_20Kb_fithic/fithic_result -r 20000 
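One quick way to narrow this down, sketched in Python: assuming, as line 808 of the traceback suggests, that fithic splits each bias-file line into words and reads `words[0]` through `words[2]`, any line with fewer than three whitespace-separated fields (an empty line in particular) would raise exactly this `IndexError`. The script below lists such lines. It is not part of fithic or dcHiC, and the script name `check_biases.py` is hypothetical.

```python
import gzip
import sys


def find_malformed_lines(lines, expected_fields=3):
    """Return (line_number, raw_line) for each line that does not split into
    at least `expected_fields` whitespace-separated fields.

    An empty line splits into [], so indexing words[0] on it raises IndexError,
    the same error as in the traceback above."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if len(line.split()) < expected_fields:
            bad.append((lineno, line))
    return bad


def scan_gzipped_bias_file(path):
    # "rt" opens the gzip stream in text mode so it can be iterated line by line.
    with gzip.open(path, "rt") as handle:
        return find_malformed_lines(handle)


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 check_biases.py NB_St14_20Kb.biases.gz
    for lineno, line in scan_gzipped_bias_file(sys.argv[1]):
        print(f"line {lineno}: {line!r}")
```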


At some point fithic tries to access a list index that is out of range. Do you have any idea where this could be coming from?

Thanks a lot, Best,

ay-lab commented 1 year ago

It seems there is an issue with the bias file NB_St14_20Kb.biases.gz. I want to check whether there are any unusual entries in the file; can you share it with us?

Lucas446 commented 1 year ago

These are the bias files I use:

NB_St10_20Kb.biases.gz NB_St14_20Kb.biases.gz

Thank you

ay-lab commented 1 year ago

After looking at the issue more carefully, I found that the error is related to fithic. Please have a look at this issue in the fithic repository and try to implement the solution, or at least let me know whether your bias files contain empty lines: https://github.com/ay-lab/fithic/issues/54
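If the bias file does contain empty lines, one possible workaround, assuming blank lines are the only problem, is to write a cleaned copy without them and pass that copy to fithic with `-t`. This is a minimal sketch, not necessarily the fix described in the linked fithic issue; `drop_blank_lines` and `clean_gzipped_file` are illustrative names.

```python
import gzip
import sys


def drop_blank_lines(lines):
    """Yield only lines that contain something other than whitespace."""
    for line in lines:
        if line.strip():
            yield line


def clean_gzipped_file(src, dst):
    """Copy the gzipped text file `src` to `dst`, omitting blank lines.

    Returns the number of lines kept."""
    kept = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in drop_blank_lines(fin):
            fout.write(line)
            kept += 1
    return kept


if __name__ == "__main__":
    # Usage (hypothetical script and file names):
    #   python3 clean_biases.py NB_St14_20Kb.biases.gz NB_St14_20Kb.clean.biases.gz
    print(clean_gzipped_file(sys.argv[1], sys.argv[2]), "lines kept")
```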

Lucas446 commented 1 year ago


I looked at ay-lab/fithic/issues/54 and I think my bias file doesn't have the right format.

Format from ay-lab/fithic/issues/54:

zcat fat_5000.fithic.bias.gz|head
NC_052532.1 2500    0.447834
NC_052532.1 7500    0.098977
NC_052532.1 12500   0.150248
NC_052532.1 17500   0.374007
NC_052532.1 22500   0.563625

My bias file format (from HiC-Pro ICE output):

gzcat /Users/tlucas/dcHiC/RESULTS/biases/NB_St10_20Kb.biases.gz | head

Do I have to convert the HiC-Pro bias format to the fithic format using hicpro2fithic.py?

Thanks! :)
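For illustration only, the conversion in question amounts to pairing each HiC-Pro bin with its ICE bias and writing chromosome, bin midpoint and bias, the three columns of the fithic format shown above. The sketch assumes a HiC-Pro bin bed file (chrom, start, end, bin id) and one bias value per line in the same order as the bed file. It is a hypothetical re-implementation, not `hicpro2fithic.py`, and it skips non-finite biases (bins filtered out by ICE), which the real script may handle differently. With 20 kb bins, the bin 0 to 20000 gets midpoint 10000, which matches the first row of the converted bias file posted later in this thread.

```python
import math


def hicpro_to_fithic_bias(bed_lines, bias_lines):
    """Pair each HiC-Pro bin (chrom, start, end, bin_id) with its ICE bias and
    return fithic-style rows: chrom <tab> bin midpoint <tab> bias.

    Assumes the i-th non-empty bias line belongs to the i-th non-empty bed line."""
    beds = [line.split() for line in bed_lines if line.strip()]
    biases = [line.strip() for line in bias_lines if line.strip()]
    if len(beds) != len(biases):
        raise ValueError(f"{len(beds)} bins but {len(biases)} bias values")
    rows = []
    for fields, raw_bias in zip(beds, biases):
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        bias = float(raw_bias)  # float() also accepts "nan"
        if not math.isfinite(bias):
            # Bins without a usable bias are skipped (a choice of this sketch).
            continue
        rows.append(f"{chrom}\t{(start + end) // 2}\t{bias}")
    return rows
```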

ay-lab commented 1 year ago


Lucas446 commented 1 year ago

OK, thanks. I get errors when using hicpro2fithic.py; I will post them in the HiC-Pro GitHub repository.

ay-lab commented 1 year ago

Please post here as well. We developed that script.

ay-lab commented 1 year ago

Please post to the fithic github repository actually.

Lucas446 commented 1 year ago

Here is my fithic error:

python3 fithic -i DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/interactions.txt.gz -f DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/fragments.txt.gz -t /Users/tlucas/dcHiC/RESULTS/biases/NB_St14_r1_20Kb.biases.gz -U 10000000 -o DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/fithic_result -r 20000 
  File "/Users/tlucas/.pyenv/shims/fithic", line 3
    [ -n "$PYENV_DEBUG" ] && set -x
SyntaxError: invalid syntax
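The `SyntaxError` points at `[ -n "$PYENV_DEBUG" ] && set -x`, which is shell syntax: `/Users/tlucas/.pyenv/shims/fithic` is a pyenv shim, a shell script, so handing it to `python3` cannot work. A small helper that reads a script's `#!` line can confirm which interpreter a given path expects; `read_shebang` and `looks_like_python` are hypothetical names. Running the shim directly, or passing the real entry point from the earlier traceback (`/Users/tlucas/.pyenv/versions/3.7.3/bin/fithic`) to `python3`, should presumably avoid this error.

```python
def read_shebang(path):
    """Return the interpreter line of a script (text after '#!'), or None."""
    with open(path, "rb") as handle:
        first = handle.readline()
    if first.startswith(b"#!"):
        return first[2:].decode("utf-8", errors="replace").strip()
    return None


def looks_like_python(path):
    """True if the script's #! line names a Python interpreter."""
    shebang = read_shebang(path)
    return shebang is not None and "python" in shebang


if __name__ == "__main__":
    import sys
    # Example: python3 check_shebang.py /Users/tlucas/.pyenv/shims/fithic
    print(read_shebang(sys.argv[1]))
```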



dyn-129-236-163-31:RESULTS tlucas$ gzcat /Users/tlucas/dcHiC/RESULTS/biases/NB_St14_r1_20Kb.biases.gz | head
2L  10000   0.7075591964989174

2L  30000   1.2062730976455918

2L  50000   0.9499451013956518

2L  70000   1.216953376795569

2L  90000   1.3021749178375797


dyn-129-236-163-31:RESULTS tlucas$ gzcat /Users/tlucas/dcHiC/RESULTS/DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/fragments.txt.gz | head
2L  0   10000   371 1
2L  0   30000   734 1
2L  0   50000   629 1
2L  0   70000   813 1
2L  0   90000   941 1
2L  0   110000  888 1
2L  0   130000  807 1
2L  0   150000  847 1
2L  0   170000  624 1
2L  0   190000  663 1


dyn-129-236-163-31:RESULTS tlucas$ gzcat /Users/tlucas/dcHiC/RESULTS/DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/interactions.txt.gz | head
2L  10000   2L  10000   48
2L  10000   2L  30000   68
2L  10000   2L  50000   21
2L  10000   2L  70000   17
2L  10000   2L  90000   17
2L  10000   2L  110000  8
2L  10000   2L  130000  7
2L  10000   2L  150000  5
2L  10000   2L  170000  2
2L  10000   2L  190000  2
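Since the fragments, interactions and bias files all identify bins by chromosome and midpoint, a further sanity check is that every fragment midpoint has a bias entry. The sketch assumes the column order printed above (fragments: chromosome, a second column that is 0 throughout, midpoint, then two more columns; bias: chromosome, midpoint, bias) and ignores blank lines; `fragments_missing_bias` is a hypothetical helper. Note also that the `head` of the bias file above appears to show only five data lines interleaved with five empty ones, which is consistent with the empty-line problem from fithic issue 54.

```python
def fragments_missing_bias(fragment_lines, bias_lines):
    """Return (chrom, midpoint) pairs present in the fragments file but absent
    from the bias file. Blank lines are ignored in both inputs."""
    bias_keys = set()
    for line in bias_lines:
        words = line.split()
        if words:
            bias_keys.add((words[0], int(words[1])))
    missing = []
    for line in fragment_lines:
        words = line.split()
        if not words:
            continue
        # Fragments columns as printed above: chrom, 0, midpoint, ...
        key = (words[0], int(words[2]))
        if key not in bias_keys:
            missing.append(key)
    return missing
```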

Thanks a lot!

Lucas446 commented 1 year ago

OK, I managed to fix the syntax error by replacing the python3 path with "bash" at line 1495 of the dcHIC.r script:

previous: cmd <- paste0(python_path," ",fithic_path," -i ",folder,"/interactions.txt.gz -f ",folder,"/fragments.txt.gz -t ",bias," -U ",as.integer(u)," -o ",folder,"/fithic_result -r ",as.integer(resolution))

fix: cmd <- paste0("bash"," ",fithic_path," -i ",folder,"/interactions.txt.gz -f ",folder,"/fragments.txt.gz -t ",bias," -U ",as.integer(u)," -o ",folder,"/fithic_result -r ",as.integer(resolution))
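For comparison, here is a Python sketch of the same command built as an argument list rather than a pasted string, assuming only the fithic flags that appear in the logs (`-i`, `-f`, `-t`, `-U`, `-o`, `-r`). Running `fithic_path` directly lets its own `#!` line select the interpreter, so neither a `python3` nor a `bash` prefix is needed. `check=True` also stops the pipeline when fithic fails, instead of going on to read a results file that was never written, as happens at the end of the log below. `build_fithic_args` and `run_fithic` are hypothetical names, not dcHiC functions.

```python
import subprocess


def build_fithic_args(fithic_path, folder, bias, upper, resolution):
    """Mirror the dcHiC paste0() call as an argument list.

    fithic_path is executed directly, so its own shebang picks the
    interpreter (bash for a pyenv shim, Python for the real entry point)."""
    return [
        fithic_path,
        "-i", f"{folder}/interactions.txt.gz",
        "-f", f"{folder}/fragments.txt.gz",
        "-t", bias,
        "-U", str(int(upper)),
        "-o", f"{folder}/fithic_result",
        "-r", str(int(resolution)),
    ]


def run_fithic(*args, **kwargs):
    # check=True raises CalledProcessError if fithic exits non-zero.
    return subprocess.run(build_fithic_args(*args, **kwargs), check=True)
```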

Now I still get the out-of-range error, even when using the output of hicpro2fithic:

bash /Users/tlucas/.pyenv/shims/fithic -i DifferentialResult/diff_analysis/fithic_run/NB_esc_20Kb_fithic/interactions.txt.gz -f DifferentialResult/diff_analysis/fithic_run/NB_esc_20Kb_fithic/fragments.txt.gz -t /Users/tlucas/dcHiC/RESULTS/biases/NB_esc_20Kb.biases.gz -U 10000000 -o DifferentialResult/diff_analysis/fithic_run/NB_esc_20Kb_fithic/fithic_result -r 20000 

Reading fragments file from: DifferentialResult/diff_analysis/fithic_run/NB_esc_20Kb_fithic/fragments.txt.gz
Reading interactions file from: DifferentialResult/diff_analysis/fithic_run/NB_esc_20Kb_fithic/interactions.txt.gz
Output path created DifferentialResult/diff_analysis/fithic_run/NB_esc_20Kb_fithic/fithic_result
Fixed size option detected... Fast version of FitHiC will be used
Resolution is 20.0 kb
Reading bias file from: /Users/tlucas/dcHiC/RESULTS/biases/NB_esc_20Kb.biases.gz
The number of spline passes is 1
The number of bins is 100
The number of reads required to consider an interaction is 1
The name of the library for outputted files will be FitHiC
Upper Distance threshold is 10000000
Lower Distance threshold is 0
Only intra-chromosomal regions will be analyzed
Lower bound of bias values is 0.5
Upper bound of bias values is 2
All arguments processed. Running FitHiC now...

Reading the contact counts file to generate bins...
Interactions file read. Time took 4.374690055847168
Fragments file read. Time took 0.01379704475402832
Traceback (most recent call last):
  File "/Users/tlucas/.pyenv/versions/3.7.3/bin/fithic", line 11, in <module>
    load_entry_point('fithic==2.0.8', 'console_scripts', 'fithic')()
  File "/Users/tlucas/.pyenv/versions/3.7.3/lib/python3.7/site-packages/fithic/fithic.py", line 327, in main
    biasDic = read_biases(biasFile)
  File "/Users/tlucas/.pyenv/versions/3.7.3/lib/python3.7/site-packages/fithic/fithic.py", line 808, in read_biases
    chrom=words[0]; midPoint=int(words[1]); bias=float(words[2])
IndexError: list index out of range
[1] 1
Taking input= as a system command ('gzip -dc DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/fithic_result/FitHiC.spline_pass1.res20000.significances.txt.gz') and a variable has been used in the expression passed to `input=`. Please use fread(cmd=...). There is a security concern if you are creating an app, and the app could have a malicious user, and the app is not running in a secure environment; e.g. the app is running as root. Please read item 5 in the NEWS file for v1.11.6 for more information and for the option to suppress this message.
gzip: can't stat: DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/fithic_result/FitHiC.spline_pass1.res20000.significances.txt.gz (DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/fithic_result/FitHiC.spline_pass1.res20000.significances.txt.gz.gz): No such file or directory
Error in setnames(x, value) : 
  Can't assign 7 names to a 0 column data.table
Calls: fithicformat ... colnames<- -> names<- -> names<-.data.table -> setnames
In addition: Warning message:
In data.table::fread(paste0("gzip -dc ", diffdir, "/fithic_run/",  :
  File '/var/folders/0h/h1zqy6251n1dq_nw_050nmdc0000gn/T//RtmpuHmiEu/filee5fa48801a2c' has size 0. Returning a NULL data.table.
Execution halted
abhijitcbio commented 1 year ago

I see you posted this issue in the fithic repository too. I will wait for their comments.