ay-lab / dcHiC

dcHiC: Differential compartment analysis for Hi-C datasets
MIT License

"IndexError: list index out of range" while using --pcatype fithic #64

Open Lucas446 opened 1 year ago

Lucas446 commented 1 year ago

Hi,

I am running into an error during the dcHiC fithic step:

```
Rscript /Users/tlucas/dcHiC/dchicf.r --file input.St10_St14.txt --pcatype fithic --dirovwt T --diffdir catSt10_vs_catSt14 --maxd 10e6 --fithicpath '/Users/tlucas/.pyenv/shims/fithic' --pythonpath '/Users/tlucas/.pyenv/shims/python3'
Finding significant loops from intra sample  St14 St10  replicates
[1] "folder exists"
Fithic file already exists for  NB_St14_20Kb , skipping
[1] "folder exists"
Fithic file already exists for  NB_St10_20Kb , skipping
fithic -i DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St14_20Kb_fithic/interactions.txt.gz -f DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St14_20Kb_fithic/fragments.txt.gz -t /Users/tlucas/dcHiC/RESULTS/biases/NB_St14_20Kb.biases.gz -U 10000000 -o DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St14_20Kb_fithic/fithic_result -r 20000

GIVEN FIT-HI-C ARGUMENTS
=========================
Reading fragments file from: DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St14_20Kb_fithic/fragments.txt.gz
Reading interactions file from: DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St14_20Kb_fithic/interactions.txt.gz
Output path being used from DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St14_20Kb_fithic/fithic_result
Fixed size option detected... Fast version of FitHiC will be used
Resolution is 20.0 kb
Reading bias file from: /Users/tlucas/dcHiC/RESULTS/biases/NB_St14_20Kb.biases.gz
The number of spline passes is 1
The number of bins is 100
The number of reads required to consider an interaction is 1
The name of the library for outputted files will be FitHiC
Upper Distance threshold is 10000000
Lower Distance threshold is 0
Only intra-chromosomal regions will be analyzed
Lower bound of bias values is 0.5
Upper bound of bias values is 2
All arguments processed. Running FitHiC now...
=========================

Reading the contact counts file to generate bins...
Interactions file read. Time took 2.4002461433410645
Fragments file read. Time took 0.014471769332885742
Traceback (most recent call last):
  File "/Users/tlucas/.pyenv/versions/3.7.3/bin/fithic", line 11, in <module>
    load_entry_point('fithic==2.0.8', 'console_scripts', 'fithic')()
  File "/Users/tlucas/.pyenv/versions/3.7.3/lib/python3.7/site-packages/fithic/fithic.py", line 327, in main
    biasDic = read_biases(biasFile)
  File "/Users/tlucas/.pyenv/versions/3.7.3/lib/python3.7/site-packages/fithic/fithic.py", line 808, in read_biases
    chrom=words[0]; midPoint=int(words[1]); bias=float(words[2])
IndexError: list index out of range
fithic -i DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St10_20Kb_fithic/interactions.txt.gz -f DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St10_20Kb_fithic/fragments.txt.gz -t /Users/tlucas/dcHiC/RESULTS/biases/NB_St10_20Kb.biases.gz -U 10000000 -o DifferentialResult/catSt10_vs_catSt14/fithic_run/NB_St10_20Kb_fithic/fithic_result -r 20000
```

At some point, a list index that fithic tries to access is out of range. Do you have any idea where this could be coming from?

Thanks a lot, Best,

ay-lab commented 1 year ago

It seems there is an issue with the bias file NB_St14_20Kb.biases.gz. We want to check whether the file contains anything unusual; can you share it with us?
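The traceback points at `read_biases`, which apparently splits each line into `words` and then reads `words[0]`, `words[1]` and `words[2]`. Any line with fewer than three whitespace-separated fields (an empty line, or a single bias value per line) would therefore raise this IndexError. A minimal sketch for locating such lines, assuming a gzip-compressed bias file; `find_malformed_bias_lines` is a hypothetical helper, not part of fithic or dcHiC:

```python
import gzip


def find_malformed_bias_lines(path, expected_fields=3):
    """Return (line_number, raw_line) for each line that cannot be unpacked
    into chrom, midpoint, bias.

    Assumption: fithic reads words[0], words[1] and words[2] from each
    whitespace-split line, as the traceback suggests.
    """
    bad = []
    with gzip.open(path, "rt") as handle:
        for lineno, line in enumerate(handle, start=1):
            words = line.split()  # an empty line splits to []
            if len(words) < expected_fields:
                bad.append((lineno, line.rstrip("\n")))
    return bad
```

Calling `find_malformed_bias_lines("NB_St14_20Kb.biases.gz")` and printing the result would show which lines (if any) trip the `words[...]` lookups.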

Lucas446 commented 1 year ago

These are the bias files I am using:

NB_St10_20Kb.biases.gz NB_St14_20Kb.biases.gz

Thank you

ay-lab commented 1 year ago

After looking into the issue more carefully, I found that the error comes from fithic. Please have a look at this issue in the fithic repository and try to implement the solution there, or at least let me know whether your bias file contains empty lines: https://github.com/ay-lab/fithic/issues/54

Lucas446 commented 1 year ago

Hi,

I looked at ay-lab/fithic/issues/54, and I think my bias file doesn't have the right format.

Format from ay-lab/fithic/issues/54 :

zcat fat_5000.fithic.bias.gz|head
NC_052532.1 2500    0.447834
NC_052532.1 7500    0.098977
NC_052532.1 12500   0.150248
NC_052532.1 17500   0.374007
NC_052532.1 22500   0.563625

My bias file format (from the HiC-Pro ICE output):

gzcat /Users/tlucas/dcHiC/RESULTS/biases/NB_St10_20Kb.biases.gz | head
7.779782869173684778e-01
1.727070717027215929e+00
9.843715269101076526e-01
1.428104165677709148e+00
1.546911243345023834e+00

Do I have to convert the HiC-Pro bias format to the fithic format using hicpro2fithic.py?

Thanks! :)

ay-lab commented 1 year ago

Yes!
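For reference, the conversion amounts to attaching each bias value to its bin's chromosome and midpoint (the fithic example above uses 2500 for the first 5 kb bin, i.e. the bin midpoint). The sketch below assumes HiC-Pro's bin BED file (chrom, start, end, bin id) and a bias file with one value per line in the same bin order. `hicpro_biases_to_fithic` is a hypothetical illustration, not hicpro2fithic.py, which remains the supported route:

```python
import gzip


def _open_text(path):
    # Read plain or gzip-compressed text transparently, based on the suffix.
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def hicpro_biases_to_fithic(bed_path, bias_path, out_path):
    """Write a gzipped fithic-style bias file (chrom, midpoint, bias).

    Assumption: non-empty line i of bias_path belongs to row i of bed_path.
    Returns the number of bins written.
    """
    with _open_text(bed_path) as bed:
        bins = [line.split()[:3] for line in bed if line.strip()]
    with _open_text(bias_path) as bias:
        values = [line.strip() for line in bias if line.strip()]
    if len(bins) != len(values):
        raise ValueError(f"{len(bins)} bins in BED but {len(values)} bias values")
    with gzip.open(out_path, "wt") as out:
        for (chrom, start, end), value in zip(bins, values):
            midpoint = (int(start) + int(end)) // 2
            # Bias text is copied verbatim (including any "nan"); no blank lines are written.
            out.write(f"{chrom}\t{midpoint}\t{value}\n")
    return len(bins)
```

The length check is deliberate: silently pairing a shorter bias vector with the BED rows would shift every value onto the wrong bin.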

Lucas446 commented 1 year ago

OK, thanks. I am getting errors when using hicpro2fithic.py; I will post them in the HiC-Pro GitHub repository.

ay-lab commented 1 year ago

Please post here as well. We developed that script.

ay-lab commented 1 year ago

Actually, please post to the fithic GitHub repository.

Lucas446 commented 1 year ago

Here is my fithic error:

python3 fithic -i DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/interactions.txt.gz -f DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/fragments.txt.gz -t /Users/tlucas/dcHiC/RESULTS/biases/NB_St14_r1_20Kb.biases.gz -U 10000000 -o DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/fithic_result -r 20000 
  File "/Users/tlucas/.pyenv/shims/fithic", line 3
    [ -n "$PYENV_DEBUG" ] && set -x
                      ^
SyntaxError: invalid syntax

inputs:

biases

dyn-129-236-163-31:RESULTS tlucas$ gzcat /Users/tlucas/dcHiC/RESULTS/biases/NB_St14_r1_20Kb.biases.gz | head
2L  10000   0.7075591964989174

2L  30000   1.2062730976455918

2L  50000   0.9499451013956518

2L  70000   1.216953376795569

2L  90000   1.3021749178375797

fragments

dyn-129-236-163-31:RESULTS tlucas$ gzcat /Users/tlucas/dcHiC/RESULTS/DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/fragments.txt.gz | head
2L  0   10000   371 1
2L  0   30000   734 1
2L  0   50000   629 1
2L  0   70000   813 1
2L  0   90000   941 1
2L  0   110000  888 1
2L  0   130000  807 1
2L  0   150000  847 1
2L  0   170000  624 1
2L  0   190000  663 1

interactions

dyn-129-236-163-31:RESULTS tlucas$ gzcat /Users/tlucas/dcHiC/RESULTS/DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/interactions.txt.gz | head
2L  10000   2L  10000   48
2L  10000   2L  30000   68
2L  10000   2L  50000   21
2L  10000   2L  70000   17
2L  10000   2L  90000   17
2L  10000   2L  110000  8
2L  10000   2L  130000  7
2L  10000   2L  150000  5
2L  10000   2L  170000  2
2L  10000   2L  190000  2
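With all three inputs in view, one quick consistency check is whether every fragment midpoint has a matching bias entry. A minimal sketch, assuming the column layouts in the `head` output above (fragments: chrom, extra field, midpoint, count, mappable; biases: chrom, midpoint, bias); the helper names are hypothetical, and the check only reports gaps without claiming how fithic treats them:

```python
import gzip


def read_bin_keys(path, chrom_col, mid_col):
    """Collect (chrom, midpoint) pairs from a gzipped whitespace-delimited
    file, skipping blank lines."""
    keys = set()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            words = line.split()
            if not words:
                continue  # blank lines carry no bin
            keys.add((words[chrom_col], int(words[mid_col])))
    return keys


def fragments_without_bias(fragments_path, bias_path):
    """Return sorted (chrom, midpoint) pairs present in the fragments file
    but absent from the bias file."""
    fragments = read_bin_keys(fragments_path, chrom_col=0, mid_col=2)
    biased = read_bin_keys(bias_path, chrom_col=0, mid_col=1)
    return sorted(fragments - biased)
```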

Thanks a lot!

Lucas446 commented 1 year ago

OK, I managed to fix the syntax error by replacing the python3 path with "bash" at line 1495 of the dcHIC.r script:

previous: cmd <- paste0(python_path," ",fithic_path," -i ",folder,"/interactions.txt.gz -f ",folder,"/fragments.txt.gz -t ",bias," -U ",as.integer(u)," -o ",folder,"/fithic_result -r ",as.integer(resolution))

fix: cmd <- paste0("bash"," ",fithic_path," -i ",folder,"/interactions.txt.gz -f ",folder,"/fragments.txt.gz -t ",bias," -U ",as.integer(u)," -o ",folder,"/fithic_result -r ",as.integer(resolution))
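The SyntaxError arises because `/Users/tlucas/.pyenv/shims/fithic` is a pyenv shim, i.e. a bash script (its line 3, `[ -n "$PYENV_DEBUG" ] && set -x`, is shell syntax), so handing it to python3 fails. A sketch of how a wrapper could choose the launcher from the file's shebang instead of hard-coding either interpreter; it is written in Python for illustration only, is not a patch to dcHiC's R code, and `build_fithic_argv` is a hypothetical name:

```python
def build_fithic_argv(fithic_path, python_path, fithic_args):
    """Build an argv list for launching fithic.

    A non-Python shebang (e.g. a bash pyenv shim) is executed directly;
    otherwise the Python interpreter is prefixed.
    """
    with open(fithic_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", "replace").strip()
    if first_line.startswith("#!") and "python" not in first_line:
        # Shell (or other) script: let the OS honour the shebang.
        launcher = [fithic_path]
    else:
        # Python entry-point script, or no shebang at all.
        launcher = [python_path, fithic_path]
    return launcher + list(fithic_args)
```

The resulting list could be passed to `subprocess.run(argv, check=True)`; running the shim directly requires it to be executable, which pyenv shims normally are.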

Now I still get the out-of-range error, even when using the output of hicpro2fithic:

bash /Users/tlucas/.pyenv/shims/fithic -i DifferentialResult/diff_analysis/fithic_run/NB_esc_20Kb_fithic/interactions.txt.gz -f DifferentialResult/diff_analysis/fithic_run/NB_esc_20Kb_fithic/fragments.txt.gz -t /Users/tlucas/dcHiC/RESULTS/biases/NB_esc_20Kb.biases.gz -U 10000000 -o DifferentialResult/diff_analysis/fithic_run/NB_esc_20Kb_fithic/fithic_result -r 20000 

GIVEN FIT-HI-C ARGUMENTS
=========================
Reading fragments file from: DifferentialResult/diff_analysis/fithic_run/NB_esc_20Kb_fithic/fragments.txt.gz
Reading interactions file from: DifferentialResult/diff_analysis/fithic_run/NB_esc_20Kb_fithic/interactions.txt.gz
Output path created DifferentialResult/diff_analysis/fithic_run/NB_esc_20Kb_fithic/fithic_result
Fixed size option detected... Fast version of FitHiC will be used
Resolution is 20.0 kb
Reading bias file from: /Users/tlucas/dcHiC/RESULTS/biases/NB_esc_20Kb.biases.gz
The number of spline passes is 1
The number of bins is 100
The number of reads required to consider an interaction is 1
The name of the library for outputted files will be FitHiC
Upper Distance threshold is 10000000
Lower Distance threshold is 0
Only intra-chromosomal regions will be analyzed
Lower bound of bias values is 0.5
Upper bound of bias values is 2
All arguments processed. Running FitHiC now...
=========================

Reading the contact counts file to generate bins...
Interactions file read. Time took 4.374690055847168
Fragments file read. Time took 0.01379704475402832
Traceback (most recent call last):
  File "/Users/tlucas/.pyenv/versions/3.7.3/bin/fithic", line 11, in <module>
    load_entry_point('fithic==2.0.8', 'console_scripts', 'fithic')()
  File "/Users/tlucas/.pyenv/versions/3.7.3/lib/python3.7/site-packages/fithic/fithic.py", line 327, in main
    biasDic = read_biases(biasFile)
  File "/Users/tlucas/.pyenv/versions/3.7.3/lib/python3.7/site-packages/fithic/fithic.py", line 808, in read_biases
    chrom=words[0]; midPoint=int(words[1]); bias=float(words[2])
IndexError: list index out of range
[1] 1
Taking input= as a system command ('gzip -dc DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/fithic_result/FitHiC.spline_pass1.res20000.significances.txt.gz') and a variable has been used in the expression passed to `input=`. Please use fread(cmd=...). There is a security concern if you are creating an app, and the app could have a malicious user, and the app is not running in a secure environment; e.g. the app is running as root. Please read item 5 in the NEWS file for v1.11.6 for more information and for the option to suppress this message.
gzip: can't stat: DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/fithic_result/FitHiC.spline_pass1.res20000.significances.txt.gz (DifferentialResult/diff_analysis/fithic_run/NB_St14_r1_20Kb_fithic/fithic_result/FitHiC.spline_pass1.res20000.significances.txt.gz.gz): No such file or directory
Error in setnames(x, value) : 
  Can't assign 7 names to a 0 column data.table
Calls: fithicformat ... colnames<- -> names<- -> names<-.data.table -> setnames
In addition: Warning message:
In data.table::fread(paste0("gzip -dc ", diffdir, "/fithic_run/",  :
  File '/var/folders/0h/h1zqy6251n1dq_nw_050nmdc0000gn/T//RtmpuHmiEu/filee5fa48801a2c' has size 0. Returning a NULL data.table.
Execution halted
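The `head` output of the converted bias file earlier in this thread shows an empty line after every entry; if those lines are really in the file (and not an artifact of pasting), each one would hit the same `words[...]` lookup. The R error that follows is most likely a downstream effect: fithic stopped, so the significances file that dcHiC tries to read was never written. A minimal sketch that writes a copy of a gzipped file without blank lines; `strip_blank_lines` is a hypothetical helper and leaves the original file untouched:

```python
import gzip


def strip_blank_lines(src_path, dst_path):
    """Copy a gzipped text file to dst_path without empty or
    whitespace-only lines; return how many lines were dropped."""
    dropped = 0
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        for line in src:
            if line.strip():
                dst.write(line)
            else:
                dropped += 1
    return dropped
```

The cleaned file would then be passed to fithic via `-t` in place of the original.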
abhijitcbio commented 1 year ago

I see you posted this issue in the fithic repository too. I will wait for their comments.