ay-lab / fithic

Fit-Hi-C is a tool for assigning statistical confidence estimates to chromosomal contact maps produced by genome-wide genome architecture assays such as Hi-C.
MIT License
77 stars 16 forks source link

error while running FitHiC #38

Open priyatamapandey opened 3 years ago

priyatamapandey commented 3 years ago

Hi, I am running version Fit-Hi-C 2.0.7. I used the command below fithic -f /project/roselai_228/priyatap/HiC_work/fithic_output/fithic.fragmentMappability.gz -i /project/roselai_228/priyatap/HiC_work/fithic_output/fithic.interactionCounts.gz -U 1000000 -r 1000000 -l fithit_SRR1030745_1MB -o /project/roselai_228/priyatap/HiC_work/fithic_output

and encounter the error

Screen Shot 2020-12-11 at 7 26 34 PM

Please help me, how to resolve the issue. FYI, I used the below function to generate the input for fithic. I got error in this command too but it generated three zip files. SO, I used 2 files in the above code.

python HiCPro2FitHiC.py -i /project/roselai_228/priyatap/HiC_work/output/hic_results/matrix/SRR1030745/raw/1000000/SRR1030745_1000000.matrix -b /project/roselai_228/priyatap/HiC_work/output/hic_results/matrix/SRR1030745/raw/1000000/SRR1030745_1000000_abs.bed -s /project/roselai_228/priyatap/HiC_work/output/hic_results/matrix/SRR1030745/iced/1000000/SRR1030745_1000000_iced.matrix.biases -o /scratch/priyatap/Hicpro_output

The error I got from the HiCPro2FitHiC.py is below

Screen Shot 2020-12-11 at 6 29 45 PM

Please help me out. Thank you in advance for your help, Priya

ay-lab commented 3 years ago

Hi. Can you send us the three files you used as input to HiCPro2FitHiC.py so that we can debug? Thank you

priyatamapandey commented 3 years ago

Hi, Thank you for your quick response. Attached, please find the input files. troubleshooting.zip

I have never sent this big file on GitHub. Please let me know how to send it if this compressed file does not reach you.

Thank you, Priya

ay-lab commented 3 years ago

Not sure why but your _abs.bed file has one less row than the biases file which seems to cause the error. You may want to ask why this would happen to the HiCPro group.

priyatamapandey commented 3 years ago

Thank you for checking it. I have already asked in that group too. Although, I have not received any error while running the HicPro. All the expected output files have generated.

Thanks, Priya

ay-lab commented 3 years ago

If you want a quick solution, remove the last line in the biases file and remove all entries with index 3114 out of the interactions file and then try using HiCPro2FitHiC.py again

priyatamapandey commented 3 years ago

Hi, Thank you for suggestion. I have tried hicpro with older version and that seems to be working fine and after getting files using hicpro2fithic.py, I ran FitHiC. I got the an error. I am pasting my command an error here

> [priyatap@e21-07 fithic_output]$ module load gcc/8.3.0 
> [priyatap@e21-07 fithic_output]$ module load python/3.7.6
> [priyatap@e21-07 fithic_output]$ fithic -f /project/roselai_228/priyatap/HiC_work/convert_hicpro2fithic/fithic.fragmentMappability.gz -i /project/roselai_228/priyatap/HiC_work/convert_hicpro2fithic/fithic.interactionCounts.gz -U 1000000 -r 1000000 -l fithit_SRR1030745_1MB -o /project/roselai_228/priyatap/HiC_work/fithic_output_new

_>

GIVEN FIT-HI-C ARGUMENTS

Reading fragments file from: /project/roselai_228/priyatap/HiC_work/convert_hicpro2fithic/fithic.fragmentMappability.gz Reading interactions file from: /project/roselai_228/priyatap/HiC_work/convert_hicpro2fithic/fithic.interactionCounts.gz Output path being used from /project/roselai_228/priyatap/HiC_work/fithic_output_new Fixed size option detected... Fast version of FitHiC will be used Resolution is 1000.0 kb No bias file The number of spline passes is 1 The number of bins is 100 The number of reads required to consider an interaction is 1 The name of the library for outputted files will be fithit_SRR10307451MB Upper Distance threshold is 1000000 Lower Distance threshold is 0 Only intra-chromosomal regions will be analyzed Lower bound of bias values is 0.5 Upper bound of bias values is 2 All arguments processed. Running FitHiC now... =========================

Reading the contact counts file to generate bins... Interactions file read. Time took 11.18812608718872 Fragments file read. Time took 0.0322415828704834 Writing /project/roselai_228/priyatap/HiC_work/fithic_output_new/fithit_SRR1030745_1MB.fithic_pass1.res1000000.txt Spline fit Pass 1 starting... Traceback (most recent call last): File "/home1/priyatap/.local/bin/fithic", line 10, in sys.exit(main()) File "/home1/priyatap/.local/lib/python3.7/site-packages/fithic/fithic.py", line 326, in main splineXinit,splineYinit,residual,outliersline, outliersdist, FDRXinit, FDRYinit= fit_Spline(mainDic,x,y,yerr,contactCountsFile,os.path.join(outputPath,libName+".spline_pass1"),biasDic, outliersline, outliersdist, observedIntraInRangeSum, possibleIntraInRangeCount, possibleInterAllCount, observedIntraAllSum, observedInterAllSum, resolution, 1) File "/home1/priyatap/.local/lib/python3.7/site-packages/fithic/fithic.py", line 801, in fit_Spline ius = UnivariateSpline(x, y, s=splineError) File "/spack/apps/linux-centos7-x86_64/gcc-8.3.0/python-3.7.6-dd2am3dyvlpovhd4rizwfzc45wnsajxf/lib/python3.7/site-packages/scipy/interpolate/fitpack2.py", line 191, in init xe=bbox[1], s=s) dfitpack.error: (m>k) failed for hidden m: fpcurf0:m=1

Could you please help me to figure out what is causing this error?

Thank you, Priya

ay-lab commented 3 years ago

Your resolution equals your distance upper bound. Not sure how you are expecting to get a meaningful result with this. If you want to analyze the data at 1Mb then the -U should at least be 50 million or more and your -L should be 2 million.

On Fri, Dec 18, 2020 at 8:36 AM Priyatama Pandey notifications@github.com wrote:

Hi, Thank you for suggestion. I have tried hicpro with older version and that seems to be working fine and after getting files using hicpro2fithic.py, I ran FitHiC. I got the an error. I am pasting my command an error here

[priyatap@e21-07 fithic_output]$ module load gcc/8.3.0 [priyatap@e21-07 fithic_output]$ module load python/3.7.6 [priyatap@e21-07 fithic_output]$ fithic -f /project/roselai_228/priyatap/HiC_work/convert_hicpro2fithic/fithic.fragmentMappability.gz -i /project/roselai_228/priyatap/HiC_work/convert_hicpro2fithic/fithic.interactionCounts.gz -U 1000000 -r 1000000 -l fithit_SRR1030745_1MB -o /project/roselai_228/priyatap/HiC_work/fithic_output_new

_>

GIVEN FIT-HI-C ARGUMENTS

Reading fragments file from: /project/roselai_228/priyatap/HiC_work/convert_hicpro2fithic/fithic.fragmentMappability.gz Reading interactions file from: /project/roselai_228/priyatap/HiC_work/convert_hicpro2fithic/fithic.interactionCounts.gz Output path being used from /project/roselai_228/priyatap/HiC_work/fithic_output_new Fixed size option detected... Fast version of FitHiC will be used Resolution is 1000.0 kb No bias file The number of spline passes is 1 The number of bins is 100 The number of reads required to consider an interaction is 1 The name of the library for outputted files will be fithit_SRR10307451MB Upper Distance threshold is 1000000 Lower Distance threshold is 0 Only intra-chromosomal regions will be analyzed Lower bound of bias values is 0.5 Upper bound of bias values is 2 All arguments processed. Running FitHiC now... =========================

Reading the contact counts file to generate bins... Interactions file read. Time took 11.18812608718872 Fragments file read. Time took 0.0322415828704834 Writing /project/roselai_228/priyatap/HiC_work/fithic_output_new/fithit_SRR1030745_1MB.fithic_pass1.res1000000.txt Spline fit Pass 1 starting... Traceback (most recent call last): File "/home1/priyatap/.local/bin/fithic", line 10, in sys.exit(main()) File "/home1/priyatap/.local/lib/python3.7/site-packages/fithic/fithic.py", line 326, in main splineXinit,splineYinit,residual,outliersline, outliersdist, FDRXinit, FDRYinit= fit_Spline(mainDic,x,y,yerr,contactCountsFile,os.path.join(outputPath,libName+".spline_pass1"),biasDic, outliersline, outliersdist, observedIntraInRangeSum, possibleIntraInRangeCount, possibleInterAllCount, observedIntraAllSum, observedInterAllSum, resolution, 1) File "/home1/priyatap/.local/lib/python3.7/site-packages/fithic/fithic.py", line 801, in fit_Spline ius = UnivariateSpline(x, y, s=splineError) File "/spack/apps/linux-centos7-x86_64/gcc-8.3.0/python-3.7.6-dd2am3dyvlpovhd4rizwfzc45wnsajxf/lib/python3.7/site-packages/scipy/interpolate/fitpack2.py", line 191, in init xe=bbox[1], s=s) dfitpack.error: (m>k) failed for hidden m: fpcurf0:m=1

Could you please help me to figure out what is causing this error?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ay-lab/fithic/issues/38#issuecomment-748193079, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEJTNJBNS6MKHF63O5AT4IDSVOAG7ANCNFSM4UXUJMYQ .

priyatamapandey commented 3 years ago

Thank you for your suggestion. I am completely naive in this field. I am learning it. I am trying to run it successfully first and understand the cutoff. Is this the reason of error?

Is the data of 1MB means 500KB in each direction from the interaction point ?

Thank you,

ay-lab commented 3 years ago

These are just fixed-size bins of whatever resolution you choose. There may not be any "interaction point" in them or they may be several. The binning is not with respect to any specific point, just non-overrlapping bins 0 - 1,000,000bp = bin1 1,000,000bp - 2,000,000bp = bin2

I HIGHLY suggest that you FULLY read a decent review paper about Hi-C data and its analysis before you go any further with your analysis or interests in this field. Here is one we wrote a while ago: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0745-7

On Fri, Dec 18, 2020 at 9:09 AM Priyatama Pandey notifications@github.com wrote:

Thank you for your suggestion. I am completely naive in this field. I am learning it. I am trying to run it successfully first and understand the cutoff. Is this the reason of error?

Is the data of 1MB means 500KB in each direction from the interaction point ?

Thank you,

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ay-lab/fithic/issues/38#issuecomment-748210038, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEJTNJFG4UVVM7LKNYTQJCDSVOEFNANCNFSM4UXUJMYQ .

priyatamapandey commented 3 years ago

Cool! Thanks you. Yes, I came across this paper once. I will read it completely again. So, should I also provide bin size if I am providing -U or I can keep it at default. I am trying to understand, upper and lower threshold option, why 1Mb of data, -U should at least be 50 million or more and -L should be 2 million. In the case of 500Kbof data , -L should be 1million.

Thanks for bearing me, Priya

priyatamapandey commented 3 years ago

Hi,

I used HUGin web tool (http://hugin2.genetics.unc.edu/Project) which give me fithic results in the below format. I am pasting example here.

Screen Shot 2021-01-11 at 6 12 43 PM

I want to specific with the anchor/SNP for example position chr5:53298024 was used as a anchor in the above table.

Wherever my results looks like below

Screen Shot 2021-01-11 at 6 15 24 PM

Please let me know how can I generate results like the first screenshot. Is there any option in fithic to specify the anchor or genomic position?

Thank you, Priya

ay-lab commented 3 years ago

The most recent fithic2 code should be outputting the expected count as well. Make sure you are using the latest version. You can use awk or even excel if you have to after that to convert one output format to the other.

https://github.com/ay-lab/fithic/blob/master/fithic/fithic.py .. 1178: outfile.write("chr1\tfragmentMid1\tchr2\tfragmentMid2\tcontactCount\t p-value\tq-value\tbias1\tbias2\tExpCC\n") ..

On Mon, Jan 11, 2021 at 6:18 PM Priyatama Pandey notifications@github.com wrote:

Hi,

I used HUGin web tool (http://hugin2.genetics.unc.edu/Project) which give me fithic results in the below format. I am pasting example here. [image: Screen Shot 2021-01-11 at 6 12 43 PM] https://user-images.githubusercontent.com/17990280/104260569-9f278080-5438-11eb-9175-2c68a28fbd1b.png

Wherever my results looks like below [image: Screen Shot 2021-01-11 at 6 15 24 PM] https://user-images.githubusercontent.com/17990280/104260813-0cd3ac80-5439-11eb-92f9-b3669de5691e.png

Please let me know how can I generate results like the first screenshot.

Thank you, Priya

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ay-lab/fithic/issues/38#issuecomment-758349172, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEJTNJBPNP2WLBZZKUGFQQDSZOWQRANCNFSM4UXUJMYQ .

priyatamapandey commented 3 years ago

Hi, My fithic version is Fit-Hi-C 2.0.7. I have recently installed it. Is fithic2 has different GitHub link or source code? My main concern is, if you see the first table in previous comment, query start and end are fixed position along the entire column but oe start and oe end are moving window of 40KB. I don't see this kind of pattern in my output result. How could I get query start, query end and moving window of 40 KB (oe start and oe end) format from my fithic result? I want to find the significant interaction around a locus/ SNP position giving 40KB window like in the first table. Another thing, is there any way to limit search result around a SNP position with few KB window instead of entire genome? Please suggest. Priya

ay-lab commented 3 years ago

If m is your midpoint then m-20k is your query start and m+20k is your query end. For a given region you can query using awk but need to be careful as your region can be on the left or right. For instance: awk '($1=="chr2" && $2==560000) || ($3=="chr2" && $4==560000)' will give you the entries for that specific region then you can use other awk commands to process further.

On Mon, Jan 11, 2021 at 11:09 PM Priyatama Pandey notifications@github.com wrote:

Hi, My fithic version is Fit-Hi-C 2.0.7. I have recently installed it. Is fithic2 has different GitHub link or source code? My main concern is, if you see the first table in previous comment, query start and end are fixed position along the entire column but oe start and oe end are moving window of 40KB. I don't see this kind of pattern in my output result. How could I get query start, query end and moving window of 40 KB (oe start and oe end) format from my fithic result? I want to find the significant interaction around a locus/ SNP position giving 40KB window like in the first table. Please suggest. Priya

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ay-lab/fithic/issues/38#issuecomment-758454877, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEJTNJAQYX7MKSGYOGH5FT3SZPYRPANCNFSM4UXUJMYQ .

priyatamapandey commented 3 years ago

Hi, I understand now that how to change fragment midpoint in window. So, if my fragmentMid1 is (m-20KB, m+20KB) then I am looking for fragmentMid2 (col4) (m+20KB,m+40Kb), (m+40KB,m+60Kb),.... and Pvalue for those.
I don't see this pattern in my result screenshot attached. For example, I am expecting 920000, 960000, 1000000..etc.. in my second, third , fourth line of column4 , an increase of 40KB as I did it for 40KB resolution. To make it simple to understand chr fragment1_window chr fragment2_window pvalue chr (m-20kb, m+20kb) chr (m+20kb,m+60kb) .0x chr (m-20kb, m+20kb) chr (m+60kb,m+100kb) .0y chr (m-20kb, m+20kb) chr (m+100kb,m+140kb) .0z etc..

Screen Shot 2021-01-11 at 11 42 56 PM

Thanks so much, Priya

ay-lab commented 3 years ago

It skips the rows (or pairs of regions) that has zero contact count. So you can assume all those missing entries have zero contact count and pvalue=1.

On Tue, Jan 12, 2021 at 12:07 AM Priyatama Pandey notifications@github.com wrote:

Hi, I understand now that how to change fragment midpoint in window. So, if my fragmentMid1 is (m-20KB, m+20KB) then I am looking for fragmentMid2 (col4) (m+20KB,m+40Kb), (m+40KB,m+60Kb),.... and Pvalue for those. I don't see this pattern in my result screenshot attached. For example, I am expecting 920000, 960000, 1000000..etc.. increasing of 40KB as I did it for 40KB resolution. To make it simple to understand chr fragment1_window chr fragment2_window pvalue chr (m-20kb, m+20kb) chr (m+20kb,m+60kb) .0x chr (m-20kb, m+20kb) chr (m+60kb,m+100kb) .0y chr (m-20kb, m+20kb) chr (m+100kb,m+140kb) .0z etc..

[image: Screen Shot 2021-01-11 at 11 42 56 PM] https://user-images.githubusercontent.com/17990280/104285176-1294b680-5468-11eb-890a-f6368ea98cf5.png

Thanks so much, Priya

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ay-lab/fithic/issues/38#issuecomment-758482741, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEJTNJGQ3JPDGTDH3ZX4K2LSZP7MLANCNFSM4UXUJMYQ .

priyatamapandey commented 3 years ago

Got it. Thanks! Is there option I can run for zero contact count? For zero contact pvalue is 1 or adjusted p-value(q value) =1 ?

Priya

priyatamapandey commented 3 years ago

Hi, In my result file I don't see enough interact for my region of interest, I know in few cases there should be interaction. For example I am interested in chromosome 1 interaction counts more than other chromosomes. I found in your google query group if I run fithic chromosome wise, I can increase changes of getting interactions. So, please let me know for running chromosome wise which file or parameter needs to alter?

Thank you, Priya

priyatamapandey commented 3 years ago

Hi,

After running hicpro2fithic.py and found that the output file named fragmentMappability.gz is not exactly in the fithic input required format.

Here is the screenshot.

Screen Shot 2021-01-17 at 12 49 52 AM

I can see my files showing fragment midpoint entires in col 2 and col 3 both. Is that something I can expect or this is some error in my fragment file.

Please suggest me why I am seeing this kind of output.

Thanks, Priya

priyatamapandey commented 3 years ago

Hi,

Thank you so much for all the help. I appreciate it. Sorry for bothering you again. I am explaining my goal that I want to get from FITHIC. I have 25 validated genomic SNPs that occupy one base only. So, I have an anchor of 40KB which has one of these SNPs in the middle of this 40 kb anchor. I am trying to understand if there is any interaction of this SNP on the chromosome for 1 mb in each direction. I have specific interest to look the interaction of this anchor with the segment of 1MB in each direction i.e; 2 Mb in total. Please help me how can I specify these things in FitHiC. Thank you so much for your further help.

Priya

ay-lab commented 3 years ago

Hi. You can just apply fithic to that specific chromosome (to at least have sufficient data to estimate an appropriate background/spline). THEN you can just extract all entries/lines with one of your 40kb bin of interest as the first or the second locus. Then filter out whatever p-value or q-value you want to filter for. Alternatively, you can extract your specific locus and all its interactions then try a 4C analysis tool.

priyatamapandey commented 3 years ago

So that means I can get my bin of interest either on the fragmentmid1 or fragmentmid2? Since I wanted to get my fithic output similar to the fithic output of the HugIn (http://hugin2.genetics.unc.edu/Project) webtool. I have pasted the screenshot of HUGIN output in the earlier query. It has query start and query end that contain a SNP in the middle and oe start and oe end is the 40KB of bins interaction that covers 1MB of distance in each direction. In my case a portion of that fixed bin which has SNP in middle falling on query start -query end column and rest of the portion falling on oe start and oe end column.

I believe that is what you are trying to tell me that it can be on either of the fragment column. Please confirm? If yes, then possibly HUGIn developer have represented the result in different output format.

hugIn_format.txt

Thank you, Priya

ay-lab commented 3 years ago

Yes, to your first question. FitHiC output is sorted so in each line you will have fragment mid1 < fragment mid2. Say your bin midpoint is y then for any x < y you will have an entry "x y". For any z> x you will have "y z". All good? Not sure what Hugin is doing but this is quite straightforward honestly to format, please try.

priyatamapandey commented 3 years ago

Thank you so much for explaining it. If you got chance to check my attached fithic output file Hugin_format.txt. I have no significant p-value and q-value for any of the interaction. Also, I am not getting expected count in my outfile although it is showing version Fit-Hi-C 2.0.7. I am running it on the HPC cluster.

The command I use which does not give me expected count is below

module load python/3.7.6 sample=(cat sra_list.txt | cut -f 1 | sed -n '1,11p' | sed -n ${SLURM_ARRAY_TASK_ID}p `) fithic \ -i /AllTarget_hicpro2fithic/$sample.fithic_input/fithic.interactionCounts.gz \ -f /AllTarget_hicpro2fithic/$sample.fithic_input/fithic.fragmentMappability.gz \ -t /AllTarget_hicpro2fithic/$sample.fithic_input/fithic.biases.gz \ -U 2000000 \ -L 80000 \ -r 40000 \ -b 200 \ -l $sample.fithicRslt_40kb \ -o /HiC_work/fitHic_target \

-x intraOnly \

-v`

When I tried to run it from the script

python3 /myTools/fithic/fithic/fithic.py \ -f fithic.fragmentMappability.gz \ -i fithic.interactionCounts.gz \ -t fithic.biases.gz \ -o test \ -U 2000000 \ -b 200 \ -L 80000 \ -r 40000

It gave me following error

Screen Shot 2021-01-23 at 12 29 39 AM

FYI, I am also running Dixon's hESC data (fastq data) using HiC-Pro and then fithic.

Thank you, Priya

priyatamapandey commented 3 years ago

Hi, I have installed the fithic on my local machine using pip and run the test files. It is working fine and output has the exp count too. But when I ran it on the HiC-Pro generated results it gave me the following error.

Screen Shot 2021-01-24 at 8 12 34 PM

Please suggest me what could be region reason and how should I tackle it?

Thank you so much for all your help, Priya

ay-lab commented 3 years ago

Please see this entry about this issue: https://github.com/ay-lab/fithic/issues/39

priyatamapandey commented 3 years ago

Hi, Thank you. This is helpful. It looks like some time fithic handles the out of range bias value itself and some times not. I guess, may be when it is under some percentage or quantile.

Below is the summary print on the terminal showed out of range loci discarded and fithic run completed with any error.

Screen Shot 2021-01-26 at 2 30 37 PM

I have another file which has many out of range bias values and that file ended up with the error. I will filter out off bias value and run.

Thank you, Priya

priyatamapandey commented 3 years ago

Hi, Thank you for all your help. I have used 1MB resolution and 40Kb resolution to generate fithic significant confidences for the contacts. I found most of the q-value is one for my region of interest when I filtered it down. I found in one of the issue where you mentioned that q-value also depends on the resolution and deep sequencing coverage. Since my q-value is one for those region of interest. I am using p-value cutoff now. How significant is to use p-value cutoff? Will p-value also give me significant interactions ?

Thank you, Priya

41622236 commented 3 years ago

Hi, Thank you. This is helpful. It looks like some time fithic handles the out of range bias value itself and some times not. I guess, may be when it is under some percentage or quantile.

Below is the summary print on the terminal showed out of range loci discarded and fithic run completed with any error.

Screen Shot 2021-01-26 at 2 30 37 PM

I have another file which has many out of range bias values and that file ended up with the error. I will filter out off bias value and run.

Thank you, Priya

Hi, priyatamapandey I also encountered this problem and I read issue: #39. But I still don't know how to solve it, can anyone help me?

Thanks!