ay-lab / fithic

Fit-Hi-C is a tool for assigning statistical confidence estimates to chromosomal contact maps produced by genome-wide genome architecture assays such as Hi-C.
MIT License

HiCKRy.py Key errors #41

Open Jssmith91 opened 3 years ago

Jssmith91 commented 3 years ago

Hi

I have been trying to run HiCKRy.py on data dumped from Juicer. The contact counts were dumped with no normalisation at 1 kb resolution (we have more than 4 billion contacts). The error I keep getting from HiCKRy.py is as follows:

Creating sparse matrix...
Traceback (most recent call last):
  File "HiCKRy.py", line 283, in <module>
    main()
  File "HiCKRy.py", line 276, in main
    matrix,revFrag = loadfastfithicInteractions(args.interactions, args.fragments)
  File "HiCKRy.py", line 45, in loadfastfithicInteractions
    x.append(fragDic[chrom1][mid1])
KeyError: '1'

The contacts file generated looks like this:

1 87000 1 87000 2.0
1 87000 1 88000 1.0
1 137000 1 139000 1.0
1 181000 1 181000 17.0
1 181000 1 182000 2.0
1 182000 1 182000 1.0
1 187000 1 190000 1.0
1 190000 1 191000 1.0
1 597000 1 598000 1.0
1 598000 1 599000 1.0

The fragment file generated looks like this:

chr1 0 500 1 1
chr1 1000 1500 1 1
chr1 2000 2500 1 1
chr1 3000 3500 1 1
chr1 4000 4500 1 1
chr1 5000 5500 1 1
chr1 6000 6500 1 1
chr1 7000 7500 1 1
chr1 8000 8500 1 1
chr1 9000 9500 1 1

Do you have any suggestions for this or would it be easier to dump the contacts from Juicer with the KR normalisation already applied?

Thanks in advance,

James

aryakaul commented 3 years ago

Hey James,

You'll want to make sure that the chromosome names in both files are identical, so the contacts file should use 'chr1' instead of '1'. Something like this awk command should do it:

awk '{printf "chr%s\t%s\tchr%s\t%s\t%s\n",$1,$2,$3,$4,$5}' $CONTACTSFILE

Jssmith91 commented 3 years ago

Hi, these now have exactly the same naming and I still get the following:

Creating sparse matrix...
Traceback (most recent call last):
  File "HiCKRy.py", line 283, in <module>
    main()
  File "HiCKRy.py", line 276, in main
    matrix,revFrag = loadfastfithicInteractions(args.interactions, args.fragments)
  File "HiCKRy.py", line 45, in loadfastfithicInteractions
    x.append(fragDic[chrom1][mid1])
KeyError: 45000

ay-lab commented 3 years ago

I think you have fixed this issue, but this was again a mismatch between the two input files: your contacts file did not list midpoints in columns 2 and 4.
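For anyone hitting the same KeyError: in the samples posted above, the fragments file has midpoints 500, 1500, ... in column 3 at 1 kb resolution, while the contacts file has values such as 87000 in columns 2 and 4, which look like bin start coordinates. A minimal Python sketch of the conversion, assuming a five-column contacts file (chr1, pos1, chr2, pos2, count) and a fixed bin size, could look like this. The function name `to_midpoint_line` and the `add_chr_prefix` option are hypothetical and not part of Fit-Hi-C or Juicer.

```python
def to_midpoint_line(line, resolution, add_chr_prefix=False):
    """Rewrite one 'chr1 pos1 chr2 pos2 count' row so that columns 2 and 4
    hold fixed-size bin midpoints instead of bin start coordinates.

    Assumption: bins are fixed-size and start at multiples of `resolution`.
    """
    chrom1, pos1, chrom2, pos2, count = line.split()
    half = resolution // 2

    # Snap each position to the start of its bin, then move to the midpoint.
    mid1 = (int(pos1) // resolution) * resolution + half
    mid2 = (int(pos2) // resolution) * resolution + half

    # Optionally make chromosome names match a 'chr'-prefixed fragments file.
    if add_chr_prefix:
        chrom1, chrom2 = "chr" + chrom1, "chr" + chrom2

    return f"{chrom1}\t{mid1}\t{chrom2}\t{mid2}\t{count}"


def convert_contacts(in_path, out_path, resolution, add_chr_prefix=False):
    """Stream a contacts file line by line so large dumps fit in memory."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(to_midpoint_line(line, resolution, add_chr_prefix) + "\n")
```

With `resolution=1000`, a row starting at 87000 would be written with midpoint 87500, which is the form the fragments file sample uses. Processing line by line keeps memory use flat even for very large dumps.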

DaianeH commented 2 years ago

I'm getting the same key errors.

My contact file looks like:

10 100005000 10 100005000 17
10 100005000 10 100015000 19
10 100005000 10 100025000 3
10 100005000 10 100035000 3
10 100005000 10 100045000 6
10 100005000 10 100055000 7
10 100005000 10 100065000 2
10 100005000 10 100075000 2
10 100005000 10 100095000 1
10 100005000 10 100105000 6

The fragment file looks like:

1 0 5000 1 1
1 10000 15000 1 1
1 20000 25000 1 1
1 30000 35000 1 1
1 40000 45000 1 1
1 50000 55000 1 1
1 60000 65000 1 1
1 70000 75000 1 1
1 80000 85000 1 1
1 90000 95000 1 1

What can the problem be?

DaianeH commented 2 years ago

Just to be more precise, the error is:

Creating sparse matrix...
Traceback (most recent call last):
  File "fithic/fithic/utils/HiCKRy.py", line 283, in <module>
    main()
  File "fithic/fithic/utils/HiCKRy.py", line 276, in main
    matrix,revFrag = loadfastfithicInteractions(args.interactions, args.fragments)
  File "fithic/fithic/utils/HiCKRy.py", line 46, in loadfastfithicInteractions
    y.append(fragDic[chrom2][mid2])
KeyError: 82155000

ay-lab commented 2 years ago

You have the midpoint 82155000 in your contacts file (I am not sure on which chromosome, so you may want to run chromosome by chromosome, or put a print statement in the code), but it seems you do not have it listed in the fragments file. That is the error.
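As an alternative to editing HiCKRy.py, a small standalone check can list every (chromosome, midpoint) pair that appears in the contacts file but not in the fragments file. The sketch below is not part of Fit-Hi-C; the function names are hypothetical, and it assumes the column layout seen in the samples in this thread: the fragments file has the chromosome in column 1 and the midpoint in column 3, and the contacts file has chromosome and midpoint in columns 1-2 and 3-4. Both plain and gzipped files are handled.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path)


def load_fragment_mids(fragments_path):
    """Collect (chromosome, midpoint) pairs from the fragments file.

    Assumption: column 1 is the chromosome and column 3 is the midpoint.
    """
    mids = set()
    with open_text(fragments_path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 3:
                mids.add((fields[0], int(fields[2])))
    return mids


def find_missing(contacts_path, fragments_path):
    """Count contacts-file loci that have no entry in the fragments file."""
    known = load_fragment_mids(fragments_path)
    missing = Counter()
    with open_text(contacts_path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 5:
                continue  # skip blank or malformed rows
            # Check both ends of the contact.
            for chrom, mid in ((fields[0], int(fields[1])),
                               (fields[2], int(fields[3]))):
                if (chrom, mid) not in known:
                    missing[(chrom, mid)] += 1
    return missing
```

Calling `find_missing("contacts.gz", "fragments.gz")` and printing `missing.most_common(10)` would show which chromosome the offending midpoint belongs to. Chromosome names are compared as strings, so a '1' versus 'chr1' mismatch shows up here as well.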

DaianeH commented 2 years ago

I created this fragments file using:

python createFitHiCFragments-fixedsize.py --chrLens fithic_protocol_data/data/referenceGenomes/hg19wY-lengths --resolution 10000 --outFile myfile.fragmentsfile.gz

The contacts file was created from a validPairs file with:

sh validPairs2FitHiC-fixedSize.sh 10000 myfile myfile_validPairs.txt .

Is that correct? And if these files were created correctly, how can I solve the key error?

Thank you,

ay-lab commented 2 years ago

They look correct. We would need to see the full contacts and fragments files to help you, unless you can trace back the entry with midpoint 82155000 yourself.