aidenlab / juicer

A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
http://aidenlab.org
MIT License

Juicer pre create .hic ERROR #192

Closed henricheng closed 2 years ago

henricheng commented 3 years ago

I'm trying to run pre on my own dataset. I tried it on the test dataset and it worked, but when I run it on my dataset I get the error below. I used the short format; I've included a sample of it, the command, and the error. I'm not sure if it's something to do with the format that I'm missing.

Short Format:
1 12 456598 4566 1 12 465735 4657 0.5
0 12 452917 4529 0 12 462054 4621 0.5
0 12 452275 4523 0 12 461412 4614 0.3333333333333333

$ java -Xms512m -Xmx2048m -jar juicer_tools_1.22.01.jar pre out.txt test.hic sacCer3

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2020-11-23T11:50:06,341] [Globals.java:138] [main] Development mode is enabled
Using 1 CPU thread(s)
Not including fragment map
Start preprocess
Writing header
Writing body
java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
    at juicebox.tools.utils.original.MatrixZoomDataPP.mergeAndWriteBlocks(MatrixZoomDataPP.java:276)
    at juicebox.tools.utils.original.Preprocessor.writeMatrix(Preprocessor.java:970)
    at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:659)
    at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:425)
    at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:139)
    at juicebox.tools.HiCTools.main(HiCTools.java:94)

nchernia commented 3 years ago

Your chromosome is listed as "12" in this example and you've set sacCer3 as the genome ID, which has the chromosome names and sizes below. You should pass the genome ID or chrom.sizes file that corresponds to the FASTA file you aligned against.

chrIV 1531933
chrXV 1091291
chrVII 1090940
chrXII 1078177
chrXVI 948066
chrXIII 924431
chrII 813184
chrXIV 784333
chrX 745751
chrXI 666816
chrV 576874
chrVIII 562643
chrIX 439888
chrIII 316620
chrVI 270161
chrI 230218
chrM 85779

-- Neva Cherniavsky Durand, Ph.D. | she, her, hers Assistant Professor | Molecular and Human Genetics Aiden Lab | Baylor College of Medicine www.aidenlab.org
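
The name mismatch described above can be caught before running pre. The following is a minimal sketch, not part of Juicer: the helper names `parse_chrom_sizes` and `unknown_chromosomes` are hypothetical, and it assumes the short format layout used in this thread (chromosome names in fields 2 and 6) and a strict exact-match comparison, which may be stricter than Juicer's own name handling.

```python
def parse_chrom_sizes(text):
    """Parse 'name size' lines (chrom.sizes style) into a dict."""
    sizes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            sizes[parts[0]] = int(parts[1])
    return sizes


def unknown_chromosomes(short_lines, sizes):
    """Return chromosome names (fields 2 and 6) that are absent from sizes."""
    missing = set()
    for line in short_lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        for name in (fields[1], fields[5]):
            if name not in sizes:
                missing.add(name)
    return missing


# A three-entry excerpt of the sacCer3 sizes quoted in this thread.
SACCER3_EXCERPT = """chrIV 1531933
chrXII 1078177
chrI 230218"""

sizes = parse_chrom_sizes(SACCER3_EXCERPT)
records = [
    "1 12 456598 4566 1 12 465735 4657 0.5",          # name as in the first report
    "0 chrXII 452917 4529 0 chrXII 462054 4621 0.5",  # name renamed to match sacCer3
]
print(sorted(unknown_chromosomes(records, sizes)))
```

An empty result means every chromosome name in the input also appears in the sizes table; any name that is printed would need to be renamed or the genome ID changed.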

henricheng commented 3 years ago

I changed the naming conventions for my chromosomes. I tried it with and without the fragment map, and these are the errors I get.

Short Format w/ Score:
0 chrXVI 854624 8546 0 chrV 447391 4474 0.5
1 chrXII 468356 4684 1 chrXII 459219 4592 0.5
1 chrXII 459210 4592 1 chrXII 468347 4683 0.5
0 chrVI 1374 14 1 chrXVI 945821 9458 0.5

Fragment Map:
chrI 0 100 200 300 ...
chrII 0 100 200 300 ...
chrIII 0 100 200 300 ...
...
chrXVI 0 100 200 300 ...

With Fragment Map:
$ java -Xmx2g -jar juicer_tools_1.22.01.jar pre -f restriction_fragment.txt out.txt test.hic sacCer3
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2020-11-23T12:16:30,522] [Globals.java:138] [main] Development mode is enabled
Using 1 CPU thread(s)
Problem with creating fragment-delimited maps, NullPointerException.
This could be due to a null fragment map or to a mismatch in the chromosome name in the fragment map vis-a-vis the input file or chrom.sizes file.
Exiting.

Without Fragment Map:
$ java -Xmx2g -jar juicer_tools_1.22.01.jar pre out.txt test.hic sacCer3
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2020-11-23T12:19:28,754] [Globals.java:138] [main] Development mode is enabled
Using 1 CPU thread(s)
Not including fragment map
Start preprocess
Writing header
Writing body
....Error: the chromosome combination 12_12 appears in multiple blocks

sa501428 commented 3 years ago

This error means the short format file hasn't been sorted. Also please use the forum for general questions: aidenlab.org/forum.html

nchernia commented 3 years ago

Also your fragment map looks incorrect (it's supposed to be where the restriction enzyme cuts).

henricheng commented 3 years ago

I tried to use the forum but it's not letting me create a new topic. Is the short format supposed to be sorted by chromosome, by the position of the first read, or by something else?

sa501428 commented 3 years ago

Should be the following unix sort:
sort -k2,2d -k6,6d input.txt > output.txt

nchernia commented 3 years ago

It should also be sorted so that the first chromosome listed is less than the second, so that the blocks are together (e.g., chr1/chr2 should be together with chr2/chr1).
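
To see which chromosome pairs would trigger the "appears in multiple blocks" error before running pre, a check along these lines may help. It is a rough sketch under assumptions: `find_split_blocks` is a hypothetical helper, not a Juicer function, and it treats chr1/chr2 and chr2/chr1 as the same block, following the comment above rather than Juicer's source.

```python
def find_split_blocks(lines):
    """Return chromosome pairs that occur in more than one contiguous run."""
    seen = set()     # pairs whose block has already started
    split = []       # pairs that reappear after a different block
    previous = None  # pair of the previous record
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        # Order-independent key so chrI/chrII equals chrII/chrI.
        pair = tuple(sorted((fields[1], fields[5])))
        if pair != previous:
            if pair in seen and pair not in split:
                split.append(pair)
            seen.add(pair)
            previous = pair
    return split


unsorted_records = [
    "0 chrI 100 1 0 chrI 200 2",
    "0 chrI 100 1 0 chrII 200 2",
    "0 chrI 300 3 0 chrI 400 4",  # chrI/chrI reappears after chrI/chrII
]
print(find_split_blocks(unsorted_records))
```

An empty list means each chromosome pair forms a single contiguous block, which is the property the sort is meant to guarantee.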

henricheng commented 3 years ago

I took a subset of the file to test and it's still giving me this error. The names of the chromosomes are the exact same in the restriction file as well.

Subset:
0 chrI 101281 1013 0 chrI 23712 237 0.0625
0 chrI 102077 1021 0 chrI 102078 1021 0.5
0 chrI 112618 1126 0 chrI 112619 1126 0.5
0 chrI 117052 1171 0 chrI 117051 1171 0.5
0 chrI 117643 1176 0 chrI 117644 1176 0.5
0 chrI 11786 118 0 chrI 24218 242 0.3333333333333333
0 chrI 11786 118 1 chrI 207758 2078 0.3333333333333333
0 chrI 11804 118 0 chrI 24236 242 0.3333333333333333
0 chrI 118485 1185 0 chrI 118484 1185 0.5
0 chrI 12011 120 0 chrI 24443 244 0.2
0 chrI 120165 1202 0 chrI 120164 1202 0.3333333333333333
0 chrI 120165 1202 0 chrI 120169 1202 0.3333333333333333
0 chrI 120169 1202 0 chrI 120164 1202 0.3333333333333333
0 chrI 122385 1224 0 chrI 122384 1224 0.5
0 chrI 122768 1228 0 chrI 122769 1228 0.5
0 chrI 123533 1235 0 chrI 123534 1235 0.5
0 chrI 12363 124 0 chrI 24816 248 0.3333333333333333
0 chrI 124539 1245 0 chrI 124540 1245 0.5
0 chrI 128944 1289 0 chrI 128945 1289 0.5
0 chrI 12910 129 1 chrI 206892 2069 0.2
0 chrI 129393 1294 0 chrI 129392 1294 0.5

$ java -Xms512m -Xmx2048m -jar juicer_tools_1.22.01.jar pre -f restriction_fragment.txt out.sorted.txt out.hic sacCer3

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2020-11-23T13:54:36,817] [Globals.java:138] [main] Development mode is enabled
Using 1 CPU thread(s)
Problem with creating fragment-delimited maps, NullPointerException.
This could be due to a null fragment map or to a mismatch in the chromosome name in the fragment map vis-a-vis the input file or chrom.sizes file.
Exiting.

nchernia commented 3 years ago

The part of the restriction site file you showed earlier looked incorrect. I would not include the fragment map.

henricheng commented 3 years ago

It looks like it is starting to run, but then I get this error.

$ java -Xms512m -Xmx2048m -jar juicer_tools_1.22.01.jar pre out.sorted.txt out.hic sacCer3
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2020-11-23T13:59:33,062] [Globals.java:138] [main] Development mode is enabled
Using 1 CPU thread(s)
Not including fragment map
Start preprocess
Writing header
Writing body
.................Error: the chromosome combination 1_2 appears in multiple blocks

nchernia commented 3 years ago

Right, again you need to sort so that the chromosome blocks are together. The easiest way to do this is to first order the two read ends within each line and then run the Unix sort. So:

awk '$2 > $6 { print $5,$6,$7,$8,$1,$2,$3,$4,$9} $2<=$6 {print}' infile.txt | sort -k2,2d -k6,6d

henricheng commented 3 years ago

I'm getting this bash error. I'm unfamiliar with awk.

$ awk ‘$2 > $6 {print $5,$6,$7,$8,$1,$2,$3,$4,$9} $2<=$6 {print}’ out.txt | sort -k2,2d -k6,6d > out.sorted.txt
-bash: $6: ambiguous redirect
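
The `ambiguous redirect` message is most likely caused by the typographic quotes (‘ ’) around the awk program: bash does not treat them as quote characters, so it reads `> $6` as a redirection to an empty variable. Retyping them as plain ASCII single quotes (') should let the command run. For readers unfamiliar with awk, the same two steps (swap the read ends when the first chromosome name is greater, then sort by both chromosome names) can be sketched in Python. `normalize_and_sort` is a hypothetical helper, and it assumes plain string comparison of names, which may differ from `sort -d` under some locales.

```python
def normalize_and_sort(lines):
    """Order the read ends within each record, then group records by chromosome pair."""
    records = []
    for line in lines:
        f = line.split()
        if len(f) < 8:
            continue  # skip blank or malformed lines
        if f[1] > f[5]:
            # Swap fields 1-4 with fields 5-8; keep any trailing score field.
            f = f[4:8] + f[0:4] + f[8:]
        records.append(f)
    # Stable sort on (chr1, chr2) keeps each chromosome pair contiguous.
    records.sort(key=lambda f: (f[1], f[5]))
    return [" ".join(f) for f in records]


sample = [
    "0 chrXVI 854624 8546 0 chrV 447391 4474 0.5",
    "1 chrXII 468356 4684 1 chrXII 459219 4592 0.5",
    "0 chrVI 1374 14 1 chrXVI 945821 9458 0.5",
]
for record in normalize_and_sort(sample):
    print(record)
```

In the first sample record the read ends are swapped because "chrXVI" compares greater than "chrV" as a string; the other two records are left in their original field order.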

henricheng commented 3 years ago

I got it to work. But how do I generate the correct fragment file?

sa501428 commented 2 years ago

https://github.com/aidenlab/juicer/blob/main/misc/generate_site_positions.py

Or you can use 'none' and ignore the fragments.
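
As a rough illustration of what a restriction-site file contains (as opposed to the evenly spaced 0, 100, 200, ... positions shown earlier), the sketch below lists where a recognition motif occurs in a sequence. It is not the linked script: the helper names are hypothetical, the GATC motif and the toy sequence are placeholders, and the output layout (chromosome name, 1-based site positions, then chromosome length) is an assumption that should be checked against generate_site_positions.py.

```python
import re


def site_positions(sequence, motif):
    """Return 1-based start positions of every motif match, including overlaps."""
    # A lookahead makes each match zero-width so overlapping sites are found.
    pattern = re.compile("(?=" + re.escape(motif) + ")", re.IGNORECASE)
    return [m.start() + 1 for m in pattern.finditer(sequence)]


def fragment_map_line(name, sequence, motif):
    """Build one line: name, site positions, then the sequence length."""
    positions = site_positions(sequence, motif)
    return " ".join([name] + [str(p) for p in positions] + [str(len(sequence))])


# Toy 16-base sequence with two GATC sites (hypothetical data).
print(fragment_map_line("chrTest", "AAGATCTTTTGATCAA", "GATC"))
```

For real data, the linked script should be run on the same FASTA file that the reads were aligned against, so that chromosome names and coordinates match the input to pre.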