aidenlab / juicer

A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
http://aidenlab.org
MIT License

Running on a SLURM cluster, gives a lot of errors. #32

Closed sameet closed 7 years ago

sameet commented 7 years ago

Hi,

I have the following directory structure:

references:
total 8374528
-rwxr-xr-x+ 1 sm2556 mane 3157608038 Sep 13 12:32 Homo_sapiens_assembly19.fasta  
-rw-r--r--+ 1 sm2556 mane       6663 Sep 13 13:31 Homo_sapiens_assembly19.fasta.amb
-rw-r--r--+ 1 sm2556 mane        939 Sep 13 13:31 Homo_sapiens_assembly19.fasta.ann
-rw-r--r--+ 1 sm2556 mane 3095694072 Sep 13 13:30 Homo_sapiens_assembly19.fasta.bwt
-rw-r--r--+ 1 sm2556 mane  773923497 Sep 13 13:31 Homo_sapiens_assembly19.fasta.pac
-rw-r--r--+ 1 sm2556 mane 1547847040 Sep 13 13:44 Homo_sapiens_assembly19.fasta.sa
-rw-r--r--+ 1 sm2556 mane        377 Sep 13 15:19 Homo_sapiens_assembly19.sizes

restriction_sites:
total 15360
-rw-r--r--+ 1 sm2556 mane 7762896 Sep 13 11:45 hg19_HindIII_new.txt
-rw-r--r--+ 1 sm2556 mane 7762896 Sep 13 11:45 hg19_HindIII.txt

scripts:
total 92800
-rwxr-xr-x+ 1 sm2556 mane     3519 Sep 13 11:26 check.sh
-rwxr-xr-x+ 1 sm2556 mane    15349 Sep 13 11:26 chimeric_blacklist.awk
-rwxr-xr-x+ 1 sm2556 mane     1971 Sep 13 11:26 cleanup.sh
-rwxr-xr-x+ 1 sm2556 mane     3584 Sep 13 11:26 collisions.awk
-rwxr-xr-x+ 1 sm2556 mane     1616 Sep 13 11:26 countligations.sh
-rwxr-xr-x+ 1 sm2556 mane    13448 Sep 13 11:26 diploid.pl
-rw-r--r--+ 1 sm2556 mane     2449 Sep 13 11:26 diploid_split.awk
-rwxr-xr-x+ 1 sm2556 mane     5325 Sep 13 11:26 dups.awk
-rw-r--r--+ 1 sm2556 mane     3726 Sep 13 11:26 fragment_4dnpairs.pl
-rwxr-xr-x+ 1 sm2556 mane     3711 Sep 13 11:26 fragment.pl
-rw-r--r--+ 1 sm2556 mane 30745856 Sep 13 12:31 juicebox
-rw-r--r--+ 1 sm2556 mane 30745856 Sep 13 12:30 Juicebox.jar
-rw-r--r--+ 1 sm2556 mane 30751431 Sep 13 12:30 juicebox_tools.7.0.jar
-rwxr-xr-x+ 1 sm2556 mane     2388 Sep 13 11:26 juicer_arrowhead.sh
-rwxr-xr-x+ 1 sm2556 mane     3269 Sep 13 11:26 juicer_hiccups.sh
-rwxr-xr-x+ 1 sm2556 mane     3651 Sep 13 11:26 juicer_postprocessing.sh
-rwxr-xr-x+ 1 sm2556 mane    41529 Sep 13 11:26 juicer.sh
-rwxr-xr-x+ 1 sm2556 mane     4659 Sep 13 11:26 LibraryComplexity.class
-rwxr-xr-x+ 1 sm2556 mane     7204 Sep 13 11:26 LibraryComplexity.java
-rwxr-xr-x+ 1 sm2556 mane     2354 Sep 13 11:26 makemega_addstats.awk
-rwxr-xr-x+ 1 sm2556 mane    12782 Sep 13 11:26 mega.sh
-rwxr-xr-x+ 1 sm2556 mane     2455 Sep 13 11:26 relaunch_prep.sh
-rwxr-xr-x+ 1 sm2556 mane     5200 Sep 13 11:26 split_rmdups.awk
-rwxr-xr-x+ 1 sm2556 mane    14572 Sep 13 11:26 statistics.pl
-rwxr-xr-x+ 1 sm2556 mane     1751 Sep 13 11:26 stats_sub.awk

fastq:
total 0
lrwxrwxrwx 1 sm2556 mane 67 Sep 13 15:33 S1_003_HiC_R1.fastq.gz -> ../../analysis-jul052016/S1_003_HiC/Unaligned/S1_003_HiC_1.fastq.gz
lrwxrwxrwx 1 sm2556 mane 67 Sep 13 15:34 S1_003_HiC_R2.fastq.gz -> ../../analysis-jul052016/S1_003_HiC/Unaligned/S1_003_HiC_2.fastq.gz

My run.sh script for the SLURM batch submission looks as follows:

#!/bin/bash
#SBATCH --partition=general
#SBATCH --job-name=Juicer
#SBATCH --ntasks=1 --nodes=1
#SBATCH --mem-per-cpu=6000
module load BWA; module load Java;  bash /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017/scripts/juicer.sh -g hg19 -d /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017 -q general -l general -a 'Reference' -S 'early' -p /home/sm2556/project/hic-golden-uconn-feb022216/hic-analysis-sept142017/references/Homo_sapiens_assembly19.sizes -s 'HindIII' -y /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017/restriction_sites/hg19_HindIII.txt - D /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017 -x

The scripts folder was copied from juicer/SLURM/scripts in the cloned GitHub repository.

I get many error messages about dependencies not being satisfied. The part of the script that splits the fastq.gz files runs correctly, but the run still ends with an error, and the actual bwa mem call never happens on the cluster. When I ran the script in CPU mode, it started the alignment, but my files are too big and CPU mode would take a long time. Am I doing something wrong?

nchernia commented 7 years ago

It looks like you have a space between "-" and "D". This would cause problems. You also do not need quotes around early or HindIII, though I'm not sure it will harm anything.
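
To illustrate why the stray space matters, here is a minimal sketch. It assumes the script parses its options with the shell's getopts builtin; the function name parse_top_dir and the reduced option string are hypothetical and are not taken from juicer.sh. With getopts, a lone "-" is treated as the end of the options, so everything after it, including the intended -D value, is never read.

```shell
# Hypothetical mini-parser for illustration; juicer.sh's real option handling may differ.
parse_top_dir() {
    OPTIND=1          # restart option parsing on every call
    top_dir=""
    while getopts "q:D:x" opt; do
        case "$opt" in
            D) top_dir="$OPTARG" ;;
            *) ;;     # other options are accepted but ignored in this sketch
        esac
    done
    echo "${top_dir:-<unset>}"
}

parse_top_dir -q general -D /path/to/juicer -x    # prints /path/to/juicer
parse_top_dir -q general - D /path/to/juicer -x   # prints <unset>
```

In the second call, parsing stops at the lone "-", so the trailing -x is silently dropped as well.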

For your "module load" commands, you need to directly modify the juicer.sh script. These won't get carried over into the jobs that are launched by Juicer.

Since you've already split the files, you certainly don't need to send Juicer as a job, just run it from the command line and it will launch. We usually launch in a screen anyway (and not as a job) but you don't even need to do that if the splits directory has already been created.

Fix the error with the -D, remove the aligned directory (it should be empty), and add these lines to the top of the juicer.sh script (under the version):

load_bwa="module load BWA"
load_java="module load Java"

Then try again.

sameet commented 7 years ago

Hi @nchernia, Thank you for the prompt reply. I fixed run.sh as you suggested and also edited load_bwa and load_java in juicer.sh. I removed the debug, aligned, and splits folders (I wanted to start with a clean slate) and re-ran run.sh as follows:

[sm2556@ruddle2 hic-analysis-sept142017]$ sbatch run.sh

[sm2556@ruddle2 hic-analysis-sept142017]$ more slurm-1674096.out
(-: Looking for fastq files...fastq files exist
(-: Aligning files matching /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017/fastq/*_R*.fastq* 
in queue general to genome hg19 with no fragment delimited maps.
(-: Created /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017/splits and /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017/aligned.
(-: Starting job to launch other jobs once splitting is complete
sbatch: error: Batch job submission failed: Requested node configuration is not available
sbatch: error: Batch job submission failed: Requested node configuration is not available
sbatch: error: Batch job submission failed: Job dependency problem
sbatch: error: Batch job submission failed: Job dependency problem
sbatch: error: Batch job submission failed: Job dependency problem
sbatch: error: Batch job submission failed: Job dependency problem
sbatch: error: Batch job submission failed: Job dependency problem
sbatch: error: Batch job submission failed: Job dependency problem
sbatch: option requires an argument -- 'd'
(-: Finished adding all jobs... Now is a good time to get that cup of coffee..

I can see the splits directory re-created, and a bunch of .fastq chunks are being created in it right now. But I am pretty sure that the run is going to fail. How do I fix this?

nchernia commented 7 years ago

It looks like the very first jobs, which are the alignment jobs, have a configuration that doesn't work for your cluster. It could be due to memory or node requirements. Could you send your run.sh script?

sameet commented 7 years ago

#!/bin/bash
#SBATCH --partition=general
#SBATCH --job-name=Juicer
#SBATCH --ntasks=1 --nodes=1
#SBATCH --mem-per-cpu=6000

bash /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017/scripts/juicer.sh -g hg19 -d /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017 -q general -l general -a 'Reference' -S 'early' -p /home/sm2556/project/hic-golden-uconn-feb022216/hic-analysis-sept142017/references/Homo_sapiens_assembly19.sizes -s 'HindIII' -y /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017/restriction_sites/hg19_HindIII.txt -D /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017 -x

That is my run.sh.

nchernia commented 7 years ago

What does your debug folder look like?

sameet commented 7 years ago

Hi,

Before I answer your question, I think I found one mistake I was making. When I call the script without giving the number of threads, it assumes 16 and allocates 40 GB per core. We do not have many such nodes, so that seems to be one of the problems. I have edited the script to use -t 5, and hopefully that will solve one problem.
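
One way to check what a partition can actually provide is sketched below. It assumes the standard SLURM sinfo command is available on the login node; the function name show_partition_limits is hypothetical. A "Requested node configuration is not available" error from sbatch generally means that no node in the partition can satisfy the requested CPU and memory combination, so comparing the request against this listing is a reasonable first step.

```shell
# Hypothetical helper: list node count, CPUs per node and memory per node for one partition.
show_partition_limits() {
    partition="$1"
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found; run this on a SLURM login node" >&2
        return 1
    fi
    # %P = partition, %D = number of nodes, %c = CPUs per node, %m = memory per node in MB
    sinfo --partition="$partition" --format="%P %D %c %m"
}

# Usage on the cluster from this thread (partition name taken from run.sh):
#   show_partition_limits general
```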

Also, with help from our system administrator, I went through your source code carefully and finally understood why I do not need to submit run.sh with sbatch. The original script of course died after splitting the fastq files into 128 chunks, so my splits directory now has 128 x 2 fastq files. At this point I just re-ran the actual script part of run.sh, and it is generating lots of .txt files in the splits folder, like so:

-rw-r--r--+ 1 sm2556 mane 9 Sep 14 15:53 S1_003_HiC.fastq102.fastq_linecount.txt
-rw-r--r--+ 1 sm2556 mane 8 Sep 14 15:53 S1_003_HiC.fastq102.fastq_norm.txt.res.txt
-rw-r--r--+ 1 sm2556 mane 9 Sep 14 15:53 S1_003_HiC.fastq103.fastq_linecount.txt
-rw-r--r--+ 1 sm2556 mane 8 Sep 14 15:53 S1_003_HiC.fastq103.fastq_norm.txt.res.txt
-rw-r--r--+ 1 sm2556 mane 9 Sep 14 15:53 S1_003_HiC.fastq104.fastq_linecount.txt
-rw-r--r--+ 1 sm2556 mane 8 Sep 14 15:53 S1_003_HiC.fastq104.fastq_norm.txt.res.txt

My debug folder has a bunch of .err and .out files.

In the debug folder, there is a head-1674970.out file, the contents are as follows:

Thu Sep 14 16:11:33 EDT 2017
Experiment description: Reference; Juicer version 1.5.6;5 threads; splitsize 90000000; openjdk version "1.8.0_131"; ./scripts/juicer.sh -t 5 -g hg19 -d /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017 -q general -l general -a Reference -S early -p ./references/Homo_sapiens_assembly19.sizes -s HindIII -y ./restriction_sites/hg19_HindIII.txt -D /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017 -x

sameet commented 7 years ago

Is there any place where I can see the batch files that this generates? The first step, which splits the fastq into multiple chunks, seems to be working, but the alignment part does not. What am I missing? When I try this in CPU mode, it seems to work.

nchernia commented 7 years ago

You should look at the individual .err and .out files in your debug folder. For example, you can do tail -n 2 align*.out to see if there's a successful message printed.
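
The same check can be extended to the .err files. The following is a small sketch, with a hypothetical function name report_errors, that prints only the non-empty .err files in a debug directory together with their last two lines, so failed jobs stand out among a few hundred log files.

```shell
# Hypothetical helper: show the tail of every non-empty .err file in a debug directory.
report_errors() {
    debug_dir="$1"
    for f in "$debug_dir"/*.err; do
        [ -s "$f" ] || continue      # skip missing or empty files
        echo "==> $f <=="
        tail -n 2 "$f"
    done
}

# Usage from the Juicer top directory:
#   report_errors debug
```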

sameet commented 7 years ago

==> count_ligation-1676612.out <==
Thu Sep 14 18:31:34 EDT 2017
Thu Sep 14 18:36:55 EDT 2017

==> head-1676609.out <==
Thu Sep 14 18:31:34 EDT 2017
Experiment description: Reference; Juicer version 1.5.6;5 threads; splitsize 90000000; openjdk version "1.8.0_131"; ./scripts/juicer.sh -d /ycga-gpfs/project/ycga/mane/sm2556/hic-golden-uconn-feb0222016/hic-analysis-sept152017 -q general -l general -s HindIII -a Reference -p ./references/genome.chrom.sizes -y ./restriction_sites/hg19_HindIII.txt -D /ycga-gpfs/project/ycga/mane/sm2556/hic-golden-uconn-feb0222016/hic-analysis-sept152017 -z ./references/genome.fa -t 5

==> split-1676610.out <==
Split file: S1_003_HiC_R1.fastq
Thu Sep 14 20:28:23 EDT 2017

==> split-1676611.out <==
Split file: S1_003_HiC_R2.fastq
Thu Sep 14 20:25:00 EDT 2017

Today I saw these (this is just an example; I have ~200 such files in the debug folder):

==> count_ligation-1676612.out <==
Thu Sep 14 18:31:34 EDT 2017
Thu Sep 14 18:36:55 EDT 2017

==> count_ligation-1677151.out <==
Fri Sep 15 08:19:21 EDT 2017
Fri Sep 15 08:24:16 EDT 2017

==> count_ligation-1677155.out <==
Fri Sep 15 08:19:22 EDT 2017
Fri Sep 15 08:24:13 EDT 2017

==> count_ligation-1677159.out <==
Fri Sep 15 08:19:22 EDT 2017
Fri Sep 15 08:24:19 EDT 2017

==> count_ligation-1677163.out <==
Fri Sep 15 08:19:22 EDT 2017
Fri Sep 15 08:24:19 EDT 2017

==> count_ligation-1677167.out <==
Fri Sep 15 08:19:22 EDT 2017
Fri Sep 15 08:24:19 EDT 2017

Our system admins and I worked on this issue for most of yesterday. As far as I can tell, juicer.sh should have found everything it requires.

nchernia commented 7 years ago

Are these the only files in your debug folder?

sameet commented 7 years ago

No, there are some 200 odd files. The output in all of them looks similar. There are two other types, head.*, and split-*. Those look as follows:

==> head-1676609.out <==
Thu Sep 14 18:31:34 EDT 2017
Experiment description: Reference; Juicer version 1.5.6;5 threads; splitsize 90000000; openjdk version "1.8.0_131"; ./scripts/juicer.sh -d /ycga-gpfs/project/ycga/mane/sm2556/hic-golden-uconn-feb0222016/hic-analysis-sept152017 -q general -l general -s HindIII -a Reference -p ./references/genome.chrom.sizes -y ./restriction_sites/hg19_HindIII.txt -D /ycga-gpfs/project/ycga/mane/sm2556/hic-golden-uconn-feb0222016/hic-analysis-sept152017 -z ./references/genome.fa -t 5

==> head-1677150.out <==
Fri Sep 15 08:19:21 EDT 2017
Experiment description: Reference; Juicer version 1.5.6;5 threads; splitsize 90000000; openjdk version "1.8.0_131"; ./scripts/juicer.sh -d /ycga-gpfs/project/ycga/mane/sm2556/hic-golden-uconn-feb0222016/hic-analysis-sept152017 -q general -l general -s HindIII -a Reference -p ./references/genome.chrom.sizes -y ./restriction_sites/hg19_HindIII.txt -D /ycga-gpfs/project/ycga/mane/sm2556/hic-golden-uconn-feb0222016/hic-analysis-sept152017 -z ./references/genome.fa -t 5

==> head-1677810.out <==
Fri Sep 15 08:25:09 EDT 2017
Experiment description: Reference; Juicer version 1.5.6;5 threads; splitsize 90000000; openjdk version "1.8.0_131"; ./scripts/juicer.sh -d /ycga-gpfs/project/ycga/mane/sm2556/hic-golden-uconn-feb0222016/hic-analysis-sept152017 -q general -l general -s HindIII -a Reference -p ./references/genome.chrom.sizes -y ./restriction_sites/hg19_HindIII.txt -D /ycga-gpfs/project/ycga/mane/sm2556/hic-golden-uconn-feb0222016/hic-analysis-sept152017 -z ./references/genome.fa -t 5

==> split-1676610.out <==
Split file: S1_003_HiC_R1.fastq
Thu Sep 14 20:28:23 EDT 2017

==> split-1676611.out <==
Split file: S1_003_HiC_R2.fastq
Thu Sep 14 20:25:00 EDT 2017

nchernia commented 7 years ago

But no align ones?

sameet commented 7 years ago

Nope.

nchernia commented 7 years ago

What was the output of the run.sh script? Did you have the same sbatch errors?

Are there jobs running right now on the cluster?

sameet commented 7 years ago

I ran sh juicer.sh ... > log 2>&1 this time. The errors look the same: the jobs were submitted to the cluster and died within a few minutes. The log file is attached: log.txt (https://github.com/theaidenlab/juicer/files/1306479/log.txt)

nchernia commented 7 years ago

You are still having problems with the node configuration. Try running with "-t 1" to set threads to just 1.

What is the max memory you can request?

You don't need to redo the splits. Don't delete the splits folder, just do "rmdir aligned" and rerun with the adjusted threads.
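
The cleanup step can be sketched as follows; the function name prepare_rerun is hypothetical. The sketch relies on the fact that rmdir only removes empty directories, so it cannot delete alignment output by accident, and it refuses to continue if the splits directory is missing.

```shell
# Hypothetical helper: make the Juicer top directory ready for a rerun.
prepare_rerun() {
    top_dir="$1"
    # The splits must survive so the fastq files are not split again.
    if [ ! -d "$top_dir/splits" ]; then
        echo "no splits directory in $top_dir" >&2
        return 1
    fi
    # rmdir refuses to delete a non-empty directory, which protects real output.
    if [ -d "$top_dir/aligned" ]; then
        rmdir "$top_dir/aligned" || return 1
    fi
}

# Usage: run the helper, then rerun juicer.sh with the same options as before plus -t 1:
#   prepare_rerun /path/to/topdir
```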

sameet commented 7 years ago

Understood. I will try that immediately. All of our nodes have at least 128 GB RAM, and a few have up to 1.5 TB. Setting -t 1 makes sense. I will try that and let you know how it works.

sameet commented 7 years ago

Well, something more seems to have happened 😄. The bwa mem jobs at least seem to have been generated and submitted, but the outputs are still empty. I am attaching the output of tail -n 5 align*.out: align_log.txt (https://github.com/theaidenlab/juicer/files/1306551/align_log.txt)

nchernia commented 7 years ago

What do you mean, the outputs are empty? What does ls -l on splits return?

sameet commented 7 years ago

There are a bunch of .sam files, but they are empty.

sameet commented 7 years ago

These are the last few lines of ls -l in the splits directory.

-rw-r--r--+ 1 sm2556 mane 5.4G Sep 14 20:24 S1_003_HiC_R2.fastq127.fastq
-rw-r--r--+ 1 sm2556 mane    0 Sep 15 09:51 S1_003_HiC_R2.fastq127.fastq.sam
-rw-r--r--+ 1 sm2556 mane    0 Sep 15 09:52 S1_003_HiC_R2.fastq127.fastq_sort1.sam
-rw-r--r--+ 1 sm2556 mane    0 Sep 15 09:52 S1_003_HiC_R2.fastq127.fastq_sort.sam
-rw-r--r--+ 1 sm2556 mane 741M Sep 14 20:25 S1_003_HiC_R2.fastq128.fastq
-rw-r--r--+ 1 sm2556 mane    0 Sep 15 09:52 S1_003_HiC_R2.fastq128.fastq.sam
-rw-r--r--+ 1 sm2556 mane    0 Sep 15 09:52 S1_003_HiC_R2.fastq128.fastq_sort1.sam
-rw-r--r--+ 1 sm2556 mane    0 Sep 15 09:52 S1_003_HiC_R2.fastq128.fastq_sort.sam

nchernia commented 7 years ago

Are your fastqs also empty? Look at the output in the align out files. How many reads does it claim to be aligning?

nchernia commented 7 years ago

Look at the align.err files too.

sameet commented 7 years ago

The align*.err showed the following:

==> align2-1679007.err <==
slurmstepd: error: execve(): bwa: No such file or directory
srun: error: c17n06: task 0: Exited with exit code 2

So I checked the following:

[******@****** hic-analysis-sept152017]$ grep load_bwa ./scripts/juicer.sh
    # load_bwa="module load BioBuilds/2015.04"
    load_bwa="module load BWA"
        $load_bwa
                $load_bwa
                $load_bwa
[sm2556@ruddle2 hic-analysis-sept152017]$ module load BWA
[******@****** hic-analysis-sept152017]$ which bwa
/ycga-gpfs/apps/hpc/software/BWA/0.7.15-foss-2016a/bin/bwa
[******@****** hic-analysis-sept152017]$

nchernia commented 7 years ago

Yes, as I said upthread, you need to change load_bwa and load_java in the script to work for your system.

sameet commented 7 years ago

Yes, I changed it. As you can see in the second panel, the commented out line is the original line from the juicer.sh. The second line is the one that I put in.

nchernia commented 7 years ago

You might need your system administrator's help - I don't know why the job on the cluster isn't seeing bwa. You might try a small test job that just loads the module and runs bwa with no arguments.
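
A small test along those lines might look like the sketch below. The helper name check_tool is hypothetical and not part of Juicer, and "module load BWA" is simply the load command quoted earlier in this thread; a real batch script would also need whatever #SBATCH lines your cluster requires.

```shell
#!/bin/bash
# Sketch: run a "load" command, then report whether a tool is on PATH.
# check_tool is a hypothetical helper, not part of Juicer.
check_tool() {
    local load_cmd="$1" tool="$2"
    # eval runs in the current shell, so any PATH changes made by the load
    # command are still visible to the lookup below.
    eval "$load_cmd" >/dev/null 2>&1 || true
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found $tool at $(command -v "$tool")"
    else
        echo "$tool not found after running: $load_cmd" >&2
        return 1
    fi
}

# Example body for the test job (load command taken from this thread):
# check_tool "module load BWA" bwa && bwa
```

Submitted with sbatch, the job's output and error files should show whether the compute node resolves bwa the same way the login node does.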

nchernia commented 7 years ago

Also - as I said above - put the lines directly under "version=". Otherwise they won't get executed.

sameet commented 7 years ago

I figured out one reason: inside the script it is using $load_bwa, which was failing. I changed it to ${load_bwa}, and that seems to be working, but I got bus and memory errors. I will work with my system admin to see if anything can be done about that.

nchernia commented 7 years ago

OK, I have changed that in the script if you want to pull.

-- Neva Cherniavsky Durand, Ph.D. Staff Scientist, Aiden Lab www.aidenlab.org

sameet commented 7 years ago

Thanks. Setting -t 2 was also required because of memory issues. I guess the general rule of thumb is that the memory allocated should be more than the size of the chunk. It is aligning now. I will keep you posted in this thread on how things go.

sameet commented 7 years ago

Things seem to be working so far. The alignments completed, and I see a lot of merge* files in the debug folder. The run ended without completing and without any files in the aligned folder. After reading your wiki, I figured this was the point to relaunch juicer.sh with the -S merge option. I changed the partition parameters to allocate nodes with maximum RAM, and the jobs seem to be working. I can see merged_sort.txt in the aligned folder. I guess this will be a fairly slow step. I will keep you posted on how things go.

I noticed in the code that the time limit for the merge step is 24 hours. Isn't that too low? I think that most of the time 7 days is a good limit for batch jobs.

sameet commented 7 years ago

After one more false start, the pipeline seems to be working. I am currently at the dedup step. The file was huge and was split into over 500 chunks. One of the chunks failed. Will that affect the final step? Will I have to run dedup again? Can I just remove that one .err file and let it continue?

nchernia commented 7 years ago

The best thing to do is run the “dups.awk” script on the one chunk that failed (naming in the same way as in the Juicer script). Then once it has finished, check sizes and concatenate.

name=[YOUR NAME of failed file here, usually starts with "a" and the date, then "_msplit" and then a number]
splitname=[YOUR NAME of failed split file here, starts with "split" and same number as above]

awk -f dups.awk -v name=$name $splitname

Then check that the sizes of the msplit_dups / nodups / optdups add up to the size of merged_sort. If so, concatenate the nodups into merged_nodups and then run the pipeline in “final” stage.
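
The size check and concatenation described above could be sketched roughly as follows. The function names are hypothetical, and the file-name patterns (*_msplit*_dups.txt and so on) are assumptions based on the names mentioned in this thread, so adjust them to match what is actually in your aligned folder. The sketch also assumes the chunk numbers are zero-padded, so that the shell's glob order matches chunk order.

```shell
#!/bin/bash
# Sketch only: the file-name patterns are assumptions, not confirmed Juicer names.

# Print the summed size in bytes of the existing files among the arguments.
total_bytes() {
    local sum=0 f n
    for f in "$@"; do
        [ -f "$f" ] || continue      # skip glob patterns that matched nothing
        n=$(wc -c < "$f")
        sum=$((sum + n))
    done
    echo "$sum"
}

# concat_if_complete DIR: compare merged_sort.txt against the per-chunk outputs
# and build merged_nodups.txt only if the byte counts agree.
concat_if_complete() {
    local dir="$1" expected actual
    expected=$(total_bytes "$dir/merged_sort.txt")
    actual=$(total_bytes "$dir"/*_msplit*_dups.txt \
                         "$dir"/*_msplit*_optdups.txt \
                         "$dir"/*_msplit*_merged_nodups.txt)
    if [ "$expected" -ne "$actual" ]; then
        echo "size mismatch: merged_sort=$expected, splits=$actual" >&2
        return 1
    fi
    cat "$dir"/*_msplit*_merged_nodups.txt > "$dir/merged_nodups.txt"
}
```

Calling concat_if_complete on the aligned folder returns non-zero and leaves merged_nodups.txt unwritten when the sizes disagree, which is a hint that some chunk is still missing or incomplete.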

sameet commented 7 years ago

That may not be an option, because the error message says the job ran out of time on the node; it took more than 24 hours. I fear that if I run it again, the same thing will happen. Is there a workaround? The following is the error message for that chunk in the debug directory:

slurmstepd: error: *** JOB 1713635 ON bigmem01 CANCELLED AT 2017-09-30T06:39:21 DUE TO TIME LIMIT ***

Is there a way around this?

nchernia commented 7 years ago

Do you have a longer queue to run in?

This probably means that the file has a lot of duplicates in it. You can relax the wobble requirement and just look for exact matches.

See attached file for an alternate script you can run on the never-finishing job; this doesn't have wobble and so the "nodups" file will include some probable PCR duplicates.

sameet commented 7 years ago

Hi, there is no attachment.

Sameet

nchernia commented 7 years ago

Script pasted below.

# Usage:
# awk -v name="test" -f dups_nowobble.awk infile
# Reads infile, writes two files, "nodups" and "dups"
# where the duplicates are stored in dups
BEGIN {
    dupname = name"dups.txt";
    nodupname = name"merged_nodups.txt";
}
{
    # if strand, chromosome, position match previous line it's a dup
    if ($1!=p1 || $2 != p2 || $3 != p3 || $4 != p4 || $5 != p5 || $6 != p6 || $7 != p7 || $8 != p8) {
        print > nodupname
    }
    else {
        print > dupname
    }
}
# assign previous whether dup or nodup
{ p1=$1;p2=$2;p3=$3;p4=$4;p5=$5;p6=$6;p7=$7;p8=$8 }
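
To see what exact matching does on a small sample, the same rule can be restated compactly as a shell function. This is only a sketch: dedup_exact is a hypothetical name, the sample records are made up, and it assumes the input is already sorted so that duplicates sit on adjacent lines, as the script above does.

```shell
#!/bin/bash
# Sketch: exact-match duplicate removal on the first eight fields.
# dedup_exact PREFIX INFILE writes PREFIXmerged_nodups.txt and PREFIXdups.txt.
dedup_exact() {
    local prefix="$1" infile="$2"
    awk -v name="$prefix" '
        {
            # Build one comparison key from fields 1-8.
            key = $1 FS $2 FS $3 FS $4 FS $5 FS $6 FS $7 FS $8
            if (NR > 1 && key == prev) {
                print > (name "dups.txt")            # same as previous line
            } else {
                print > (name "merged_nodups.txt")   # first time this key is seen
            }
            prev = key
        }' "$infile"
}
```

Because there is no wobble, two reads whose positions differ by even one base pair both land in the nodups file, which is why the output may still contain some probable PCR duplicates.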

sameet commented 7 years ago

Most of the pipeline ran up to the deduplication step. I got the following error in the hic step:

 more hic-1692493.err
Problem with creating fragment-delimited maps, NullPointerException. 
This could be due to a null fragment map or to a mismatch in the chromosome name in the 
 fragment map v is-a-vis the input file or chrom.sizes file. 
Exiting.

Is there any way to fix this?

nchernia commented 7 years ago

This is fixed in the latest jar.

http://hicfiles.tc4ga.com.s3.amazonaws.com/public/juicer/juicer_tools.1.7.6_jcuda.0.8.jar

sameet commented 7 years ago

I think the error was generated before the .hic files were generated. Every step after dedup has failed. The dedup worked fine.

nchernia commented 7 years ago

The null pointer exception is a known bug that has been fixed. The step after dedup is hic file creation.

Aannaw commented 2 years ago

> I think the error was generated before the .hic files were generated. Every step after dedup has failed. The dedup worked fine.

Hello, have you solved the problem and successfully run juicer with SLURM? I also want to run juicer with SLURM, but I am not a root user, and I cannot use module load to change $load_bwa, $load_java, $load_gnu. I am not sure whether I can instead set the path of the software, like load_bwa="export PATH=/data/01/user157/software/bwa:$PATH", load_java="export PATH=/usr/bin/:$PATH" and load_gpu="export PATH=/usr/bin/:$PATH". I would appreciate any suggestions. Best wishes!

sa501428 commented 2 years ago

If you are using a recent jar, these issues are all resolved. You should contact the SLURM cluster manager and ask how to load bwa/java/etc. Those commands can then be replaced to be specific to your instance. If you have further questions, see the forum.
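
On the shell side, a load_* variable that holds an "export PATH=..." command does behave like a command when expanded unquoted, as the sketch below shows with a stand-in bwa in a temporary directory. The directory and the fake executable are made up for illustration, and whether this is enough on a particular cluster is not confirmed anywhere in this thread.

```shell
#!/bin/bash
# Sketch: a load_* variable holding "export PATH=..." instead of "module load ...".
# tool_dir is a stand-in for wherever bwa is really installed.
tool_dir=$(mktemp -d)
printf '#!/bin/sh\necho fake-bwa\n' > "$tool_dir/bwa"
chmod +x "$tool_dir/bwa"

# $PATH is expanded here, at assignment time, because of the double quotes.
load_bwa="export PATH=$tool_dir:$PATH"

# Unquoted expansion splits into the words "export" and "PATH=...", which the
# shell then runs as a command. This breaks if a PATH entry contains spaces.
$load_bwa

# Show which bwa the shell now resolves (the stand-in, since it was prepended).
command -v bwa
```

The directory given in the variable needs to be the one that actually contains the executable, and it has to be readable from the compute nodes as well as the login node.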

Aannaw commented 2 years ago

@sa501428 Hello Professor, in the end I chose to link the CPU directory to my work directory and run on CPU. The command is "./scripts/juicer.sh -g Ma6.genome -d /data/01/user157/HIFI/Hic-anchor/Ma6/juicer/work/hic_data -s MboI -p ./restriction_sites/Ma6.genome.chrom.sizes -y ./restriction_sites/Ma6.genome_MboI.txt -z ./references/Ma6.bp.p_ctg.fasta -D /data/01/user157/HIFI/Hic-anchor/Ma6/juicer/work -t 80 -e early". However, it seems that the Hi-C raw reads have not been split in the tmp split directory. I also cannot find the debug directory. The following picture shows the contents of the split directory. I checked the process and it is running chimeric read handling. I am confused about what chimeric read handling is, even after looking through chimeric_sam.awk. Also, I looked through juicer.sh but cannot find the chunk size used for splitting, as in the SLURM version. I would appreciate any suggestions. Best wishes.

[Image: juicer-split]