aidenlab / juicer

A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
http://aidenlab.org
MIT License

syntax errors in `chimeric_sam.awk`, cannot resume using `-S chimeric` #261

Closed agilly closed 2 years ago

agilly commented 2 years ago

Juicer was installed according to the instructions:

├── aligned
├── L79851_Track-123454
│   ├── aligned
│   │   └── header
│   ├── fastq
│   │   ├── L79851_Track-123454_R1.fastq.gz -> /lustre/groups/itg/teams/zeggini/projects/child_diabesity/data/HiC/test_sample/L79851_Track-123454_R1.fastq.gz
│   │   └── L79851_Track-123454_R2.fastq.gz -> /lustre/groups/itg/teams/zeggini/projects/child_diabesity/data/HiC/test_sample/L79851_Track-123454_R2.fastq.gz
│   └── splits
│       ├── L79851_Track-123454.fastq.gz.bam
│       ├── L79851_Track-123454.fastq.gz_linecount.txt
│       ├── L79851_Track-123454.fastq.gz_norm.txt.res.txt
│       ├── L79851_Track-123454.fastq.gz.sam
│       ├── L79851_Track-123454.fastq.gz.sam2
│       ├── L79851_Track-123454_R1.fastq.gz -> /lustre/groups/itg/teams/zeggini/projects/child_diabesity/analysis/initial_sample/qc/Juicer/L79851_Track-123454//fastq/L79851_Track-123454_R1.fastq.gz
│       └── L79851_Track-123454_R2.fastq.gz -> /lustre/groups/itg/teams/zeggini/projects/child_diabesity/analysis/initial_sample/qc/Juicer/L79851_Track-123454//fastq/L79851_Track-123454_R2.fastq.gz
├── opt
│   ├── juicer
│   │   ├── AWS
│   │   │   ├── README.md
│   │   │   └── scripts
│   │   │       ├── check.sh
│   │   │       ├── chimeric_blacklist.awk
│   │   │       ├── cleanup.sh
│   │   │       ├── collisions.awk
│   │   │       ├── countligations.sh
│   │   │       ├── diploid.pl
│   │   │       ├── diploid_split.awk
│   │   │       ├── dups.awk
│   │   │       ├── fragment_4dnpairs.pl
│   │   │       ├── fragment.pl
│   │   │       ├── juicer_arrowhead.sh
│   │   │       ├── juicer_hiccups.sh
│   │   │       ├── juicer_postprocessing.sh
│   │   │       ├── juicer.sh
│   │   │       ├── juicer_tools
│   │   │       ├── lib64
│   │   │       │   ├── libJCudaDriver-linux-x86_64.so
│   │   │       │   └── libJCudaRuntime-linux-x86_64.so
│   │   │       ├── LibraryComplexity.class
│   │   │       ├── LibraryComplexity.java
│   │   │       ├── makemega_addstats.awk
│   │   │       ├── mega.sh
│   │   │       ├── relaunch_prep.sh
│   │   │       ├── split_rmdups.awk
│   │   │       ├── statistics.pl
│   │   │       └── stats_sub.awk
│   │   ├── CODE_OF_CONDUCT.md
│   │   ├── CONTRIBUTING.md
│   │   ├── CPU
│   │   │   ├── common
│   │   │   │   ├── adjust_insert_size.awk
│   │   │   │   ├── check.sh
│   │   │   │   ├── chimeric_sam.awk
│   │   │   │   ├── cleanup.sh
│   │   │   │   ├── conversion.sh
│   │   │   │   ├── countligations.sh
│   │   │   │   ├── diploid.pl
│   │   │   │   ├── diploid.sh
│   │   │   │   ├── diploid_split.awk
│   │   │   │   ├── dups_sam.awk
│   │   │   │   ├── fragment_4dnpairs.pl
│   │   │   │   ├── index_by_chr.awk
│   │   │   │   ├── juicer_arrowhead.sh
│   │   │   │   ├── juicer_hiccups.sh
│   │   │   │   ├── juicer_postprocessing.sh
│   │   │   │   ├── juicer_tools
│   │   │   │   ├── juicer_tools.1.9.9_jcuda.0.8.jar
│   │   │   │   ├── juicer_tools.jar -> juicer_tools.1.9.9_jcuda.0.8.jar
│   │   │   │   ├── merge-stats.jar
│   │   │   │   ├── relaunch_prep.sh
│   │   │   │   ├── sam_to_mnd.sh
│   │   │   │   ├── sam_to_pre.awk
│   │   │   │   └── stats_sub.awk
│   │   │   ├── juicer.sh
│   │   │   ├── mega_from_bams_diploid.sh
│   │   │   ├── mega_from_bams.sh
│   │   │   ├── mega.sh
│   │   │   └── README.md
│   │   ├── LICENSE
│   │   ├── LSF
│   │   │   ├── README.md
│   │   │   └── scripts
│   │   │       ├── check.sh
│   │   │       ├── chimeric_blacklist.awk
│   │   │       ├── cleanup.sh
│   │   │       ├── collisions.awk
│   │   │       ├── countligations.sh
│   │   │       ├── dups.awk
│   │   │       ├── fragment_4dnpairs.pl
│   │   │       ├── fragment.pl
│   │   │       ├── juicer_arrowhead.sh
│   │   │       ├── juicer_hiccups.sh
│   │   │       ├── juicer_postprocessing.sh
│   │   │       ├── juicer.sh
│   │   │       ├── juicer_tools
│   │   │       ├── lib64
│   │   │       │   ├── libJCudaDriver-linux-x86_64.so
│   │   │       │   └── libJCudaRuntime-linux-x86_64.so
│   │   │       ├── LibraryComplexity.class
│   │   │       ├── LibraryComplexity.java
│   │   │       ├── makemega_addstats.awk
│   │   │       ├── mega.sh
│   │   │       ├── relaunch_prep.sh
│   │   │       ├── split_rmdups.awk
│   │   │       ├── statistics.pl
│   │   │       └── stats_sub.awk
│   │   ├── misc
│   │   │   ├── calculate_map_resolution.sh
│   │   │   └── generate_site_positions.py
│   │   ├── PBS
│   │   │   ├── README
│   │   │   └── scripts
│   │   │       ├── adjust_insert_size.awk
│   │   │       ├── check.sh
│   │   │       ├── chimeric_sam.awk
│   │   │       ├── cleanup.sh
│   │   │       ├── collisions.awk
│   │   │       ├── countligations.sh
│   │   │       ├── dups_sam.awk
│   │   │       ├── fragment_4dnpairs.pl
│   │   │       ├── juicer_arrowhead.sh
│   │   │       ├── juicer_hiccups.sh
│   │   │       ├── juicer_postprocessing.sh
│   │   │       ├── juicer.sh
│   │   │       ├── juicer_tools
│   │   │       ├── launch_stats.sh
│   │   │       ├── lib64
│   │   │       │   ├── libJCudaDriver-linux-x86_64.so
│   │   │       │   └── libJCudaRuntime-linux-x86_64.so
│   │   │       ├── postprocessing.sh
│   │   │       ├── relaunch_prep.sh
│   │   │       ├── split_rmdups_sam.awk
│   │   │       ├── statistics.pl
│   │   │       └── stats_sub.awk
│   │   ├── PBS_without_launch
│   │   │   ├── README
│   │   │   └── scripts
│   │   │       ├── adjust_insert_size.awk
│   │   │       ├── check.sh
│   │   │       ├── chimeric_sam.awk
│   │   │       ├── cleanup.sh
│   │   │       ├── collisions.awk
│   │   │       ├── countligations.sh
│   │   │       ├── dups_sam.awk
│   │   │       ├── fragment_4dnpairs.pl
│   │   │       ├── fragment.pl
│   │   │       ├── juicer_arrowhead.sh
│   │   │       ├── juicer_hiccups.sh
│   │   │       ├── juicer_mergepart.sh
│   │   │       ├── juicer_postprocessing.sh
│   │   │       ├── juicer.sh
│   │   │       ├── juicer_tools
│   │   │       ├── launch_stats.sh
│   │   │       ├── lib64                                                                                                                                                                      
│   │   │       │   ├── libJCudaDriver-linux-x86_64.so
│   │   │       │   └── libJCudaRuntime-linux-x86_64.so
│   │   │       ├── makemega_addstats.awk
│   │   │       ├── mega.sh
│   │   │       ├── postprocessing.sh
│   │   │       ├── relaunch_prep.sh
│   │   │       ├── split_rmdups_sam.awk
│   │   │       ├── statistics.pl
│   │   │       └── stats_sub.awk
│   │   ├── README.md
│   │   ├── SLURM
│   │   │   ├── README.md
│   │   │   └── scripts
│   │   │       ├── adjust_insert_size.awk
│   │   │       ├── check.sh
│   │   │       ├── chimeric_sam.awk
│   │   │       ├── cleanup.sh
│   │   │       ├── conversion.sh
│   │   │       ├── countligations.sh
│   │   │       ├── diploid.pl
│   │   │       ├── diploid_split.awk
│   │   │       ├── dups_sam.awk
│   │   │       ├── fragment_4dnpairs.pl
│   │   │       ├── GSE63525_GM12878_primary_replicate_HiCCUPS_looplist_with_motifs_unique_localized.txt
│   │   │       ├── index_by_chr.awk
│   │   │       ├── juicer_arrowhead.sh
│   │   │       ├── juicer_hiccups.sh
│   │   │       ├── juicer_postprocessing.sh
│   │   │       ├── juicer.sh
│   │   │       ├── juicer_tools
│   │   │       ├── lib64
│   │   │       │   ├── libJCudaCommonJNI.a
│   │   │       │   ├── libJCudaDriver-linux-ppc_64.so
│   │   │       │   └── libJCudaRuntime-linux-ppc_64.so
│   │   │       ├── makemega_addstats.awk
│   │   │       ├── mega.sh
│   │   │       ├── relaunch_prep.sh
│   │   │       ├── sam_to_mnd.sh
│   │   │       ├── sam_to_pre.awk
│   │   │       ├── split_rmdups_sam.awk
│   │   │       └── stats_sub.awk
│   │   └── UGER
│   │       ├── README.md
│   │       └── scripts
│   │           ├── check.sh
│   │           ├── chimeric_blacklist.awk
│   │           ├── cleanup.sh
│   │           ├── collisions.awk
│   │           ├── collisions_dedup_rearrange_cols.awk
│   │           ├── collisions_dups.awk
│   │           ├── countligations.sh
│   │           ├── diploid.pl
│   │           ├── diploid.sh
│   │           ├── diploid_split.awk
│   │           ├── dups.awk
│   │           ├── fragment_4dnpairs.pl
│   │           ├── fragment.pl
│   │           ├── juicer_arrowhead.sh
│   │           ├── juicer_hiccups.sh
│   │           ├── juicer_postprocessing.sh
│   │           ├── juicer.sh
│   │           ├── juicer_tools
│   │           ├── makemega_addstats.awk
│   │           ├── mega.sh
│   │           ├── relaunch_dups.sh
│   │           ├── relaunch_prep.sh
│   │           ├── split_rmdups.awk
│   │           ├── statistics.pl
│   │           ├── stats_sub.awk
│   │           └── vcftotxt.awk
│   ├── references
│   │   ├── hg38.p13.fa.gz 
│   │   ├── hg38.p13.fa.gz.amb 
│   │   ├── hg38.p13.fa.gz.ann 
│   │   ├── hg38.p13.fa.gz.bwt 
│   │   ├── hg38.p13.fa.gz.pac 
│   │   └── hg38.p13.fa.gz.sa
│   ├── restriction_sites
│   └── scripts -> $workpath/Juicer/opt/juicer/CPU
└── splits

Juicer was run as follows:

cd Juicer
./opt/scripts/juicer.sh -D $(readlink -f opt) -z $(pwd)/opt/references/hg38.p13.fa.gz -d $(pwd)/L79851_Track-123454/ -p hg38

This was run on a single host (no HPC) in a Singularity container that has BWA on the PATH.

It gives the following error after successfully aligning:

(-:  Align of /lustre/groups/itg/teams/zeggini/projects/child_diabesity/analysis/initial_sample/qc/Juicer/L79851_Track-123454//splits/L79851_Track-123454.fastq.gz.sam done successfully
awk: Juicer/opt/scripts/common/chimeric_sam.awk: line 50: illegal reference to local variable array
awk: Juicer/opt/scripts/common/chimeric_sam.awk: line 51: illegal reference to local variable array
awk: Juicer/opt/scripts/common/chimeric_sam.awk: line 164: syntax error at or near [
awk: Juicer/opt/scripts/common/chimeric_sam.awk: line 213: illegal reference to array readname
awk: Juicer/opt/scripts/common/chimeric_sam.awk: line 697: illegal reference to array readname
awk: Juicer/opt/scripts/common/adjust_insert_size.awk: line 148: illegal reference to array prev
samtools sort: failed to read header from "-"
***! Failure during chimera handling of /Juicer/L79851_Track-123454//splits/L79851_Track-123454.fastq.gz

Any pointers as to what might be going wrong?

sa501428 commented 2 years ago

I think this may be a gawk vs awk issue. What version of awk is installed?
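
For readers hitting the same errors, a quick way to answer this question is sketched below. It is only an illustration: `awk_flavor` is a hypothetical helper name, not part of Juicer, and it assumes that gawk prints a banner starting with `GNU Awk` for `--version` and that mawk prints one starting with `mawk` for `-W version` (both match the version strings quoted later in this thread).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Juicer): report which awk implementation
# a command resolves to. Assumes gawk answers --version with "GNU Awk ..."
# and mawk answers -W version with "mawk ...".
awk_flavor() {
    local cmd="${1:-awk}" banner
    # Try the GNU-style flag first; stdin is closed so awk never waits for input.
    banner="$("$cmd" --version 2>/dev/null </dev/null | head -n 1)"
    if [ -z "$banner" ]; then
        # Fall back to the mawk-style flag.
        banner="$("$cmd" -W version 2>/dev/null </dev/null | head -n 1)"
    fi
    case "$banner" in
        "GNU Awk"*) echo gawk ;;
        mawk*)      echo mawk ;;
        *)          echo unknown ;;
    esac
}

# Check the awk that the pipeline would pick up from PATH.
awk_flavor awk
```

If this prints `mawk` or `unknown` inside the container, the `awk` on the PATH may not be GNU Awk even when a `gawk` binary is installed alongside it.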

sa501428 commented 2 years ago

Hi @agilly ! Any luck with this issue?

agilly commented 2 years ago

Thanks for your pointer @sa501428 ! The version of awk which was installed was mawk 1.3.4 20200120. We have now upgraded to GNU Awk 5.1.1, API: 3.1 in our container. However, I am having trouble resuming Juicer. Since alignment finished successfully, I thought -S chimeric would work. However, I get an error:

$> juicer.sh -S chimeric ... # -D, -z, -d, and -p are set
***! Move or remove directory "/lustre/groups/itg/teams/zeggini/projects/child_diabesity/analysis/initial_sample/qc/Juicer/L79851_Track-123454//aligned" before proceeding.

If I remove the aligned directory, won't that restart alignment? As that will take a very long time, I'd rather find a way to resume.

sa501428 commented 2 years ago

Can you run ls -l in the aligned folder? Technically the alignments should be saved in the splits folder. You should see sam files / other intermediate files there. The aligned folder is really the final files folder. Eventually the merged_nodups bam will be there, but that likely wasn't built yet when the jobs failed. So the aligned folder is probably empty or has incomplete files in it / there should be no issue deleting it. Let me know if it seems different.
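
A cautious variant of this advice is to inspect the `aligned` folder and move it aside rather than delete it outright, which also satisfies the "Move or remove directory" message. The sketch below assumes a hypothetical helper name, `set_aside_aligned`, and a timestamped backup name of my own choosing; neither comes from Juicer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Juicer): list the contents of
# <topdir>/aligned, then rename the directory with a timestamp suffix
# so nothing is lost if it turns out to hold real output.
set_aside_aligned() {
    local topdir="$1" backup
    if [ ! -d "$topdir/aligned" ]; then
        echo "no aligned directory in $topdir"
        return 0
    fi
    ls -l "$topdir/aligned"
    backup="$topdir/aligned.bak.$(date +%Y%m%d%H%M%S)"
    mv "$topdir/aligned" "$backup" && echo "moved to $backup"
}

# Run only when a top-level directory (the value passed to -d) is given.
if [ -n "${1:-}" ]; then
    set_aside_aligned "$1"
fi
```

After the move, the run can be relaunched with `-S chimeric` and the same `-D`, `-z`, `-d`, and `-p` values as before.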

agilly commented 2 years ago

All right, the aligned folder only contained a single file called header. After deleting it, I restarted with -S chimeric. Juicer started, but it gave the following message:

(-: Looking for fastq files...fastq files exist
---  Using already created files in .splits

(-: Aligning files matching fastq/*_R*.fastq*
 to genome hg38.p13.fa.gz with no fragment delimited maps.

It is currently running. I just wanted to check that this message doesn't mean that Juicer is restarting the alignment?

agilly commented 2 years ago

The splits folder contains:

-rw-rw----  2   Jan 24  10:45   L79851_Track-123454.fastq.gz_norm.txt.res.txt   
-rw-rw----  11  Jan 24  10:45   L79851_Track-123454.fastq.gz_linecount.txt  
-rw-rw----  0   Jan 24  13:20   L79851_Track-123454.fastq.gz.sam2   
-rw-rw----  652G    Jan 24  13:20   L79851_Track-123454.fastq.gz.sam    
-rw-rw----  0   Jan 24  13:20   L79851_Track-123454.fastq.gz.bam    

These files are a couple of days old which means they haven't been deleted or overwritten. I am not sure why the bam is of size 0 though.
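
The zero-size `.bam` and `.sam2` carry the same timestamp as the failed run, so they were presumably left behind by the chimera-handling step that errored out; that is an inference, not something confirmed in this thread. A small sketch for spotting such empty outputs follows. `list_empty_files` is a hypothetical helper name, and `-maxdepth` assumes a GNU or BSD `find`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Juicer): print zero-byte regular files
# directly inside a directory, sorted by name.
list_empty_files() {
    local dir="${1:-.}"
    # -size 0 matches files that occupy zero blocks, i.e. empty files.
    find "$dir" -maxdepth 1 -type f -size 0 -print | sort
}

# Run only when a directory (for example the splits folder) is given.
if [ -n "${1:-}" ]; then
    list_empty_files "$1"
fi
```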

agilly commented 2 years ago

After a long time, it displays the following:

Using already aligned reads L79851_Track-123454.fastq.gz.sam

so I guess we are fine!

xueqinerer commented 1 week ago

Hi, have you solved this error? I have run into the same problem and would like to know whether there is a solution. Thank you very much.

awk: chimeric_sam.awk: line 50: illegal reference to local variable array
awk: chimeric_sam.awk: line 51: illegal reference to local variable array
awk: chimeric_sam.awk: line 164: syntax error at or near [