Closed nmmsv closed 5 years ago
Hi Nima,
The format of that bed file looks fine. The last column, the number of repeat units, accepts floating point values, so rounding them off like that may make a small difference to the size estimates. But they'll only be off by less than a repeat unit, so nothing worth worrying about.
So just to confirm, you're using the STRetch_wgs_bam_pipeline.groovy
pipeline?
The most common cause of the issue you're describing is that the input bam file was aligned to a different reference genome to the one you provide as input_regions
. For example if your bam was aligned with hg38 but your bed file describes hg19, or more commonly both are on the same genome build but one has chromosomes named 'chr1', 'chr2' and the other uses '1', '2' etc.
Note that the bed files specified for input_regions
and STR_BED
are not necessarily the same.
Also, have you looked through the custom reference data tutorial? https://github.com/Oshlack/STRetch/wiki/Reference-Data In particular, have you confirmed that there are no duplicate loci, and that the file is correctly sorted? i.e. these two commands:
# Remove duplicate lines (lines where two different repeat units are annotated at the same locus):
Rscript STRetch/scripts/dedupSTRannotation.R -i hg38.trf_period1-6.bed -o hg38.trf_period1-6.dedup.bed
# Sort:
bedtools sort -i hg38.trf_period1-6.dedup.bed > hg38.trf_period1-6.dedup.sorted.bed
If those suggestions don't help and you'd like me to look into this further, please provide the full STRetch command and the part of the header from the input bam file that specifies the chromosome names.
Warm regards, Harriet
Hi Harriet, Thanks for detailed response. I deduped and sorted my reference, and I still get the same error. Here is some more information. Thank you for your help!
Command:
/storage/nmmsv/STRetch/tools/bin/bpipe run -p input_regions=/storage/nmmsv/analysis/GangSTR-analyses/genome_wide/STRetch/refs/gangstr_exp8_for_STRetch.dedup.sorted.bed /storage/nmmsv/STRetch/pipelines/STRetch_wg\ s_bam_pipeline.groovy /storage/nmmsv/plat_genome/NA12878_S1.bam
Bam header:
@SQ SN:chrM LN:16571 @SQ SN:chr1 LN:249250621 @SQ SN:chr2 LN:243199373 @SQ SN:chr3 LN:198022430 @SQ SN:chr4 LN:191154276 @SQ SN:chr5 LN:180915260 @SQ SN:chr6 LN:171115067 @SQ SN:chr7 LN:159138663 @SQ SN:chr8 LN:146364022 @SQ SN:chr9 LN:141213431 @SQ SN:chr10 LN:135534747 @SQ SN:chr11 LN:135006516 @SQ SN:chr12 LN:133851895 @SQ SN:chr13 LN:115169878 @SQ SN:chr14 LN:107349540 @SQ SN:chr15 LN:102531392 @SQ SN:chr16 LN:90354753 @SQ SN:chr17 LN:81195210 @SQ SN:chr18 LN:78077248 @SQ SN:chr19 LN:59128983 @SQ SN:chr20 LN:63025520 @SQ SN:chr21 LN:48129895 @SQ SN:chr22 LN:51304566 @SQ SN:chrX LN:155270560 @SQ SN:chrY LN:59373566 @RG ID:NA12878 SM:NA12878 @PG ID:bwa PN:bwa VN:0.6.1-r104-tpx
and here is part of the pipeline config file:
// For exome pipeline only Edit before running the exome pipeline EXOME_TARGET="/storage/nmmsv/analysis/GangSTR-analyses/new_sim/results/STRetch/loci/HTT.bed" // Uncomment the line below to run the STRetch installation test, or specify your own //EXOME_TARGET="SCA8_region.bed"
// For bam pipeline only Edit before running if using CRAM input format CRAM_REF="path/to/reference_genome_used_to_create_cram.fasta" // Path to reference data //refdir="/storage/nmmsv/STRetch/reference-data" refdir="/storage/nmmsv/analysis/GangSTR-analyses/genome_wide/STRetch/refs"
// Decoy reference assumed to have matching .genome file in the same directory REF="/storage/nmmsv/STRetch/reference-data/hg19.STRdecoys.sorted.fasta" //STR_BED="/storage/nmmsv/STRetch/reference-data/hg19.simpleRepeat_period1-6_dedup.sorted.bed" STR_BED="/storage/nmmsv/analysis/GangSTR-analyses/genome_wide/STRetch/refs/gangstr_exp8_for_STRetch.dedup.sorted.bed" DECOY_BED="/storage/nmmsv/STRetch/reference-data/STRdecoys.sorted.bed" // By default, uses other samples in the same batch as a control CONTROL="" // Uncomment the line below to use a set of WGS samples as controls, or specify your own CONTROL="/storage/nmmsv/STRetch/reference-data/hg19.PCRfreeWGS_143_STRetch_controls.tsv"
Thanks!
Hi Nima,
That is a puzzle. Is the bed file tab-delimited? If not, try with tabs.
And is everything working when you keep everything the same but use the STRetch-provided bed file? I just want to eliminate the possibility that there's something else going on.
Would you be willing to send me the bed file so I can test it out?
Warm regards, Harriet
I just tried setting STR_BED variable in pipeline config and input_regions in runtime commands back to the originial STRetch reference, and I got the same error. I'm running in the directory that already has the files from partial run using the other reference. Do I need to rerun from scratch and do alignment step all over to see if it still works with the original ref?
I made sure that my reference is tab delimited, and I've uploaded it here: https://drive.google.com/open?id=1LvBqENdk0d4ebiiS1S3CLCxdC9t7frQJ
Thanks for your help! Nima
Hi Nima,
Yes, you would need to delete intermediate files, as changing input_regions
will change the contents of the intermediate bam file even if no error occurs at that point.
I ran through an exome sample with your bed file linked above, and it worked fine (correctly identified a known expansion). There are many fewer results as it lacks homopolymers, but that's not an issue if you're not interested in those.
$STRETCH/tools/bin/bpipe run -n 100 \
-p EXOME_TARGET="target_region.flattened.bed" \
-p input_regions=$STRETCH/reference-data/STRetch.bed \
-p STR_BED=$STRETCH/reference-data/STRetch.bed \
$STRETCH/pipelines/STRetch_exome_bam_pipeline.groovy \
sample_SCA8.merge.bam
Warm regards, Harriet
Hi Harriet, Thanks for the help. I'll try running the wgs pipeline one more time with the correct reference. I'll let you know if I ran into a problem! Thanks. Nima
Hello, I just got a new error in the alignment step. Is this something familiar to you?
[M::process] 0 single-end sequences; 346 paired-end sequences [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (8, 1, 0, 3) [M::mem_pestat] skip orientation FF as there are not enough pairs [M::mem_pestat] skip orientation FR as there are not enough pairs [M::mem_pestat] skip orientation RF as there are not enough pairs [M::mem_pestat] skip orientation RR as there are not enough pairs [M::mem_process_seqs] Processed 346 reads in 0.971 CPU sec, 0.419 real sec [main] Version: 0.7.15-r1140 [main] CMD: /storage/nmmsv/STRetch/tools/bin/bwa mem -p -M -t 7 -R @RG\tID:NA12878_S1\tPL:illumina\tPU:NA\tLB:NA\tSM:NA12878_S1 /storage/nmmsv/STRetch/reference-data/hg19.STRdecoys.sorted.fasta - [main] Real time: 47023.157 sec; CPU: 314309.855 sec [bam_sort_core] merging from 69 files and 1 in-memory blocks... bash: line 1: 114166 Killed java -Xmx16g -Dsamjdk.reference_fasta=path/to/reference_genome_used_to_create_cram.fasta -jar /storage/nmmsv/STRetch/tools/bin/bazam.jar -pad 5 -n 6 -L /storage/nmmsv/analysis/GangSTR-analyses/genome_wide/STRetch/refs/gangstr_exp8_for_STRetch.dedup.sorted.bed -bam /storage/nmmsv/plat_genome/NA12878_S1.bam 114167 Done | /storage/nmmsv/STRetch/tools/bin/bwa mem -p -M -t 7 -R "@RG\tID:NA12878_S1\tPL:illumina\tPU:NA\tLB:NA\tSM:NA12878_S1" /storage/nmmsv/STRetch/reference-data/hg19.STRdecoys.sorted.fasta - 114168 Done | /storage/nmmsv/STRetch/tools/bin/samtools view -bSuh - 114169 Done | /storage/nmmsv/STRetch/tools/bin/samtools sort -o NA12878_S1.STRdecoy.bam -T NA12878_S1.STRdecoy Cleaned up file NA12878_S1.STRdecoy.bam to .bpipe/trash/NA12878_S1.STRdecoy.bam.2 ERROR: stage align_bwa_bam failed: Command in stage align_bwa_bam failed with exit status = 137 :
set -o pipefail; java -Xmx16g -Dsamjdk.reference_fasta=path/to/reference_genome_used_to_create_cram.fasta -jar /storage/nmmsv/STRetch/tools/bin/bazam.jar -pad 5 -n 6 -L /storage/nmmsv/analysis/GangSTR-analyses/genome_wide/STRetch/refs/gangstr_exp8_for_STRetch.dedup.sorted.bed -bam /storage/nmmsv/plat_genome/NA12878_S1.bam | /storage/nmmsv/STRetch/tools/bin/bwa mem -p -M -t 7 -R "@RG\tID:NA12878_S1\tPL:illumina\tPU:NA\tLB:NA\tSM:NA12878_S1" /storage/nmmsv/STRetch/reference-data/hg19.STRdecoys.sorted.fasta - | /storage/nmmsv/STRetch/tools/bin/samtools view -bSuh - | /storage/nmmsv/STRetch/tools/bin/samtools sort -o NA12878_S1.STRdecoy.bam -T NA12878_S1.STRdecoy
========================================= Pipeline Failed ==========================================
One or more parallel stages aborted. The following messages were reported:
-------------------------------------- align_bwa_bam ( 1 ) ---------------------------------------
Command in stage align_bwa_bam failed with exit status = 137 :
set -o pipefail; java -Xmx16g -Dsamjdk.reference_fasta=path/to/reference_genome_used_to_create_cram.fasta -jar /storage/nmmsv/STRetch/tools/bin/bazam.jar -pad 5 -n 6 -L /storage/nmmsv/analysis/GangSTR-analyses/genome_wide/STRetch/refs/gangstr_exp8_for_STRetch.dedup.sorted.bed -bam /storage/nmmsv/plat_genome/NA12878_S1.bam | /storage/nmmsv/STRetch/tools/bin/bwa mem -p -M -t 7 -R "@RG\tID:NA12878_S1\tPL:illumina\tPU:NA\tLB:NA\tSM:NA12878_S1" /storage/nmmsv/STRetch/reference-data/hg19.STRdecoys.sorted.fasta - | /storage/nmmsv/STRetch/tools/bin/samtools view -bSuh - | /storage/nmmsv/STRetch/tools/bin/samtools sort -o NA12878_S1.STRdecoy.bam -T NA12878_S1.STRdecoy
Thanks! Nima
Hi Nima,
exit status = 137 may indicate that your cluster killed the process. Check memory requirements and walltime, as these are common reasons. There should be some way for your cluster to tell you what the error was. You'd have to talk to your IT as it's specific to the system.
Seems like this has morphed away from the original issue, so I'm going to close this for now. Let me know if there's anything else I can do to help.
Warm regards, Harriet
Hi, I'm trying to use a different reference set (STR bed file) with STRetch, and I'm having some trouble. I've converted my ref to the format that I believe should be appropriate:
Then I update the pipeline config such that STR_BED points to this reference. When I'm running bpipe, I also set input_regions=path/to/new/ref.bed, because I wasn't sure which parameter should be set to use a different STR reference set.
With these setting, the pipeline fails with this error: ERROR: file /PATH/NA12878_S1.STRdecoy.locus_counts contained 0 loci.
I checked STRdecoy.STR_counts file, and I can see non-zero values. (So I imagine alignment should be fine)
Thank you for your help! Nima