KCCG / ClinSV

Robust detection of clinically relevant structural and copy number variation from whole genome sequencing data

ClinSV v1.0 has issue with lumpy-sv #9

Closed. IshakYusuf closed this issue 2 years ago.

IshakYusuf commented 3 years ago

Dear KCCG

I followed all the steps to run the software on our server, but lumpy-sv does not seem to work with ClinSV v1.0:

##############################################
####                ClinSV                ####
##############################################
# 06/08/2021 10:09:09

# clinsv dir: /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/clinsv
# projectDir: /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder
# sampleInfoFile: sampleinfo_mod.txt (read only)
# name stem: project_folder
# lumpyBatchSize: 5
# genome reference: /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/clinsv/refdata-b38
# run steps: all
# number input bams: 1

# Read Sample Info from sampleinfo_mod.txt
# use: FR05812606   H7LH3CCXX_6 /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/NA12878_b38.bam 
# 1 samples to process
# If not, please exit and modify sampleinfo_mod.txt. 

###### Generate the commands and scripts ######

# bigwig

# lumpy

# cnvnator

# annotate

# prioritize

# qc

###### Run jobs ######

 ### executing: sh /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/alignments/FR05812606/bw/sh/bigwig.createWigs.FR05812606.sh &> /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/alignments/FR05812606/bw/sh/bigwig.createWigs.FR05812606.e  ...  

 ### finished after (hh:mm:ss): 00:00:00
 ### exist status: 0
/home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/alignments/FR05812606/bw/sh/bigwig.createWigs.FR05812606.sh: 1: set: Illegal option -o pipefail
 ### executing: sh /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/alignments/FR05812606/bw/sh/bigwig.q0.FR05812606.sh &> /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/alignments/FR05812606/bw/sh/bigwig.q0.FR05812606.e  ...  

 ### finished after (hh:mm:ss): 00:00:00
 ### exist status: 0

 ### executing: sh /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/alignments/FR05812606/bw/sh/bigwig.q20.FR05812606.sh &> /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/alignments/FR05812606/bw/sh/bigwig.q20.FR05812606.e  ...  

/home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/alignments/FR05812606/bw/sh/bigwig.q0.FR05812606.sh: 1: set: Illegal option -o pipefail
 ### finished after (hh:mm:ss): 00:00:00
 ### exist status: 0

 ### executing: sh /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/alignments/FR05812606/bw/sh/bigwig.mq.FR05812606.sh &> /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/alignments/FR05812606/bw/sh/bigwig.mq.FR05812606.e  ...  

/home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/alignments/FR05812606/bw/sh/bigwig.q20.FR05812606.sh: 1: set: Illegal option -o pipefail
 ### finished after (hh:mm:ss): 00:00:00
 ### exist status: 0

 ### executing: sh /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/FR05812606/lumpy/sh/lumpy.preproc.FR05812606.sh &> /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/FR05812606/lumpy/sh/lumpy.preproc.FR05812606.e  ...  

/home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/alignments/FR05812606/bw/sh/bigwig.mq.FR05812606.sh: 1: set: Illegal option -o pipefail
 ### finished after (hh:mm:ss): 00:00:00
 ### exist status: 0

/home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/FR05812606/lumpy/sh/lumpy.preproc.FR05812606.sh: 1: set: Illegal option -o pipefail
 ### executing: sh /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.sh &> /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.e  ...  

 ### finished after (hh:mm:ss): 00:00:00
 ### exist status: 0

/home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.sh: 1: set: Illegal option -o pipefail
 ### executing: sh /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/joined/lumpy/sh/lumpy.depth.joined.sh &> /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/joined/lumpy/sh/lumpy.depth.joined.e  ...  

 ### finished after (hh:mm:ss): 00:00:00
 ### exist status: 0

 ### executing: sh /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/FR05812606/cnvnator/sh/cnvnator.caller.FR05812606.sh &> /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/FR05812606/cnvnator/sh/cnvnator.caller.FR05812606.e  ...  

/home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/joined/lumpy/sh/lumpy.depth.joined.sh: 1: set: Illegal option -o pipefail
 ### finished after (hh:mm:ss): 00:00:00
 ### exist status: 0

/home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/FR05812606/cnvnator/sh/cnvnator.caller.FR05812606.sh: 2: /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/FR05812606/cnvnator/sh/cnvnator.caller.FR05812606.sh: source: not found
/home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/FR05812606/cnvnator/sh/cnvnator.caller.FR05812606.sh: 3: set: Illegal option -o pipefail
 ### executing: sh /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/joined/sh/annotate.main.joined.sh &> /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/joined/sh/annotate.main.joined.e  ...  

 ### finished after (hh:mm:ss): 00:00:01
 ### exist status: 0

/home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/joined/sh/annotate.main.joined.sh: 1: set: Illegal option -o pipefail
 ### executing: sh /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/joined/sh/prioritize.main.joined.sh &> /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/joined/sh/prioritize.main.joined.e  ...  

 ### finished after (hh:mm:ss): 00:00:00
 ### exist status: 0

/home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/joined/sh/prioritize.main.joined.sh: 1: set: Illegal option -o pipefail
 ### executing: sh /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/qc/sh/qc.main.joined.sh &> /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/qc/sh/qc.main.joined.e  ...  

 ### finished after (hh:mm:ss): 00:00:00
 ### exist status: 0

# 06/08/2021 10:09:10 Project project_folder project_folder | Total jobs 11 | Remaining jobs 0 | Remaining steps bigwig,lumpy,cnvnator,annotate,prioritize,qc  11 | Total time: 0 min

# 06/08/2021 10:09:10 Project project_folder project_folder | Total jobs 11 | Remaining jobs 0 | Remaining steps   0 | Total time: 0 min

# Everything done! Exit

# writing igv session files...

After that, I get this message:

lumpy variants not present: /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/joined/SV-CNV.FR05812606.igv.bed.gz
/home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/project_folder/SVs/qc/sh/qc.main.joined.sh: 1: set: Illegal option -o pipefail
Use of uninitialized value $_[0] in substitution (s///) at /usr/share/perl/5.26/File/Basename.pm line 180.
fileparse(): need a valid pathname at /home/gnks/Desktop/Temp_yusuf_baran/clincv38_v2/clinsv/bin/clinsv line 1001. 

Any suggestions?

baranaldemir commented 3 years ago

I have the same problem. It seems that the file "SV-CNV.FR05812606.igv.bed.gz" is missing. Any idea how to generate this file?

MinocheAE commented 3 years ago

Hi,

the error message is: "set: Illegal option -o pipefail"

Googling the error message I found: https://stackoverflow.com/questions/54055549/linux-ubuntu-set-illegal-option-o-pipefail

So I guess you are using Ubuntu. ClinSV 1.0 was developed under CentOS 8, and making it run on Ubuntu would require some debugging on your end.

To start with, you would need to update "set -e -x -o pipefail" in bin/clinsv to its Ubuntu equivalent.

Alternatively, you could try using CentOS 8 in a virtual environment.
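Why the run still looked successful: every step is launched as `sh <script> &> <script>.e`. On this Ubuntu system `/bin/sh` is dash, which is why `set -o pipefail` is rejected and the bash builtin `source` is reported as `not found`. If the launcher hands that string to `/bin/sh -c` (Perl's `system` does so for a command string containing shell metacharacters), dash also does not read `&>` as "redirect both streams": it parses `cmd &> file` as `cmd &` (run in the background) followed by an empty redirection `> file`. That would explain why each step reports `00:00:00` with exit status 0, and why the errors appear on the terminal instead of in the `.e` files. A minimal sketch of the workaround, assuming bash is installed, is to run each generated script with bash explicitly. `run_job_script` is a hypothetical helper, not ClinSV code; in ClinSV itself the change would go wherever bin/clinsv builds its `sh ...` command lines.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_job_script(script: Path, log: Path) -> int:
    """Run one generated job script under bash, like `bash script &> log`.

    Hypothetical helper (not part of ClinSV): it waits for the script to
    finish, writes stdout and stderr to `log`, and returns the real exit status.
    """
    bash = shutil.which("bash")
    if bash is None:
        raise RuntimeError("bash is required: the job scripts use bash-only syntax")
    with open(log, "w") as fh:
        proc = subprocess.run([bash, str(script)], stdout=fh, stderr=subprocess.STDOUT)
    return proc.returncode


if __name__ == "__main__":
    # Demo script using the same bash-only features seen in the log above.
    workdir = Path(tempfile.mkdtemp())
    job = workdir / "demo.sh"
    job.write_text("source /dev/null\nset -e -x -o pipefail\necho done\n")
    status = run_job_script(job, workdir / "demo.e")
    print("exit status:", status)
    print((workdir / "demo.e").read_text())
```

Because the call blocks until the script exits, a failing step now yields a non-zero status instead of an instant "success".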

IshakYusuf commented 3 years ago

Thanks for your response

weizhousjtu commented 3 years ago

Dear MinocheAE,

I tested the example BAM file and it succeeded. However, I got an error when testing on my own data. My BAM was aligned to hg19, so I first converted 'chr1' to '1' and used refdata-b37 from ClinSV. Then I ran my script, `singularity run clinsv.sif -i "$input_path/CHD1_chr.bam" -ref $PWD/clinsv/refdata-b37 -p $output_path/project_folder`, and got the error:

```
pairend_distro-a1.py: error: option -r: invalid integer value: '-X'

Program: ** (v 0.2.11)
Author:  Ryan Layer (rl6sf@virginia.edu)
Summary: Find structural variations in various signals.

Usage:   ** [OPTIONS]

Options:
        -g      Genome file (defines chromosome order)
        -e      Show evidence for each call
        -w      File read windows size (default 1000000)
        -mw     minimum weight for a call
        -msw    minimum per-sample weight for a call
        -tt     trim threshold
        -x      exclude file bed file
        -t      temp file prefix, must be to a writeable directory
        -P      output probability curve for each variant
        -b      output BEDPE instead of VCF
```

My system is Linux version 4.14.0-115.el7a.0.1.aarch64 (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC))

Any suggestions?

drmjc commented 3 years ago

From this error

region "1:1000000-1100000" specifies an unknown reference name. Continue anyway.

This suggests it can't find contig '1', so I expect your BAM file isn't right. Check the BAM header, and check that the contig names throughout are 1, 2, ..., 22, X, Y, MT.
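drmjc's header check can be scripted. A minimal sketch, assuming you pass it the header text printed by `samtools view -H your_sample.bam`: it collects the `SN:` names from the `@SQ` lines and reports which of the expected contigs 1-22, X, Y and MT are missing, and which names still carry a `chr` prefix. `header_contigs` and `check_contig_names` are hypothetical helpers. The header is only part of the story, since read records and tags such as `SA:Z` carry contig names too.

```python
from __future__ import annotations

# Contig names expected with refdata-b37, per the comment above: 1..22, X, Y, MT.
EXPECTED = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]


def header_contigs(header_text: str) -> list[str]:
    """Return the SN: values of all @SQ lines in a SAM header, in order."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def check_contig_names(header_text: str) -> dict[str, list[str]]:
    """Report expected contigs absent from the header and leftover 'chr' names."""
    names = header_contigs(header_text)
    present = set(names)
    return {
        "missing": [c for c in EXPECTED if c not in present],
        "chr_prefixed": [n for n in names if n.startswith("chr")],
    }


if __name__ == "__main__":
    demo = "@HD\tVN:1.6\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:chr2\tLN:243199373\n"
    print(check_contig_names(demo))
```

An empty `missing` list and an empty `chr_prefixed` list are the minimum for the region lookups ClinSV performs to find reads.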

On Fri, 20 Aug 2021, 5:56 pm WeiZhou, @.***> wrote:

```
+ export PATH=/opt/clinsv/bin:/opt/clinsv/bin:/opt/clinsv/root/bin:/bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/opt/clinsv/bin:/opt/clinsv/bin:/opt/clinsv/root/bin:/bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ declare -A meanArr
+ declare -A stdevArr
+ declare -A readLArr
++ samtools view /lustre/home/acct-clsqsy/clsqsy/Project/CHD_trios_Result/CNV_Results/CHD1/project_folder/alignments/CHD1/CHD1.bam 1:1000000-1100000
++ cut -f 10
++ awk '{ print length}'
++ awk '(NR==1){print}'
++ sort -rn
[main_samview] region "1:1000000-1100000" specifies an unknown reference name. Continue anyway.
+ readLArr[CHDCHD1]=
+ read -r mean stdev
++ python /opt/clinsv/clinSV/scripts/pairend_distro-a1.py -r -X 2 -N 100000 -o /lustre/home/acct-clsqsy/clsqsy/Project/CHD_trios_Result/CNV_Results/CHD1/project_folder/SVs/CHD1/lumpy/CHDCHD1.pe.histo
++ samtools view -r CHDCHD1 /lustre/home/acct-clsqsy/clsqsy/Project/CHD_trios_Result/CNV_Results/CHD1/project_folder/alignments/CHD1/CHD1.bam
++ cut -d : -f 2
Usage: pairend_distro-a1.py [options]

pairend_distro-a1.py: error: option -r: invalid integer value: '-X'
+ meanArr[CHDCHD1]=
+ stdevArr[CHDCHD1]=
+ cd /lustre/home/acct-clsqsy/clsqsy/Project/CHD_trios_Result/CNV_Results/CHD1/project_folder/SVs/joined/lumpy
++ expr - 30
expr: syntax error
+ lumpy -mw 3 -tt 0 -x /lustre/home/acct-clsqsy/clsqsy/software/clinsv/clinsv/refdata-b37/lumpy_exclude.bed -pe id:CHD1,bam_file:/lustre/home/acct-clsqsy/clsqsy/Project/CHD_trios_Result/CNV_Results/CHD1/project_folder/SVs/CHD1/lumpy/CHD1.discordants.bam,read_group:CHDCHD1,histo_file:/lustre/home/acct-clsqsy/clsqsy/Project/CHD_trios_Result/CNV_Results/CHD1/project_folder/SVs/CHD1/lumpy/CHDCHD1.pe.histo,mean:,stdev:,read_length:,min_non_overlap:,discordant_z:3,back_distance:10,weight:1,min_mapping_threshold:20 -sr id:CHD1,bam_file:/lustre/home/acct-clsqsy/clsqsy/Project/CHD_trios_Result/CNV_Results/CHD1/project_folder/SVs/CHD1/lumpy/CHD1.splitters.f.bam,back_distance:10,weight:2,min_mapping_threshold:20
Parameter required for mean

Program: ** (v 0.2.11)
Author:  Ryan Layer @.***)
Summary: Find structural variations in various signals.

Usage:   ** [OPTIONS]

Options:
        -g      Genome file (defines chromosome order)
        -e      Show evidence for each call
        -w      File read windows size (default 1000000)
        -mw     minimum weight for a call
        -msw    minimum per-sample weight for a call
        -tt     trim threshold
        -x      exclude file bed file
        -t      temp file prefix, must be to a writeable directory
        -P      output probability curve for each variant
        -b      output BEDPE instead of VCF
```


MinocheAE commented 3 years ago

Apart from what drmjc suggested, please also check whether you re-indexed your BAM file after converting it.

weizhousjtu commented 3 years ago

Thank you for your suggestions. However, after re-converting and re-indexing my BAM file, I still get almost the same error; only the message "region "1:1000000-1100000" specifies an unknown reference name. Continue anyway." is gone. The error is:

```
pairend_distro-a1.py: error: option -r: invalid integer value: '-X'

Program: ** (v 0.2.11)
Author:  Ryan Layer (rl6sf@virginia.edu)
Summary: Find structural variations in various signals.

Usage:   ** [OPTIONS]
```

I used samtools and sed to change the chromosome names in the BAM file from 'chr1' to '1', and then re-indexed it. However, I think my BAM file still has problems.

Could you provide an example of how to make a BAM file aligned to hg19 compatible with ClinSV?
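In the bash trace quoted in drmjc's reply, the read length is computed as the longest sequence (SAM column 10) among the reads returned by `samtools view <bam> 1:1000000-1100000`. When that region yields no reads, the value is empty, so the generated command becomes `pairend_distro-a1.py -r -X 2 ...` and the option parser takes `-X` as the value of `-r`. Empty values then show up again as `expr - 30` ("syntax error") and as `mean:,stdev:,read_length:,min_non_overlap:` in the lumpy call, which lumpy rejects with "Parameter required for mean". The sketch below mirrors that read-length derivation on SAM text lines but fails with an explicit message instead of passing an empty string along. `read_length_from_sam` is a hypothetical helper, not ClinSV code.

```python
def read_length_from_sam(sam_lines) -> int:
    """Longest SEQ (column 10) among SAM records; header lines are skipped.

    Roughly mirrors `cut -f 10 | awk '{ print length }' | sort -rn | head -1`,
    except that '*' (no stored sequence) is ignored and an empty result raises.
    """
    longest = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            raise ValueError(f"not a SAM record: {line!r}")
        seq = fields[9]
        if seq != "*":
            longest = max(longest, len(seq))
    if longest == 0:
        raise ValueError(
            "no reads in the region: check that the BAM's contig names match "
            "the region (e.g. '1' vs 'chr1') and that the BAM is indexed"
        )
    return longest


if __name__ == "__main__":
    records = [
        "@SQ\tSN:1\tLN:249250621",
        "r1\t99\t1\t1000100\t60\t5M\t=\t1000300\t205\tACGTA\tIIIII",
        "r2\t147\t1\t1000300\t60\t3M\t=\t1000100\t-205\tACG\tIII",
    ]
    print(read_length_from_sam(records))
```

To try it on a real file, you could feed it `subprocess.run(["samtools", "view", "your_sample.bam", "1:1000000-1100000"], capture_output=True, text=True).stdout.splitlines()`. If it raises, the BAM needs fixing before ClinSV is rerun.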

MinocheAE commented 3 years ago

If the following command throws errors or returns no output, the BAM is not converted correctly:

`samtools view your_sample.bam 1:1000000-1100000 | head`

For debugging, you could try creating a small subset of your original BAM file and converting that.

I am afraid I don't have a conversion script at hand, but there might be one on the internet.

Make sure that the chromosome names are also converted in the SA:Z tag, as this is needed as well.
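No official conversion script is given in this thread. As a rough sketch of what such a conversion has to touch, assuming you stream SAM text from `samtools view -h in.bam` through it, then write the result back with `samtools view -b -o out.bam -` and re-index with `samtools index out.bam`: the `SN:` field of `@SQ` header lines, RNAME (column 3), RNEXT (column 7, unless it is `=` or `*`), and the reference name at the start of every alignment in the `SA:Z` tag. `strip_chr`, `_convert_sa` and `convert_sam_line` are hypothetical helpers. Simply stripping `chr` is not a complete hg19-to-b37 conversion: hg19's unplaced contigs (e.g. `chr1_gl000191_random`) are named differently in b37 (`GL000191.1`), the `_hap` contigs do not exist in b37, and hg19's `chrM` is a different mitochondrial sequence from b37's `MT`. This sketch therefore leaves those names untouched.

```python
import re

# Only the primary chromosomes map 1:1 between hg19 ('chr1') and b37 ('1').
_PRIMARY = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y)$")


def strip_chr(name: str) -> str:
    """'chr1' -> '1', 'chrX' -> 'X'; every other name is returned unchanged."""
    m = _PRIMARY.match(name)
    return m.group(1) if m else name


def _convert_sa(value: str) -> str:
    """SA:Z value: 'rname,pos,strand,CIGAR,mapQ,NM;' repeated per alignment."""
    out = []
    for aln in value.split(";"):
        if aln:
            rname, rest = aln.split(",", 1)
            out.append(f"{strip_chr(rname)},{rest};")
    return "".join(out)


def convert_sam_line(line: str) -> str:
    """Rename primary contigs in one SAM line (header or alignment record)."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("@SQ"):
        fields = ["SN:" + strip_chr(f[3:]) if f.startswith("SN:") else f for f in fields]
    elif not line.startswith("@"):
        if len(fields) < 11:
            raise ValueError(f"not a SAM record: {line!r}")
        fields[2] = strip_chr(fields[2])  # RNAME
        if fields[6] not in ("=", "*"):
            fields[6] = strip_chr(fields[6])  # RNEXT
        # Other tags that embed contig names (e.g. BWA's XA:Z) are not handled here.
        for i in range(11, len(fields)):
            if fields[i].startswith("SA:Z:"):
                fields[i] = "SA:Z:" + _convert_sa(fields[i][5:])
    return "\t".join(fields) + "\n"
```

Wrapped in a `for line in sys.stdin:` loop that writes `convert_sam_line(line)` to stdout, it could sit between the two `samtools view` calls of such a pipeline. Testing it first on a small subset of the BAM, as suggested above, is cheaper than on the whole file.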

ClinSV needs the entire BAM file to run. Make sure to delete temporary and output files before rerunning ClinSV in the same directory: ClinSV tries to continue where it left off, and if the temporary files are wrong, it won't be able to continue.

I hope this helps

weizhousjtu commented 3 years ago

Dear MinocheAE, I got an error:

```
+ cd /lustre/home/acct-clsqsy/clsqsy/Project/CHD_trios_Result/CNV_Results/CHD10/project_folder/SVs/CHD10/cnvnator/
```

Before running ClinSV, I converted my BAM file from 'chr1' to '1', which gives this:

```
1 249250621 60072142 72544
2 243199373 63921844 89240
3 198022430 48674120 55739
4 191154276 50273758 59757
5 180915260 44596756 50925
6 171115067 41677631 47610
7 159138663 40403030 49898
8 146364022 37645021 45962
9 141213431 30732246 37560
10 135534747 39060015 50660
11 135006516 33827396 41880
12 133851895 32695918 39271
13 115169878 23725039 26747
14 107349540 22124937 26473
15 102531392 20684795 25463
16 90354753 22530146 30911
17 81195210 20760500 28697
18 78077248 20018521 23158
19 59128983 15030201 22258
20 63025520 15282867 21514
21 48129895 9855026 15869
22 51304566 8810179 13628
X 155270560 38449067 47883
Y 59373566 2386239 12049
1_gl000191_random 106433 16776 23
1_gl000192_random 547496 132141 182
4_ctg9_hap1 590426 66454 55
4_gl000193_random 189789 174752 181
4_gl000194_random 191469 78337 142
6_apd_hap1 4622290 69203 99
6_cox_hap2 4795371 187442 233
6_dbb_hap3 4610396 159405 212
6_mann_hap4 4683263 162130 257
6_mcf_hap5 4833398 158678 215
6_qbl_hap6 4611984 154139 237
6_ssto_hap7 4928567 135572 221
7_gl000195_random 182896 174833 165
8_gl000196_random 38914 5449 5
8_gl000197_random 37175 5029 8
9_gl000198_random 90085 68029 136
9_gl000199_random 169874 1833672 1231
9_gl000200_random 187035 18216 27
9_gl000201_random 36148 4579 12
11_gl000202_random 40103 8174 16
17_ctg5_hap1 1680828 112021 198
17_gl000203_random 37498 13602 14
17_gl000204_random 81310 16448 99
17_gl000205_random 174588 150946 182
17_gl000206_random 41001 5914 11
18_gl000207_random 4262 1949 29
19_gl000208_random 92689 189828 204
19_gl000209_random 159169 24092 33
21_gl000210_random 27682 2996 13
```
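The listing above looks like `samtools idxstats` output (contig, length, mapped reads, unmapped reads); that reading is an assumption, since the comment does not name the command. One way to double-check such a listing is to compare it with the `.fai` index of the reference FASTA that ClinSV is given, whose first two columns are contig name and length: every BAM contig should exist in the reference with the same length. In the rows shown, the primary chromosomes have lost their `chr` prefix, but names such as `1_gl000191_random` or `6_apd_hap1` would still be absent from a b37 reference, where unplaced contigs are named like `GL000191.1`. `parse_two_columns` and `compare_to_reference` are hypothetical helpers, and this thread does not say where the `.fai` lives inside refdata-b37.

```python
from __future__ import annotations


def parse_two_columns(text: str) -> dict[str, int]:
    """Map contig name -> length from idxstats or .fai text (columns 1 and 2)."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # idxstats ends with a '*' row for unplaced reads; it is not a contig.
        if len(fields) >= 2 and fields[0] != "*":
            table[fields[0]] = int(fields[1])
    return table


def compare_to_reference(idxstats_text: str, fai_text: str) -> dict[str, list[str]]:
    """List BAM contigs unknown to the reference, or present with another length."""
    bam = parse_two_columns(idxstats_text)
    ref = parse_two_columns(fai_text)
    return {
        # Names the reference does not know (e.g. leftover hg19 contig names).
        "not_in_reference": [c for c in bam if c not in ref],
        # Same name, different length: the BAM was aligned to another assembly.
        "length_mismatch": [c for c in bam if c in ref and bam[c] != ref[c]],
    }
```

With hypothetical file names, a call could look like `compare_to_reference(open("idxstats.txt").read(), open("genome.fa.fai").read())`; both lists should come back empty before ClinSV is rerun.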

drmjc commented 2 years ago

This all suggests that the BAM files are still incorrectly formatted, or were corrupted during your conversion process.

So I think you have three options:

  1. realign your reads to hs37d5 and use ClinSV v0.9
  2. realign your reads to GRCh38 (I think all use 'chr1' style names) and use ClinSV v1.0
  3. wait until we upgrade ClinSV to support hg19 (i.e. allowing contigs named 'chr1' in the BAM). This is in progress, but we're starting on dockerising ClinSV to make option 2 easier for everyone.

I'll close this issue

nehasanghi commented 2 years ago

Dear Sir,

You wrote: "Wait until we upgrade ClinSV to support hg19 (ie allowing contigs named 'chr1' in the bam). This is in progress, but we're starting on dockerising ClinSV to make option 2 easier for everyone."

Is there any update on this point? If so, I would like to know more about it, as I was planning to use this software for my analysis. All my BAM files are mapped to GRCh37 and do not match the BAM format that ClinSV requires.

Hoping for a positive response. Thank you.

drmjc commented 2 years ago

Hi, we are currently working on this feature; see #27 and #28 for progress updates. Thanks for your patience.

nehasanghi commented 2 years ago

Thank you so much for replying. Hoping to use this software soon.

Please keep us posted

Neha Sanghi

Research Associate

Sir Ganga Ram Hospital

Rajinder Nagar, New Delhi-110060

On Fri, Jun 10, 2022 at 7:14 AM Mark Cowley @.***> wrote:

Hi, we are currently working on this feature see #27 https://github.com/KCCG/ClinSV/issues/27 and #28 https://github.com/KCCG/ClinSV/issues/28 for progress updates. thanks for your patience
