TheJacksonLaboratory / SVE

GNU General Public License v3.0

FusorSV not producing any output on a single sample #1

Closed MaestSi closed 5 years ago

MaestSi commented 6 years ago

Dear SVE developers, I tried running the whole SVE pipeline on a single sample, NA12878. I was able to set up all the software except cn.mops, Hydra and BreakSeq2. More importantly, however, FusorSV is not working properly. I created a folder named vcf_files/start_sorted, where I put all 4 VCF files I was able to generate. However, when launching FusorSV with the command:

SVE_home=/home/simone/software/SVE
FusorSV=$SVE_home/scripts/FusorSV/FusorSV.py
PYTHON=/home/simone/software/miniconda2/bin/python
reference_genome=/home/simone/home_disk/Whole_genome/Homo_sapiens_assembly38.fasta
bam_file=$working_dir/start_sorted.bam

$PYTHON $FusorSV -f $SVE_home/scripts/FusorSV/data/models/default.pickle -L DEFAULT -r $reference_genome -i /home/simone/vcf_files/ -p 24 -o /home/simone/NA12878/FusorSV

the software starts running and produces output folders with the proper structure, but the VCF file in /home/simone/NA12878/FusorSV/vcf is empty apart from the header.

What could be the problem here? I saw there is also a standalone FusorSV version (https://github.com/timothyjamesbecker/FusorSV) with slightly different dependencies, which could differ a bit from the version included here. Although that version seems a bit older, should I give it a try, or is there something more likely to succeed that I could try first? I am a bit confused about the -i option: I followed all the instructions, but I am still not convinced all VCF files are properly read. Thanks in advance.

wanpinglee commented 6 years ago

Hi there,

As we know, some callers, such as Hydra, are not stable on HG38, so FusorSV was originally designed for HG19 usage. https://github.com/timothyjamesbecker/FusorSV has been merged into SVE and we no longer maintain https://github.com/timothyjamesbecker/FusorSV. The one inside SVE is up to date.

Cheers,

Wan-Ping




MaestSi commented 6 years ago

Ok, thanks for the information. Since my laboratory usually works with the hg38 reference, I'll have to stick to that, accepting that I won't be able to get some of the callers working properly. In your opinion, would it be better to run SVE on hg19 and then do a liftover, or to run on hg38 (obviously accepting that some of the callers won't work)? If I choose to run FusorSV on hg38, should I use a different combination of parameters from the one I used? It is not clear to me whether I should pass a liftover chain file path with the -L parameter, or whether I should make other modifications, for example to tell FusorSV that I did not run all the callers.

This is the log I obtained:

no contig directory specified
using default stage id exclude list:[1, 36]
processing samples ['/mnt/cifs01/simone/vcf_files/start_sorted'] for chroms ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y', 'MT']
merging the svmask regions
svmask regions merged in 0.0 sec
reading, parsing, partitioning and writing sample VCFs
reading sample start_sorted
finished reading 0 out of 1 samples generating 0 partitions in 1.1 sec
starting posterior estimate on partition: t=0 b=0
posterior estimate on partition: t=0 b=0 65.41 sec alpha=1.0
starting posterior estimate on partition: t=0 b=1
posterior estimate on partition: t=0 b=1 70.73 sec alpha=1.0
starting posterior estimate on partition: t=0 b=2
posterior estimate on partition: t=0 b=2 69.99 sec alpha=1.0
starting posterior estimate on partition: t=0 b=3
posterior estimate on partition: t=0 b=3 65.85 sec alpha=1.0
starting posterior estimate on partition: t=0 b=4
posterior estimate on partition: t=0 b=4 63.99 sec alpha=1.0
starting posterior estimate on partition: t=0 b=5
posterior estimate on partition: t=0 b=5 65.24 sec alpha=1.0
starting posterior estimate on partition: t=0 b=6
posterior estimate on partition: t=0 b=6 66.32 sec alpha=1.0
starting posterior estimate on partition: t=0 b=7
posterior estimate on partition: t=0 b=7 66.69 sec alpha=1.0
starting posterior estimate on partition: t=0 b=8
posterior estimate on partition: t=0 b=8 62.85 sec alpha=1.0
starting posterior estimate on partition: t=1 b=0
posterior estimate on partition: t=1 b=0 62.89 sec alpha=3.4341838662e-05
starting posterior estimate on partition: t=1 b=1
posterior estimate on partition: t=1 b=1 66.65 sec alpha=1.0
starting posterior estimate on partition: t=1 b=2
posterior estimate on partition: t=1 b=2 63.84 sec alpha=1.0
starting posterior estimate on partition: t=1 b=3
posterior estimate on partition: t=1 b=3 65.14 sec alpha=1.0
starting posterior estimate on partition: t=2 b=0
posterior estimate on partition: t=2 b=0 64.73 sec alpha=1.0
starting posterior estimate on partition: t=2 b=1
posterior estimate on partition: t=2 b=1 66.15 sec alpha=1.0
starting posterior estimate on partition: t=2 b=2
posterior estimate on partition: t=2 b=2 71.52 sec alpha=0.0293786326107
starting posterior estimate on partition: t=2 b=3
posterior estimate on partition: t=2 b=3 65.16 sec alpha=0.34937267559
starting posterior estimate on partition: t=2 b=4
posterior estimate on partition: t=2 b=4 67.46 sec alpha=0.52542302131
starting posterior estimate on partition: t=2 b=5
posterior estimate on partition: t=2 b=5 64.59 sec alpha=0.499862231954
starting posterior estimate on partition: t=2 b=6
posterior estimate on partition: t=2 b=6 71.09 sec alpha=0.383344526714
starting posterior estimate on partition: t=2 b=7
posterior estimate on partition: t=2 b=7 66.98 sec alpha=0.518215864037
starting posterior estimate on partition: t=2 b=8
posterior estimate on partition: t=2 b=8 61.74 sec alpha=0.502344730447
starting posterior estimate on partition: t=2 b=9
posterior estimate on partition: t=2 b=9 62.4 sec alpha=0.519560482193
starting posterior estimate on partition: t=2 b=10
posterior estimate on partition: t=2 b=10 60.99 sec alpha=0.613613159937
starting posterior estimate on partition: t=2 b=11
posterior estimate on partition: t=2 b=11 62.36 sec alpha=0.617913396395
starting posterior estimate on partition: t=2 b=12
posterior estimate on partition: t=2 b=12 62.0 sec alpha=0.556665831188
starting posterior estimate on partition: t=2 b=13
posterior estimate on partition: t=2 b=13 62.09 sec alpha=0.719264343745
starting posterior estimate on partition: t=2 b=14
posterior estimate on partition: t=2 b=14 62.33 sec alpha=0.733482953972
starting posterior estimate on partition: t=2 b=15
posterior estimate on partition: t=2 b=15 62.17 sec alpha=0.403955199071
starting posterior estimate on partition: t=2 b=16
posterior estimate on partition: t=2 b=16 61.88 sec alpha=0.248096526166
starting posterior estimate on partition: t=3 b=0
posterior estimate on partition: t=3 b=0 62.74 sec alpha=1.0
starting posterior estimate on partition: t=3 b=1
posterior estimate on partition: t=3 b=1 63.23 sec alpha=1.0
starting posterior estimate on partition: t=3 b=2
posterior estimate on partition: t=3 b=2 60.64 sec alpha=1.0
starting posterior estimate on partition: t=3 b=3
posterior estimate on partition: t=3 b=3 62.51 sec alpha=0.0655191344612
starting posterior estimate on partition: t=3 b=4
posterior estimate on partition: t=3 b=4 64.13 sec alpha=0.129861687173
starting posterior estimate on partition: t=3 b=5
posterior estimate on partition: t=3 b=5 61.56 sec alpha=0.128995129137
starting posterior estimate on partition: t=3 b=6
posterior estimate on partition: t=3 b=6 60.38 sec alpha=0.102543074863
starting posterior estimate on partition: t=3 b=7
posterior estimate on partition: t=3 b=7 63.64 sec alpha=0.0339046220974
starting posterior estimate on partition: t=4 b=0
posterior estimate on partition: t=4 b=0 63.03 sec alpha=1.0
starting posterior estimate on partition: t=4 b=1
posterior estimate on partition: t=4 b=1 61.65 sec alpha=1.0
starting posterior estimate on partition: t=4 b=2
posterior estimate on partition: t=4 b=2 62.11 sec alpha=1.0
starting posterior estimate on partition: t=4 b=3
posterior estimate on partition: t=4 b=3 60.78 sec alpha=1.0
starting posterior estimate on partition: t=4 b=4
posterior estimate on partition: t=4 b=4 62.96 sec alpha=1.0
starting posterior estimate on partition: t=4 b=5
posterior estimate on partition: t=4 b=5 62.15 sec alpha=1.0
starting posterior estimate on partition: t=4 b=6
posterior estimate on partition: t=4 b=6 60.35 sec alpha=1.0
starting posterior estimate on partition: t=4 b=7
posterior estimate on partition: t=4 b=7 60.99 sec alpha=1.0
starting posterior estimate on partition: t=4 b=8
posterior estimate on partition: t=4 b=8 61.91 sec alpha=1.0
starting posterior estimate on partition: t=5 b=0
posterior estimate on partition: t=5 b=0 61.31 sec alpha=1.0
starting posterior estimate on partition: t=5 b=1
posterior estimate on partition: t=5 b=1 62.47 sec alpha=1.0
starting posterior estimate on partition: t=5 b=2
posterior estimate on partition: t=5 b=2 62.38 sec alpha=1.0
starting posterior estimate on partition: t=5 b=3
posterior estimate on partition: t=5 b=3 62.54 sec alpha=1.0
starting posterior estimate on partition: t=5 b=4
posterior estimate on partition: t=5 b=4 64.86 sec alpha=0.155508169481
starting posterior estimate on partition: t=5 b=5
posterior estimate on partition: t=5 b=5 61.57 sec alpha=0.37861822495
starting posterior estimate on partition: t=5 b=6
posterior estimate on partition: t=5 b=6 62.09 sec alpha=1.0
starting posterior estimate on partition: t=5 b=7
posterior estimate on partition: t=5 b=7 65.56 sec alpha=0.117773965802
starting posterior estimate on partition: t=5 b=8
posterior estimate on partition: t=5 b=8 62.98 sec alpha=1.0
starting posterior estimate on partition: t=5 b=9
posterior estimate on partition: t=5 b=9 62.62 sec alpha=0.531572093233
starting posterior estimate on partition: t=5 b=10
posterior estimate on partition: t=5 b=10 63.66 sec alpha=0.999245007297
starting posterior estimate on partition: t=5 b=11
posterior estimate on partition: t=5 b=11 62.25 sec alpha=1.0
starting posterior estimate on partition: t=5 b=12
posterior estimate on partition: t=5 b=12 64.96 sec alpha=1.0
starting posterior estimate on partition: t=5 b=13
posterior estimate on partition: t=5 b=13 63.28 sec alpha=1.0
starting posterior estimate on partition: t=5 b=14
posterior estimate on partition: t=5 b=14 64.97 sec alpha=1.0
starting posterior estimate on partition: t=6 b=0
posterior estimate on partition: t=6 b=0 63.13 sec alpha=1.0
starting posterior estimate on partition: t=6 b=1
posterior estimate on partition: t=6 b=1 61.5 sec alpha=1.0
starting posterior estimate on partition: t=6 b=2
posterior estimate on partition: t=6 b=2 62.28 sec alpha=1.0
starting posterior estimate on partition: t=6 b=3
posterior estimate on partition: t=6 b=3 64.02 sec alpha=1.0
starting posterior estimate on partition: t=6 b=4
posterior estimate on partition: t=6 b=4 64.12 sec alpha=1.0
starting posterior estimate on partition: t=6 b=5
posterior estimate on partition: t=6 b=5 62.64 sec alpha=1.0
starting posterior estimate on partition: t=6 b=6
posterior estimate on partition: t=6 b=6 61.55 sec alpha=1.0
starting posterior estimate on partition: t=6 b=7
posterior estimate on partition: t=6 b=7 62.53 sec alpha=1.0
starting posterior estimate on partition: t=6 b=8
posterior estimate on partition: t=6 b=8 62.2 sec alpha=1.0
starting posterior estimate on partition: t=7 b=0
posterior estimate on partition: t=7 b=0 61.31 sec alpha=1.0
starting posterior estimate on partition: t=7 b=1
posterior estimate on partition: t=7 b=1 61.95 sec alpha=1.0
starting posterior estimate on partition: t=7 b=2
posterior estimate on partition: t=7 b=2 61.39 sec alpha=1.0
starting posterior estimate on partition: t=7 b=3
posterior estimate on partition: t=7 b=3 64.1 sec alpha=1.0
starting posterior estimate on partition: t=7 b=4
posterior estimate on partition: t=7 b=4 63.87 sec alpha=1.0
starting posterior estimate on partition: t=7 b=5
posterior estimate on partition: t=7 b=5 75.5 sec alpha=1.0
starting posterior estimate on partition: t=7 b=6
posterior estimate on partition: t=7 b=6 65.15 sec alpha=1.0
starting posterior estimate on partition: t=7 b=7
posterior estimate on partition: t=7 b=7 61.22 sec alpha=1.0
starting posterior estimate on partition: t=7 b=8
posterior estimate on partition: t=7 b=8 64.32 sec alpha=1.0
finished estimation in 6259.32 sec
apply fusion model to sample inputs and generating fusorSV ouput
starting fusorSV discovery on sample start_sorted
loading base and posterior estimate partitions for start_sorted
writing VCF for start_sorted
scoring completed for start_sorted in 0.07 sec
finished reading samples in 184.25 sec
G1K-P3--------------------------------------------------------------
MetaSV--------------------------------------------------------------
BreakSeq--------------------------------------------------------------
Pindel--------------------------------------------------------------
Tigra--------------------------------------------------------------
cnMOPS--------------------------------------------------------------
CNVnator--------------------------------------------------------------
Delly--------------------------------------------------------------
GATK--------------------------------------------------------------
GenomeSTRiP--------------------------------------------------------------
Hydra--------------------------------------------------------------
Lumpy--------------------------------------------------------------
BreakDancer--------------------------------------------------------------
fusorSV--------------------------------------------------------------
run 0 in 6811.72 sec
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Although it seems to have run smoothly, the output VCF file "start_sorted_S-1.vcf" is empty apart from the header. Thanks again.

lslochov commented 6 years ago

Hi, I'm currently working on figuring out the hg38 issues in fusorSV. Hopefully we'll have a new version out pretty soon.

MaestSi commented 6 years ago

I don't know if this information could be of help, but when I specified the following with the -c parameter:

'chr1','chr2','chr3','chr4','chr5','chr6','chr7','chr8','chr9','chr10','chr11','chr12','chr13','chr14','chr15','chr16','chr17','chr18','chr19','chr20','chr21','chr22','chrX','chrY','chrMT'

I got this error:

Traceback (most recent call last):
  File "/mnt/cifs01/simone/software/SVE/scripts/FusorSV/FusorSV.py", line 89, in <module>
    chroms = args.chrom.split(',')
AttributeError: 'Namespace' object has no attribute 'chrom'

Otherwise, if I don't specify the -c argument, the following is printed to the screen, among other information: processing samples ['/home/simone/home_disk/NA12878/SVE_output_hg38/vcf_files'] for chroms ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y', 'MT']

So, basically, if I am not wrong, it seems to me that FusorSV is always expecting chromosome names not to contain 'chr' in the name, whilst hg38 chromosome names do. What do you think about this?
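A quick way to test this suspicion is to compare the contig names that actually appear in a caller's VCF with the default list printed in the log above. The following is only a minimal sketch: the helper names and the two example records are made up for illustration, and whether FusorSV matches chromosome names exactly is an assumption, not something confirmed in this thread.

```python
import io

def contig_names(vcf_lines):
    """Collect the distinct CHROM values (first column), skipping header lines."""
    names = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        if chrom not in names:
            names.append(chrom)
    return names

def uses_chr_prefix(names):
    """True when every contig name starts with 'chr'."""
    return bool(names) and all(name.startswith("chr") for name in names)

# Default chromosome list as printed in the FusorSV log.
default_chroms = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]

# Made-up records standing in for a real caller VCF.
example_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t10000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=12000\n"
    "chr2\t500\t.\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=900\n"
)

names = contig_names(example_vcf)
unmatched = [name for name in names if name not in default_chroms]
print("contigs in VCF:", names)
print("not in default list:", unmatched)
```

With a real file, replace `example_vcf` with `open(path)`. If every contig ends up in `unmatched`, a naming mismatch is at least worth ruling out.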

StevenIg commented 6 years ago

It's a typo. Change chroms = args.chrom.split(',') to chroms = args.chroms.split(',') and it works.
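The error is easy to reproduce outside FusorSV. In this minimal sketch the option declaration is hypothetical (the real one in FusorSV.py may differ); it only shows that argparse stores the value under the name `chroms`, so reading `args.chrom` raises the AttributeError reported above.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical declaration for illustration; FusorSV.py may define -c differently.
parser.add_argument("-c", "--chroms", type=str, default=None,
                    help="comma-separated chromosome names")
args = parser.parse_args(["-c", "chr1,chr2,chrX"])

message = None
try:
    args.chrom  # wrong attribute name, as on line 89 of the traceback
except AttributeError as err:
    message = str(err)

# Corrected attribute name, as suggested in the fix.
chroms = args.chroms.split(",")
print(message)
print(chroms)
```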

MaestSi commented 6 years ago

Thanks Stevenlg, that seems to be a typo, and probably should be corrected at least in the 'dev' branch (which contains a FusorSV version that is supposed to work also with hg38).

MaestSi commented 6 years ago

Sorry to reopen this issue, but unfortunately, even after correcting the typo, I am still not able to obtain output from FusorSV with the hg38 reference (Homo_sapiens_assembly38.fasta from the GATK bundle). Any help from anybody who has obtained a non-empty output VCF file with a similar reference would be much appreciated.

samanthaleejensen commented 5 years ago

I'm having the same problem with GRCh37 too (on a single sample), so maybe the problem is related to a single sample rather than FusorSV?

MaestSi commented 5 years ago

Are you using GRCh37 with alternative haplotypes? If so, this may be relevant: as @lslochov kindly suggested, my issue was caused by the fact that the CNVnator VCF file obtained with the Docker version of SVE ("samplename"_S10.vcf) contained malformed records for all the variants located on HLA chromosomes. The chromosome name was split over the first two columns, so the second column did not contain the start coordinate. I solved it with something like: grep -v "^HLA" "samplename"_S10_original.vcf > "samplename"_S10.vcf. Did you create the folder structure specified in the README, even though you have a single sample (so .../VCF_files/samplename/samplename_S*.vcf)? I also had problems running FusorSV when I didn't put a '/' at the end of the directory specified with -i (for example, -i vcfFiles/). I know the '/' is shown in the README, but I had not realized it was essential.
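For anyone who prefers to do the same clean-up from Python, here is a minimal sketch with the same effect as the grep -v "^HLA" command above. The function names are hypothetical, and the HLA line in the example is a made-up record that only imitates the malformation described (name spilling into the second column).

```python
def drop_hla_records(lines):
    """Keep every line that does not start with 'HLA' (same effect as grep -v '^HLA')."""
    return [line for line in lines if not line.startswith("HLA")]

def filter_vcf(src_path, dst_path):
    """Write a copy of src_path to dst_path without the HLA records."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(drop_hla_records(src))

# Made-up lines for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t10000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=12000\n",
    "HLA-A*01:01\t01:01\t.\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP\n",
]
kept = drop_hla_records(example)
print(len(example), "lines in,", len(kept), "lines out")
```

As with the grep version, header lines are preserved because they start with '#', and the original file should be kept under a different name.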

lslochov commented 5 years ago

@samanthaleejensen The suggestions provided by @MaestSi represent our findings from a detailed process of troubleshooting FusorSV on @MaestSi's input VCFs. It was my intention to post those findings on this thread as the official solution, so many thanks to @MaestSi for his post. We are working on a new version of FusorSV that will be more helpful in informing the user when there are issues with the inputs that would lead to empty output.
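Until such a version is available, the checks discussed in this thread can be run by hand before launching FusorSV. The sketch below is not part of SVE or FusorSV; it is a hypothetical pre-flight script that assumes the layout described above (input_dir/samplename/samplename_S*.vcf) and reports a missing trailing '/', missing sample folders, and VCF files that contain only a header. The stage number S4 in the demo is arbitrary.

```python
import glob
import os
import tempfile

def count_records(vcf_path):
    """Count non-empty, non-header lines in a VCF file."""
    with open(vcf_path) as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))

def preflight(input_dir):
    """Return {sample: {vcf_name: record_count}} for <input_dir>/<sample>/<sample>_S*.vcf."""
    report = {}
    for sample in sorted(os.listdir(input_dir)):
        sample_dir = os.path.join(input_dir, sample)
        if not os.path.isdir(sample_dir):
            continue
        pattern = os.path.join(sample_dir, sample + "_S*.vcf")
        report[sample] = {os.path.basename(path): count_records(path)
                          for path in sorted(glob.glob(pattern))}
    return report

def problems(input_dir):
    """List human-readable warnings about the input directory."""
    issues = []
    if not input_dir.endswith("/"):
        issues.append("input directory does not end with '/'")
    report = preflight(input_dir)
    if not report:
        issues.append("no sample sub-directories found")
    for sample, vcfs in report.items():
        if not vcfs:
            issues.append(sample + ": no " + sample + "_S*.vcf files found")
        for name, n_records in vcfs.items():
            if n_records == 0:
                issues.append(sample + "/" + name + " has no variant records")
    return issues

# Demo on a temporary directory with one good and one header-only VCF.
with tempfile.TemporaryDirectory() as root:
    sample_dir = os.path.join(root, "sampleA")
    os.mkdir(sample_dir)
    with open(os.path.join(sample_dir, "sampleA_S4.vcf"), "w") as out:
        out.write("##fileformat=VCFv4.2\n#CHROM\tPOS\n1\t100\n")
    with open(os.path.join(sample_dir, "sampleA_S10.vcf"), "w") as out:
        out.write("##fileformat=VCFv4.2\n")
    found = problems(root + "/")
    found_no_slash = problems(root)

print(found)
```

An empty list from `problems()` does not guarantee non-empty FusorSV output; it only rules out the specific input issues reported in this thread.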