Closed geocarvalho closed 6 years ago
Hello! Sorry about the Manta issue, I noticed there was a bug making it impossible to turn on the manta module. The Manta module is still undergoing development, but it shoudl atleast run now.
If you pull the latest commit and restart your sample you should get similar output:
[2c/1cf7d3] Submitted process > Manta (cancer_sample.bam) [e0/bec4a4] Submitted process > TIDDIT (cancer_sample.bam) [7e/b1854f] Submitted process > CNVnator (cancer_sample.bam) [8f/17e966] Submitted process > combine (cancer_sample.bam) [10/744fd6] Submitted process > annotate (cancer_sample.bam)
To debug the annotation problem, you can enter the FindSV directory, from there, you should have a "work" directory, which is generated from nextflow. like this:
/home/jesperei/FindSV-1_2017/FindSV /home/jesperei/FindSV-1_2017/FindSV/work
Within the work directory, you will find a directory for each job submited, to find the annotation job directory, you can use the hash from the output log: [10/744fd6] Submitted process > annotate (cancer_sample.bam) the 10/744fd6 is part of the directory name, if you cd 10/744fd6 and complete it using tab, you will find the directory: /home/jesperei/FindSV-1_2017/FindSV/work/10/744fd64de5c4fb6d7917455afb6111 within that directory you will find an error log file: .command.err and command output file: .command.out
The .command.err file should contain the error. Feel free to post it here! Good luck!
Hey! Thanks for the help. I searched for the "command.out" and "command.err" for my case, no files..
N E X T F L O W ~ version 0.23.3
Launching `FindSV_core.nf` [elegant_heisenberg] - revision: 0828787848
/media/"sample-name"/"sample-name".bam.bai
[warm up] executor > local
[b9/44c00d] Submitted process > TIDDIT ("sample-name".bam)
[f8/d97806] Submitted process > CNVnator ("sample-name".bam)
[a1/394b60] Submitted process > combine ("sample-name".bam)
[6b/22ac6b] Submitted process > annotate ("sample-name".bam)
WARN: Process `annotate ("sample-name".bam)` terminated with an error exit status (2)
-- Error is ignored
lab@6a8a718405b0:~/FindSV/work/6b/22ac6bdd64ddad4911768417fa46f7$ ls
"sample-name".bam "sample-name".bam.bai "sample-name"_CombinedCalls.vcf
Frist I reinstalled VEP using git instead conda and after I tried another run that I had the same problem but now I can run Manta:
(FindSV_env) lab@6a8a718405b0:~/FindSV$ python FindSV.py --bam /media/"sample-name"/"sample-name".bam --output /media/cnv-out/"sample-name"/work-7 --config ./FindSV.conf
DONE
Processing, please do not turn FindSV off
SAMPLE_ID:/media/"sample-name"/"sample-name".bam
N E X T F L O W ~ version 0.23.3
Launching `FindSV_core.nf` [evil_dubinsky] - revision: efecb129cf
/media/"sample-name"/"sample-name".bam.bai
[warm up] executor > local
[b9/6b6cc9] Submitted process > TIDDIT ("sample-name".bam)
[a9/900055] Submitted process > Manta ("sample-name".bam)
[e4/4d9031] Submitted process > CNVnator ("sample-name".bam)
ls
[88/e1f6b9] Submitted process > combine ("sample-name".bam)
[80/e4be23] Submitted process > annotate ("sample-name".bam)
WARN: Process `annotate ("sample-name".bam)` terminated with an error exit status (1) -- Error is ignored
"sample-name"
FAILED:ANNOTATION
DONE
(FindSV_env) lab@6a8a718405b0:~/FindSV$ ls work/80/e4be23c77fae458402ebaff0bab67f/"sample-name".bam "sample-name".bam.bai "sample-name"_CombinedCalls.vcf
And still without errors' log :/ PS.: The final vcf combined is awesome! Another question: Can I use your script for exome ? Cheers, George.
No problems, and happy to hear that Manta is working =P.
The error log is a bit tricky, it's a hidden file:
[jesperei@milou2 FindSV]$ ls work/dd/ab0691319e81436e2a3364a9a0361a/: P4855_102.clean.dedup.bai P4855_102.clean.dedup_CombinedCalls.vcf P4855_102.clean.dedup_CombinedCalls.vcf.tmp_warnings.txt P4855_102.clean.dedup.bam P4855_102.clean.dedup_CombinedCalls.vcf.tmp_summary.html P4855_102.clean.dedup_FindSV.vcf
however, if you type ls like this: ls work/dd/ab0691319e81436e2a3364a9a0361a/.* you will find it:
[jesperei@milou2 FindSV]$ ls work/dd/ab0691319e81436e2a3364a9a0361a/.* work/dd/ab0691319e81436e2a3364a9a0361a/.command.begin work/dd/ab0691319e81436e2a3364a9a0361a/.command.out work/dd/ab0691319e81436e2a3364a9a0361a/.command.sh work/dd/ab0691319e81436e2a3364a9a0361a/.command.err work/dd/ab0691319e81436e2a3364a9a0361a/.command.run work/dd/ab0691319e81436e2a3364a9a0361a/.command.trace
work/dd/ab0691319e81436e2a3364a9a0361a/.command.log work/dd/ab0691319e81436e2a3364a9a0361a/.command.run.1 work/dd/ab0691319e81436e2a3364a9a0361a/.exitcode
work/dd/ab0691319e81436e2a3364a9a0361a/.: P4855_102.clean.dedup.bai P4855_102.clean.dedup_CombinedCalls.vcf P4855_102.clean.dedup_CombinedCalls.vcf.tmp_warnings.txt P4855_102.clean.dedup.bam P4855_102.clean.dedup_CombinedCalls.vcf.tmp_summary.html P4855_102.clean.dedup_FindSV.vcf
or try to open it directly using nano: nano work/dd/ab0691319e81436e2a3364a9a0361a/.command.err
From your pasted ls output, it seems that it is vep casuing the troubles. Could it be the VEP version? I have been using version 84, according to the log version 87 should be mostly backwards compatible, but you never now.. Maybe it is time I change the command line(I have used the same command since release 81 more or less =P).
Also, FindSV issues this command to run VEP: ${VEP_exec_file} --cache --force_overwrite --poly b -i ${vcf_file} -o ${vcf_file}.tmp --buffer_size 5 --port 3337 --vcf --per_gene --format vcf -q
if you activate the FindSV_env and run it on your combined vcf file, you should get the same results as FindSV(use the path/variable of the vepvscript instead of ${VEP_exec_file}, and replace ${vcf_file} to your vcf file). I would not recommend running it on exome data. I have been running FindSV on some some panels and extracted bam regions, I found the variants i wanted to find, but I got ugly results, CNVnator will call deletions across every region not within the panel, and TIDDITs filters will act a bit crazy. If you try using it on exome data you would need to exclude regions manually, and skip using the filter flag of the vcf.
Happy to hear that you got the combined vcf file, and good luck with the analysis! Have a nice weekend!
Thanks a lot, I think it will work well now that I could see the error message. When the run finish I come back make my report about the annotation.
Again, thank you for the great work here. Have a nice carnival!
Hello again! I had this problem:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_CTYPE = "pt_BR.UTF-8",
LANG = "pt_BR.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
-------------------- EXCEPTION --------------------
MSG: ERROR: Cannot index bgzipped FASTA file with Bio::DB::Fasta
STACK Bio::EnsEMBL::Variation::Utils::FastaSequence::setup_fasta /home/lab/ensembl-vep/Bio/EnsEMBL/Variation/Utils/FastaSequence.pm:210
STACK Bio::EnsEMBL::VEP::BaseVEP::fasta_db /home/lab/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm:288
STACK Bio::EnsEMBL::VEP::Runner::init /home/lab/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:83
STACK Bio::EnsEMBL::VEP::Runner::run /home/lab/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:111
STACK toplevel /home/lab/ensembl-vep/vep.pl:193
Date (localtime) = Fri Feb 24 20:44:41 2017
Ensembl API version = 87
---------------------------------------------------
I saw this conversation and I think that my reference genome is the problem, what genome are you using? My genome is from Broad (my samtools is 1.3)
Another question, what script has this code:
${VEP_exec_file} --cache --force_overwrite --poly b -i ${vcf_file} -o ${vcf_file}.tmp --buffer_size 5 --port 3337 --vcf --per_gene --format vcf -q
I would like to add my reference genome (--fasta) and some maf informations (like: --af, --af_1kg, --af_exac). You think that add this maf information is unnecessary or you add some information like that (I'm considering the "frequency_filter.py" script) ? Cause I want CNVs relevants.
Best, George.
That's likely a VEP issue. Looks like something I've got when the reference genome is unindexed and located somewhere where I don't have write permissions. Fixing either should solve the problem! Cheers!
Hello there! We have been using hg19 the standard ucsc reference, as well as the thousand genome GRCh37 reference (with and without decoy). I have never seen such error myself! I think Daniels comment sounds reasonable.
I made a quick fix of the VEP command( It was needed anyway to allow different species etc): ${VEP_exec_file} -i ${vcf_file} -o ${vcf_file}.tmp ${params.vep_args}
where ${params.vep_args} is all the optional arguments for vep: vep_args="--cache --force_overwrite --poly b --buffer_size 5 --port 3337 --vcf --per_gene --format vcf -q"
If you pull the newest version of findSV and paste that line on line 60 of your config file, you can add or remove any vep option as you like. Like this:
original: //-----VEP_------------- //path to the vep script VEP_path='variant_effect_predictor.pl'
updated: //-----VEP_------------- //path to the vep script VEP_path='variant_effect_predictor.pl' vep_args="--cache --force_overwrite --poly b --buffer_size 5 --port 3337 --vcf --per_gene --format vcf -q"
you can also rerun the setup.py script, but this will probably take more time than pasting the line yourself =P. Are you using hg38? I think you should remove the --port 3337 then.
I have never tried the --af options, I'm not sure it will work on structural variants, but it could be worth trying! We have made our own internal database using SVDB https://github.com/J35P312/SVDB (set the SVDB_path='""' to your exported vcf database file). You could also try using the thousand genomes SV vcf, but I have found it to be not as efficient as our own samples.
Good luck! //Jesper
Thanks for the help guys! My problem with vep was solved when I gunzip the genome "$HOME/.vep/homo_sapiens/86_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz", after that everything happened as expected (I don't know if was my perl ou samtools version). Nice update, now I can try new args. I tried args for population databases but didn't work for this variants. I will see SVDB and how it could be used here.
PS.: Is there any chance that your genome database be available in the future?
Thanks for the work! Best, George.
A db with frequencies for 1000 swedes is now available for download. See https://swefreq.nbis.se. You will need to register & apply for access - all clickable.. Cheers!
Hello, I would like to know with you could help with some tips to start debug this problem, here is my log file:
the command was:
python FindSV.py --bam /media/sample/"sample-name".bam --output /media/cnv-out/sample/work-2/ --config ./FindSV.conf
my FindSV.conf:
Another problem, I changed the RunManta variable to "TRUE" to use Manta, but it didn't work. How can I run Manta? Sorry if my problem is basic, but I can't see what is happening in the error message.
Best regards, George.