Closed vivekruhela closed 6 years ago
Please add the word "mode" after each "uninterruptible sleep": the process goes into uninterruptible sleep mode. Sorry for this.
Thanks.
Hey, is anyone here? I hope my issue is acknowledged by the expert team. Any suggestions? Haplosaurus, with or without the json option, goes into uninterruptible sleep mode after some time. I have tried this and confirmed it many times. What could be the possible reasons for this? Thanks.
Hello, we will be looking into the problem. Could you please give us some more information about your input data? What is the type of variants, how many variants and how many genotypes or individuals are stored in the input VCF file? Thank you
Hi @vivekruhela.
Your stackexchange query says your objective is to annotate variants with SIFT. You can do this with VEP, without calculating protein sequences. Do you have it installed? VEP takes a variant list (in VCF or other formats) as input and provides SIFT results alongside those of similar tools such as PolyPhen2, REVEL, CADD, etc. Have a look at the 'Pathogenicity predictions' section of the web tool here: http://www.ensembl.org/Multi/Tools/VEP. You need to enable dbNSFP to see a fuller list.
The error message you are seeing from haplo suggests your input file contains long variants overlapping multiple exons. These should be skipped rather than cause the process to hang, so it is not clear what is going wrong. As Anja said, any further information you can provide on your input data would be helpful.
@at7 : Thanks for the reply. As shown in the command line in my issue, the input is a single-patient VCF file containing 456840 variants of several types, such as nsSNV, exonic, UTR etc.
@sarahhunt : In my stackexchange post, I was interested in the protein sequence of each patient, and I got the suggestion to use Haplosaurus with json to get the protein sequence. I have used ANNOVAR to get functional significance scores, and I have used VEP to determine the effects of variants. I have not tried VEP to get scores. Let me know if I missed anything. Thanks.
If you want protein sequences and functional significance scores, you can use VEP and add options from https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#output:
For protein sequence: if the variant overlaps the coding region of a transcript, Amino_acids and Codons (e.g. I/V, Att/Gtt) are automatically added to your output.
Protein function prediction scores can be added with --sift b or --polyphen b, where the option b returns both the prediction term and the score, as described in the documentation.
You can also use plugins to add additional scores.
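As a rough illustration of the options listed above, the following Python sketch only assembles a VEP command line with SIFT and PolyPhen output enabled. The executable location (./vep), the file names and the cache directory are placeholders rather than paths from this thread, and the function name is hypothetical; the flags themselves (-i, -o, --offline, --dir_cache, --sift, --polyphen) are taken from the VEP options page linked above.

```python
import shlex


def build_vep_command(input_vcf, output_file, cache_dir, vep_path="./vep"):
    """Assemble (but do not run) a VEP command with SIFT and PolyPhen enabled.

    All paths are placeholders supplied by the caller.
    """
    return [
        vep_path,
        "-i", input_vcf,
        "-o", output_file,
        "--offline",               # use the local cache, no database connection
        "--dir_cache", cache_dir,
        "--sift", "b",             # b = prediction term and score
        "--polyphen", "b",         # b = prediction term and score
    ]


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    cmd = build_vep_command("patient.vcf.gz", "patient.vep.txt", "/path/to/cache")
    print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to subprocess.run once the paths point at a real installation.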
Using Haplosaurus: Haplosaurus takes phased genotypes from a VCF and constructs a pair of haplotype sequences for each overlapped transcript; these sequences are also translated into predicted protein haplotype sequences. Each variant haplotype sequence is aligned and compared to the reference, and an HGVS-like name is constructed representing its differences to the reference.
Can you please confirm that your input VCF file contains phased genotypes? I couldn't reproduce the error you get when running Haplosaurus. It could also be related to the operating system you are using. Could you please also give us details about the type and version of your operating system?
@at7 : I think so, yes; my VCF file is phased. I am posting some lines of my VCF file so that you can confirm:
chrM 150 . T C 1668.77 PASS ABHom=1;AC=2;AF=1;AN=2;DP=58;ExcessHet=3.0103;FS=0;GQ_MEAN=172;HRun=1;MLEAC=2;MLEAF=1;MQ=60;NCC=0;OND=0;QD=28.77;SOR=1.278;VariantType=SNP GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1/1:0,58:58:99:2:1:1697,172,0
chrM 195 . C T 2528.77 PASS ABHom=1;AC=2;AF=1;AN=2;DP=57;ExcessHet=3.0103;FS=0;GQ_MEAN=181;HRun=1;MLEAC=2;MLEAF=1;MQ=60;NCC=0;OND=0;QD=34.24;SOR=0.765;VariantType=SNP GT:AD:DP:GQ:MLPSAC:MLPSAF:PGT:PID:PL 1/1:0,56:56:99:2:1:1|1:195_C_T:2557,181,0
chrM 199 rs72619362 T C 2462.77 PASS ABHom=1;AC=2;AF=1;AN=2;DB;DP=58;ExcessHet=3.0103;FS=0;GQ_MEAN=175;HRun=1;MLEAC=2;MLEAF=1;MQ=60;NCC=0;OND=0;QD=30.63;SOR=0.877;VariantType=SNP GT:AD:DP:GQ:MLPSAC:MLPSAF:PGT:PID:PL 1/1:0,57:57:99:2:1:1|1:195_C_T:2491,175,0
chrM 302 . AC A 673.9 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=-3.369;ClippingRankSum=0;DP=53;ExcessHet=3.0103;FS=37.781;GQ_MEAN=14;HRun=8;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;NCC=0;QD=18.21;ReadPosRankSum=2.49;SOR=2.547;VariantType=DELETION.NumRepetitions_8.EventLength_1.RepeatExpansion_C GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 0/1:7,30:37:14:1:0.5:711,0,14
chrM 410 . A T 2225.77 PASS ABHom=1;AC=2;AF=1;AN=2;DP=76;ExcessHet=3.0103;FS=0;GQ_MEAN=228;HRun=3;MLEAC=2;MLEAF=1;MQ=60;NCC=0;OND=0;QD=29.29;SOR=0.693;VariantType=SNP GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1/1:0,76:76:99:2:1:2254,228,0
chrM 491 . T C 698.77 PASS ABHom=1;AC=2;AF=1;AN=2;DP=26;ExcessHet=3.0103;FS=0;GQ_MEAN=77;HRun=0;MLEAC=2;MLEAF=1;MQ=60;NCC=0;OND=0;QD=26.88;SOR=2.67;VariantType=SNP GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1/1:0,26:26:77:2:1:727,77,0
chrM 2354 . C T 1336.77 PASS ABHom=1;AC=2;AF=1;AN=2;DP=53;ExcessHet=3.0103;FS=0;GQ_MEAN=153;HRun=1;MLEAC=2;MLEAF=1;MQ=59.91;NCC=0;OND=0;QD=25.71;SOR=1.358;VariantType=SNP GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1/1:0,52:52:99:2:1:1365,153,0
chrM 2485 . C T 1114.77 PASS ABHom=1;AC=2;AF=1;AN=2;DP=42;ExcessHet=3.0103;FS=0;GQ_MEAN=124;HRun=0;MLEAC=2;MLEAF=1;MQ=43.82;NCC=0;OND=0;QD=26.54;SOR=1.127;VariantType=SNP GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1/1:0,42:42:99:2:1:1143,124,0
chrM 5581 . C T 603.77 PASS ABHom=1;AC=2;AF=1;AN=2;DP=22;ExcessHet=3.0103;FS=0;GQ_MEAN=65;HRun=0;MLEAC=2;MLEAF=1;MQ=34.26;NCC=0;OND=0;QD=27.44;SOR=6.273;VariantType=SNP GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1/1:0,22:22:65:2:1:632,65,0
chrM 9378 . G A 185.8 PASS ABHom=1;AC=2;AF=1;AN=2;DP=7;ExcessHet=3.0103;FS=0;GQ_MEAN=21;HRun=0;MLEAC=2;MLEAF=1;MQ=42.61;NCC=0;OND=0;QD=26.54;SOR=0.941;VariantType=SNP GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1/1:0,7:7:21:2:1:214,21,0
chrM 10401 . C T 1832.77 PASS ABHom=1;AC=2;AF=1;AN=2;DP=63;ExcessHet=3.0103;FS=0;GQ_MEAN=190;HRun=0;MLEAC=2;MLEAF=1;MQ=57.98;NCC=0;OND=0;QD=29.09;SOR=1.573;VariantType=SNP GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1/1:0,63:63:99:2:1:1861,190,0
I am currently working on a Linux server running Ubuntu, with 40 threads and dual core, 96 GB RAM, 1 TB internal and 4 TB external storage. Thanks.
Your input data is not phased. Phased data uses '|' as a separator and not '/'. For example: phased would be 1|1, unphased is 1/1. You can find more information here: https://samtools.github.io/hts-specs/VCFv4.1.pdf You could try running a phasing algorithm and then rerun Haplosaurus on your phased data.
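A quick way to check this before rerunning is to count the separators in the GT field. The sketch below is a minimal example, not part of Haplosaurus: the function names are made up for illustration, it assumes tab-separated VCF records, and it looks only at GT (other fields such as PGT are ignored).

```python
def classify_gt(gt):
    """Classify one GT value by its allele separator."""
    if "|" in gt:
        return "phased"
    if "/" in gt:
        return "unphased"
    return "other"  # e.g. haploid calls such as "1" or missing "."


def count_phasing(lines):
    """Count phased/unphased GT calls over an iterable of VCF text lines."""
    counts = {"phased": 0, "unphased": 0, "other": 0}
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this line
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        gt_index = keys.index("GT")
        for sample in fields[9:]:
            values = sample.split(":")
            if gt_index < len(values):
                counts[classify_gt(values[gt_index])] += 1
    return counts
```

For a gzipped file, the lines can be supplied with `gzip.open(path, "rt")`. A file suitable for Haplosaurus should report its heterozygous calls under "phased".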
@at7 : My bad. Sorry for the confusion. I thought '/' stood for phased. I'll definitely try a phasing algorithm and then run Haplosaurus again. Thanks.
No worries. I will close the ticket for now.
Hi,
I am interested in obtaining protein sequences from patient data. Following the suggestion in this question, I tried this tool to get the protein sequences. But I am now facing a new problem after calling the tool with the command shown below:
./haplo -i /mnt/storage/MM_Data/SM_5_WES/Variant-Calling/SM_5.updated_dbsnp.vcf.gz -o /mnt/storage/MM_Data/SM_5_WES/Variant-Calling/SM_5.haplosauras.txt -offline --dir_cache /home/ensembl-vep/homo_sapiens/
The ./haplo script runs well at first, but after some time the process goes into uninterruptible sleep mode. I have confirmed this twice using htop on my server. The first time this happened, I thought the system might be hanging because of heavy load (many other jobs, such as GATK, were running in parallel) and that the run was simply taking a long time, so I interrupted it after 48 hours. I then stopped all other parallel processes and ran only this command again. When I checked the results after 24 hours, it was stuck at the same warnings and was in uninterruptible sleep mode again. I don't know what causes this: as far as I know, a process switches into uninterruptible sleep only when there is a problem with data I/O, and the other processes on my server are working well. Does this mean that the operation is complete, or is there some other problem in the module? During the run, the tool also gives many warnings like:
WARNING: genomic coord 51239295-51239309 possibly maps across coding/non-coding boundary in ENST00000375992
Any suggestions?
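For anyone wanting to confirm the same symptom without htop: on Linux, uninterruptible sleep shows up as state D in the State line of /proc/<pid>/status. The sketch below is only an illustration with made-up function names; it reads that file for a given process ID and is Linux-specific.

```python
def parse_process_state(status_text):
    """Return the one-letter state code from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tD (disk sleep)".
            return line.split(":", 1)[1].split()[0]
    raise ValueError("no State line found")


def is_uninterruptible(pid):
    """True if the process is currently in uninterruptible sleep (state D)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_process_state(handle.read()) == "D"
```

A single reading of D is not conclusive, since processes pass through this state briefly during normal I/O; the concern here is a process that stays in D for hours.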