Illumina / paragraph

Graph realignment tools for structural variants

INS error #39

Closed asylvz closed 4 years ago

asylvz commented 4 years ago

Hi, I'm trying to use paragraph for genotyping with the following command:

python ~/benchmark/tools/paragraph/paragraph-tools-build/bin/multigrmpy.py -i ~/benchmark/all_sv_grc37.vcf -m samples.txt -r ~/dataset/human_g1k_v37_gatk.fasta -o 50x

but I receive the following error:

Traceback (most recent call last):
  File "/home/asoylev/benchmark/tools/paragraph/paragraph-tools-build/bin/multigrmpy.py", line 34, in <module>
    from grm.vcf2paragraph import convert_vcf_to_json
  File "/mnt/compgen/homes/asoylev/benchmark/tools/paragraph/paragraph-tools-build/lib/python3/grm/vcf2paragraph/__init__.py", line 32, in <module>
    from grm.vcfgraph import VCFGraph, NoVCFRecordsException
  File "/mnt/compgen/homes/asoylev/benchmark/tools/paragraph/paragraph-tools-build/lib/python3/grm/vcfgraph/__init__.py", line 21, in <module>
    from grm.vcfgraph.vcfgraph import VCFGraph, NoVCFRecordsException
  File "/mnt/compgen/homes/asoylev/benchmark/tools/paragraph/paragraph-tools-build/lib/python3/grm/vcfgraph/vcfgraph.py", line 178
    f"Missing key {ins_info_key} for at {self.chrom}:{vcf.pos}; ")

Below is an INS line in the input VCF:

1 10028610 nssv14474350 A . . DBVARID;SVTYPE=INS;END=10028610;SVLEN=140;EXPERIMENT=1;SAMPLE=HG00733;REGIONID=nsv3326290;SEQ=aggtcaggagtttgagaccagcctggccaacgtggtgaaaccccgactctactaaaaaaaaaagaacaaaaattaggcctggcgcggtggctcacgcctgtaatcccagcactttgggaggccgaggcgggcagatcacG;Eichler

Any idea?

Thanks, Arda
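For anyone debugging a record like the one above: the INFO column of a VCF line is a semicolon-separated list of KEY=VALUE entries, and entries without "=" (such as DBVARID here) are flags. A minimal sketch, using a hypothetical helper named parse_info, that splits an INFO string so you can confirm that SVTYPE and the SEQ sequence are present. It works on the INFO string alone because the scraped record above may not show every column.

```python
def parse_info(info):
    """Parse a VCF INFO field into a dict.

    KEY=VALUE entries map to their string value; bare keys (flags such as
    DBVARID) map to True. A lone "." means an empty INFO field.
    """
    result = {}
    if info in ("", "."):
        return result
    for entry in info.split(";"):
        if not entry:
            continue  # tolerate stray ";;" or a trailing ";"
        key, sep, value = entry.partition("=")  # split on the first "=" only
        result[key] = value if sep else True
    return result


# Hypothetical usage on a shortened INFO string in the style of the record above.
info = parse_info("DBVARID;SVTYPE=INS;END=10028610;SVLEN=140;SEQ=aggtcagg;Eichler")
print(info["SVTYPE"], info["SVLEN"], len(info["SEQ"]))  # prints: INS 140 8
```

Values stay strings (SVLEN is "140", not 140); convert them yourself where a number is needed.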

traxexx commented 4 years ago

Hi @asylvz, it looks like INS:ME:ALU is not recognized as <INS> here. Could you please check whether the INS and SEQ keys are defined in your VCF header? If they are, could you try changing INS:ME:ALU to <INS> and see whether that works?
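Both suggestions can be scripted. A sketch, assuming an uncompressed, tab-separated VCF, that (1) reports whether the header declares an INS ALT allele and a SEQ INFO field and (2) rewrites an <INS:ME:ALU> ALT allele to <INS>. The exact header lines paragraph expects are an assumption here, not confirmed in this thread, and header_defines and rewrite_alt are hypothetical helper names.

```python
import re


def header_defines(header_lines, kind, key):
    """Return True if a meta line such as ##INFO=<ID=SEQ,...> is present.

    `kind` is the meta-line type (e.g. "INFO" or "ALT"); the ID must match
    `key` exactly, so ##ALT=<ID=INS:ME:ALU,...> does not count as INS.
    """
    pattern = re.compile(r"^##" + re.escape(kind) + r"=<ID=" + re.escape(key) + r"[,>]")
    return any(pattern.match(line) for line in header_lines)


def rewrite_alt(record_line, old="<INS:ME:ALU>", new="<INS>"):
    """Replace the ALT column (5th tab-separated field) when it equals `old`."""
    fields = record_line.rstrip("\n").split("\t")
    if len(fields) > 4 and fields[4] == old:
        fields[4] = new
    return "\t".join(fields)


# Hypothetical header and record, for illustration only.
header = [
    "##fileformat=VCFv4.2",
    '##ALT=<ID=INS,Description="Insertion">',
    '##INFO=<ID=SEQ,Number=1,Type=String,Description="Inserted sequence">',
]
print(header_defines(header, "ALT", "INS"), header_defines(header, "INFO", "SEQ"))
print(rewrite_alt("1\t10028610\tnssv14474350\tA\t<INS:ME:ALU>\t.\t.\tSVTYPE=INS"))
```

Records whose ALT is anything other than the exact `old` string pass through unchanged.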

asylvz commented 4 years ago

I found that it was caused by the Python version. Now it runs, but I get an "invalid syntax" error for some of my SVs, which look correct to me. Here are two of them. Any idea why?

1 100148582 nssv14430311 A . . DBVARID;SVTYPE=DEL;END=100148644;SVLEN=-63;DESC=Sequences%20at%20least%2070%25%20masked%20by%20tandem%20repeat%20finder%20or%20contained%20within%20a%20tandem%20repeat;EXPERIMENT=5;SAMPLE=HG00514;REGIONID=nsv3281860;SEQ=agaaagaaagaaagaaagaaagaaagaaagagagagagagagagagagagagagagagagag
^
SyntaxError: invalid syntax

1 100004524 gnomAD_v2_DEL_1_7484 N 205 PASS END=100004691;SVTYPE=DEL;SVLEN=167;SEQ=AAAAATCAGTCCTCTTCTTAATTCTACCATCTTTTCCTATAGTGCATTTCAGACTCCTTGCTATTCATTTTTTTTGACTACCAAAAGATAAATAAATGGACCAGGTTCAGTGGCTCACACCTGTAATCCCAGCACTTTGAGAGGCCAAGATGAGGCGATCACCTGAG;gnomAD
^
SyntaxError: invalid syntax
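The first traceback stops on an f-string literal in vcfgraph.py, and asylvz traced the failure to the Python version. F-strings were introduced in Python 3.6, so an older interpreter raises SyntaxError while merely importing the module. A small sketch (the function names are illustrative, not part of paragraph) that checks the running interpreter before launching multigrmpy.py. It deliberately avoids f-strings and type annotations so that an old interpreter can still parse it.

```python
import sys


def f_strings_supported():
    """Return True if this interpreter can compile an f-string literal."""
    try:
        compile('f"{1 + 1}"', "<check>", "eval")
    except SyntaxError:
        return False
    return True


def check_interpreter(minimum=(3, 6)):
    """Exit with a readable message instead of a SyntaxError traceback."""
    if sys.version_info < minimum or not f_strings_supported():
        sys.exit("Python %d.%d+ is required, found %s"
                 % (minimum[0], minimum[1], sys.version.split()[0]))


if __name__ == "__main__":
    check_interpreter()
    print("interpreter OK: " + sys.version.split()[0])
```

Running the script once as `python` and once as `python3` shows which interpreter each name resolves to. The first command in this thread used `python`, and the later one used `python3`.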

traxexx commented 4 years ago

Hmm, I cannot reproduce this error; I can genotype your records without errors on my end. Which platform are you using? Could it be a line-ending (newline) difference between Windows and macOS/Linux?
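To test the line-ending hypothesis: a file saved on Windows typically ends each line with CRLF instead of LF, which leaves a stray carriage return at the end of the last column of every record. A minimal sketch (has_crlf and to_unix_line_endings are hypothetical names) that detects CRLF endings and writes a copy with LF endings:

```python
import os
import tempfile


def has_crlf(path):
    """Return True if any line in the file ends with a Windows-style CRLF."""
    # Binary mode keeps the raw bytes, so the carriage return stays visible.
    with open(path, "rb") as handle:
        return any(line.endswith(b"\r\n") for line in handle)


def to_unix_line_endings(src, dst):
    """Copy src to dst, converting CRLF line endings to LF."""
    with open(src, "rb") as inp, open(dst, "wb") as out:
        for line in inp:
            if line.endswith(b"\r\n"):
                line = line[:-2] + b"\n"
            out.write(line)


if __name__ == "__main__":
    # Demo on a throwaway two-line VCF written with CRLF endings.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "crlf.vcf")
    dst = os.path.join(workdir, "lf.vcf")
    with open(src, "wb") as handle:
        handle.write(b"##fileformat=VCFv4.2\r\n1\t100\t.\tA\t<DEL>\t.\t.\tSVTYPE=DEL\r\n")
    print(has_crlf(src))  # True
    to_unix_line_endings(src, dst)
    print(has_crlf(dst))  # False
```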

asylvz commented 4 years ago

I was using the Docker version, and the problem went away with the Ubuntu version, so it was probably my mistake. Thanks. This time, however, I'm using the following command:

python3 ~/apps/paragraph/bin/multigrmpy.py -i ~/benchmark/all_sv_grc37_sorted_2_no_insme_allseq.vcf -m samples.txt -r /mnt/compgen/inhouse/share/rg_annot/b37/human_g1k_v37.fasta -o 10x

and this manifest:

id	path	depth	read length
sim10x	~/benchmark/dataset/simulation/10x/simu10x.bam	10	100

I get the following error, although the file is in place:

Failed to parse the options: Sample sim10x: File not found: ~/benchmark/dataset/simulation/10x/simu10x.bam
2020-03-16 16:02:37,082 ERROR Traceback (most recent call last):
2020-03-16 16:02:37,083 ERROR   File "/home/asoylev/apps/paragraph/bin/multigrmpy.py", line 315, in run
    subprocess.check_call(commandline, shell=True, stderr=subprocess.STDOUT)
2020-03-16 16:02:37,083 ERROR   File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
2020-03-16 16:02:37,083 ERROR subprocess.CalledProcessError: Command '/mnt/compgen/homes/asoylev/apps/paragraph/bin/grmpy --response-file=/tmp/tmpt3m3jnb3.txt' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/home/asoylev/apps/paragraph/bin/multigrmpy.py", line 353, in <module>
    main()
  File "/home/asoylev/apps/paragraph/bin/multigrmpy.py", line 349, in main
    run(args)
  File "/home/asoylev/apps/paragraph/bin/multigrmpy.py", line 315, in run
    subprocess.check_call(commandline, shell=True, stderr=subprocess.STDOUT)
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/mnt/compgen/homes/asoylev/apps/paragraph/bin/grmpy --response-file=/tmp/tmpt3m3jnb3.txt' returned non-zero exit status 1.
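The manifest above is tab-delimited, with the header columns id, path, depth and read length. A sketch (missing_bam_paths is a hypothetical helper) that reads such a manifest with Python's standard csv module and lists rows whose path does not exist when checked exactly as written. That literal check would reproduce the "File not found" result for this manifest, because nothing in it expands the ~.

```python
import csv
import os
import tempfile


def missing_bam_paths(manifest_path):
    """Return (id, path) pairs from a tab-delimited manifest whose path is absent.

    Paths are checked exactly as written: os.path.exists() performs no tilde
    expansion, so "~/..." is looked up relative to the current directory.
    """
    missing = []
    with open(manifest_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if not os.path.exists(row["path"]):
                missing.append((row["id"], row["path"]))
    return missing


if __name__ == "__main__":
    # Demo manifest with the same columns and values as the one in this thread.
    workdir = tempfile.mkdtemp()
    manifest = os.path.join(workdir, "samples.txt")
    with open(manifest, "w", newline="") as handle:
        handle.write("id\tpath\tdepth\tread length\n")
        handle.write("sim10x\t~/benchmark/dataset/simulation/10x/simu10x.bam\t10\t100\n")
    print(missing_bam_paths(manifest))
```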

traxexx commented 4 years ago

That's the error: ~/benchmark/dataset/simulation/10x/simu10x.bam is not found. I suggest checking whether your bash profile is loaded. If that's difficult, try replacing ~ with the actual path.
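Replacing ~ by hand works, and it can also be done mechanically. A ~ typed on the command line is expanded by the shell, but a ~ written inside a file such as the manifest is just a character unless the program reading it expands it. A sketch (expand_manifest_paths is a hypothetical helper) that rewrites the path column with os.path.expanduser and os.path.abspath:

```python
import csv
import os
import tempfile


def expand_manifest_paths(src, dst):
    """Copy a tab-delimited manifest, replacing each path with an absolute one.

    os.path.expanduser turns a leading "~" into the home directory, and
    os.path.abspath resolves any remaining relative path against the cwd.
    """
    with open(src, newline="") as inp:
        reader = csv.DictReader(inp, delimiter="\t")
        fieldnames = reader.fieldnames
        rows = list(reader)
    for row in rows:
        row["path"] = os.path.abspath(os.path.expanduser(row["path"]))
    with open(dst, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "samples.txt")
    dst = os.path.join(workdir, "samples.abs.txt")
    with open(src, "w", newline="") as handle:
        handle.write("id\tpath\tdepth\tread length\n")
        handle.write("sim10x\t~/benchmark/dataset/simulation/10x/simu10x.bam\t10\t100\n")
    expand_manifest_paths(src, dst)
    with open(dst) as handle:
        print(handle.read())
```

Pass the rewritten file to multigrmpy.py with -m in place of the original manifest.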

asylvz commented 4 years ago

That solved the issue. Thank you!