Open feihongloveworld opened 5 years ago
Hi @feihongloveworld, might it be due to the lack of the 'chr' prefix in some files? Did you use exactly the same reference genome for mapping and for running Parliament2? It might be that the reference genome you used for mapping had the 'chr' prefix, while the reference fasta had not. To verify this, you could try to see if names for chr1 match:
BAM=/path/to/bam
REFERENCE=/path/to/fasta/reference
samtools view $BAM | cut -f3 | head -n1
cat $REFERENCE | grep "^>" | head -n1
Simone
Hi, I'm getting the same error. See below. I have confirmed the exact same fasta reference was used both for making the BAM files and for running parliament2. This must be a bug:
Convert Delly inversion results to VCF format
Convert Delly duplication results to VCF format
Convert Delly insertion results to VCF format
Reading calls ...
Can't parse sequence for chromosome chr1.
Traceback (most recent call last):
File "/vcf2bedpe.py", line 448, in
Hi,
It looks like this is an issue with the vcf2bedpe.py
file (https://github.com/dnanexus/parliament2/blob/master/resources/vcf2bedpe.py). I haven't seen this issue, nor have I looked into it beyond this. I am no longer at DNAnexus and as a result will not be developing Parliament2 as much as before, but you are welcome to submit a PR and I will try to review it in a timely manner.
Thanks, Samantha
I will see if I can figure out the issue, and will submit a PR if I can.
Hi, I pinpointed the issue and it looks like a complicated bug. I have no idea how to fix it.
vcf2bedpe.py is called by parliament2.sh using lumpy.vcf as the input. This is my lumpy.vcf file.
chr5 46270554 1 N <DEL> . . SVTYPE=DEL;STRANDS=+-:4;SVLEN=-5180;END=46275734;CIPOS=-10,9;CIEND=-10,9;CIPOS95=-1,1;CIEND95=-1,1;IMPRECISE;SU=4;PE=1;SR=3 GT:SU:PE:SR ./.:4:1:3
chr7 113776121 1 N <DEL> . . SVTYPE=DEL;STRANDS=+-:5;SVLEN=-6031;END=113782152;CIPOS=-7,9;CIEND=-10,9;CIPOS95=-1,1;CIEND95=-1,1;IMPRECISE;SU=5;PE=2;SR=3 GT:SU:PE:SR ./.:5:2:3
chr16 34587098 2 N <DUP> . . SVTYPE=DUP;STRANDS=-+:5;SVLEN=11806484;END=46393582;CIPOS=-528,9;CIEND=-10,577;CIPOS95=-155,1;CIEND95=-1,186;IMPRECISE;SU=5;PE=5;SR=0 GT:SU:PE:SR ./.:5:5:0
chr16 34588133 1 N <DUP> . . SVTYPE=DUP;STRANDS=-+:19;SVLEN=11803076;END=46391209;CIPOS=-321,7;CIEND=-10,249;CIPOS95=-122,0;CIEND95=-1,18;IMPRECISE;SU=19;PE=19;SR=0 GT:SU:PE:SR ./.:19:19:0
chr16 34588353 3 N <DEL> . . SVTYPE=DEL;STRANDS=+-:19;SVLEN=-11806104;END=46394457;CIPOS=-10,295;CIEND=-438,9;CIPOS95=-2,82;CIEND95=-99,0;IMPRECISE;SU=19;PE=19;SR=0 GT:SU:PE:SR ./.:19:19:0
chr20 31060353 1 N <DUP> . . SVTYPE=DUP;STRANDS=-+:4;SVLEN=8945;END=31069298;CIPOS=-452,9;CIEND=-10,459;CIPOS95=-141,2;CIEND95=-2,141;IMPRECISE;SU=4;PE=4;SR=0 GT:SU:PE:SR ./.:4:4:0
The error is coming from vcf2bedpe.py from this for loop:
for line in vcf_file:
if in_header:
if line[0] == '#':
header.append(line)
if line[1] != '#':
sample_list = line.rstrip().split('\t')[9:]
continue
else:
# print header
bedpe_out.write('\t'.join(['#CHROM_A',
'START_A',
'END_A',
'CHROM_B',
'START_B',
'END_B',
'ID',
'QUAL',
'STRAND_A',
'STRAND_B',
'TYPE',
'FILTER',
'INFO',
'FORMAT'] +
sample_list
) + '\n')
in_header = False
vcf.add_header(header)
As you can see, my lumpy.vcf file does not have any header, i.e. no lines start with '#'. This is why the above code in vcf2bedpe.py ends up referencing sample_list before it was defined, which is the error.
I can't imagine I'm the only one experiencing this, since I'm running a standard analysis with standard samples.
Are you able to take a look?
Perhaps it is related to this issue that was dismissed as not needing to be addressed: https://github.com/dnanexus/parliament2/issues/25
Lumpy runs give these errors, which perhaps is preventing lumpy from making headers in the final output vcf:
samtools view: writing to standard output failed: Broken pipe
samtools view: error closing standard output: -1
samblaster: Version 0.1.24
samblaster: Inputting from stdin
samblaster: Outputting to stdout
samblaster: Opening lumpy.1.vcf.dBD5u3KfJisr/disc_pipe for write.
/usr/bin/samtools: /miniconda/lib/libcurl.so.4: no version information available (required by /usr/bin/samtools)
/usr/bin/samtools: /miniconda/lib/libcurl.so.4: no version information available (required by /usr/bin/samtools)
/usr/bin/samtools: /miniconda/lib/libcurl.so.4: no version information available (required by /usr/bin/samtools)
/usr/bin/samtools: /miniconda/lib/libcurl.so.4: no version information available (required by /usr/bin/samtools)
/usr/bin/samtools: /miniconda/lib/libcurl.so.4: no version information available (required by /usr/bin/samtools)
samblaster: Opening lumpy.1.vcf.dBD5u3KfJisr/spl_pipe for write.
samblaster: Loaded 3366 header sequence entries.
Warning: 2236 unmatched name groups
samblaster: Output 1982 discordant read pairs to lumpy.1.vcf.dBD5u3KfJisr/disc_pipe
samblaster: Output 630 split reads to lumpy.1.vcf.dBD5u3KfJisr/spl_pipe
samblaster: Marked 0 of 206953 (0.00%) read ids as duplicates using 2104k memory in 0.735S CPU seconds and 30S wall time.
Removed 431 outliers with isize >= 1116
/usr/bin/samtools: /miniconda/lib/libcurl.so.4: no version information available (required by /usr/bin/samtools)
/usr/bin/samtools: /miniconda/lib/libcurl.so.4: no version information available (required by /usr/bin/samtools)
chr1 1000000
chr1 2000000
Thanks very much.
Hi, I discovered that in addition to the above issue with vcf2bedpe.py, there is another issue:
--> Can't parse sequence for chromosome chr error is actually coming from cnvnator2VCF.pl from the parseChrom function, not from the vcf2bedpe.py script. Likely the cnvnator2VCF.pl script is not finding the "chr.fa" files are they are not being created. The cnvnator scripts are all giving this error: "cnvnator: /miniconda/lib/libcurl.so.4: no version information available (required by cnvnator)", which may or may not be related to the above error.
All these issues seem too complicated for someone who didn't create parliament2 to fix. My guess is these issues are probably due to parliament sourcing new versions of the external tools that are no longer compatible with the parliament2 docker.
Is there a plan for any of the developers to get parliament2 working again? Otherwise, I think parliament2 won't be usable by people anymore. Thanks again.
Hi @gevro, thank you for discovering these issues and also filing ticket #78. Regarding your question:
Is there a plan for any of the developers to get parliament2 working again? Otherwise, I think parliament2 won't be usable by people anymore.
As I am no longer at DNAnexus, I have a very limited amount of time with which I can work on Parliament2, even though I would really like to help out. I am tagging @yihhwang and @Damien-Black to let the DNAnexus science team know that active development is needed. On my end, development is currently on hold.
Samantha
Thanks. There are many other bugs I have found, almost too many to list. I'm trying to hack away at them one by one. CNVnator no longer works, and I got it to work by installing its new version. Looks like SVtyper isn't working either for delly, lumpy, and breakdancer, and there is another bug in conversion of breakdancer output to vcf where not all the variants are being put into the vcf. There is only so much I can do, so if the others who have contributed in the past have time, it would be great to get this up and running again. I'll keep working on it too.
hi sir : i got a error,just like below:
-----------------------------------
./9.bwa.log ./index.log Reading calls ... Can't parse sequence for chromosome 1. Can't parse sequence for chromosome 2. Can't parse sequence for chromosome 3. Can't parse sequence for chromosome 4. Can't parse sequence for chromosome 5. Can't parse sequence for chromosome 6. Can't parse sequence for chromosome 7. Can't parse sequence for chromosome 8. Can't parse sequence for chromosome 9. Can't parse sequence for chromosome 10. Can't parse sequence for chromosome 11. Can't parse sequence for chromosome 12. Traceback (most recent call last): File "/vcf2bedpe.py", line 448, in
sys.exit(main())
File "/vcf2bedpe.py", line 443, in main
vcfToBedpe(args.input, args.output)
File "/vcf2bedpe.py", line 284, in vcfToBedpe
Can't parse sequence for chromosome 13.
sample_list
UnboundLocalError: local variable 'sample_list' referenced before assignment
Can't parse sequence for chromosome 14.
Can't parse sequence for chromosome 15.
Can't parse sequence for chromosome 16.
Can't parse sequence for chromosome 17.
Can't parse sequence for chromosome 18.
Can't parse sequence for chromosome 19.
Can't parse sequence for chromosome 20.
Can't parse sequence for chromosome 21.
Can't parse sequence for chromosome 22.
Can't parse sequence for chromosome X.
Can't parse sequence for chromosome Y.
Running SVTyper
Running SVTyper on Breakdancer outputs
grep: svtype_breakdancer: Is a directory
----------------------------------------------------