dnanexus-archive / parliament2

Runs a combination of tools to generate structural variant calls on whole-genome sequencing data
Apache License 2.0
102 stars 39 forks source link

Can't parse sequence for chromosome #69

Open feihongloveworld opened 5 years ago

feihongloveworld commented 5 years ago

hi sir : i got a error,just like below:

-----------------------------------

./9.bwa.log ./index.log Reading calls ... Can't parse sequence for chromosome 1. Can't parse sequence for chromosome 2. Can't parse sequence for chromosome 3. Can't parse sequence for chromosome 4. Can't parse sequence for chromosome 5. Can't parse sequence for chromosome 6. Can't parse sequence for chromosome 7. Can't parse sequence for chromosome 8. Can't parse sequence for chromosome 9. Can't parse sequence for chromosome 10. Can't parse sequence for chromosome 11. Can't parse sequence for chromosome 12. Traceback (most recent call last): File "/vcf2bedpe.py", line 448, in sys.exit(main()) File "/vcf2bedpe.py", line 443, in main vcfToBedpe(args.input, args.output) File "/vcf2bedpe.py", line 284, in vcfToBedpe Can't parse sequence for chromosome 13. sample_list UnboundLocalError: local variable 'sample_list' referenced before assignment Can't parse sequence for chromosome 14. Can't parse sequence for chromosome 15. Can't parse sequence for chromosome 16. Can't parse sequence for chromosome 17. Can't parse sequence for chromosome 18. Can't parse sequence for chromosome 19. Can't parse sequence for chromosome 20. Can't parse sequence for chromosome 21. Can't parse sequence for chromosome 22. Can't parse sequence for chromosome X. Can't parse sequence for chromosome Y. Running SVTyper Running SVTyper on Breakdancer outputs grep: svtype_breakdancer: Is a directory

----------------------------------------------------

MaestSi commented 5 years ago

Hi @feihongloveworld, might it be due to the lack of the 'chr' prefix in some files? Did you use exactly the same reference genome for mapping and for running Parliament2? It might be that the reference genome you used for mapping had the 'chr' prefix, while the reference fasta had not. To verify this, you could try to see if names for chr1 match:

BAM=/path/to/bam
REFERENCE=/path/to/fasta/reference
samtools view $BAM | cut -f3 | head -n1
cat $REFERENCE | grep "^>" | head -n1

Simone

gevro commented 5 years ago

Hi, I'm getting the same error. See below. I have confirmed the exact same fasta reference was used both for making the BAM files and for running parliament2. This must be a bug:

Convert Delly inversion results to VCF format Convert Delly duplication results to VCF format Convert Delly insertion results to VCF format Reading calls ... Can't parse sequence for chromosome chr1. Traceback (most recent call last): File "/vcf2bedpe.py", line 448, in sys.exit(main()) File "/vcf2bedpe.py", line 443, in main vcfToBedpe(args.input, args.output) File "/vcf2bedpe.py", line 284, in vcfToBedpe sample_list UnboundLocalError: local variable 'sample_list' referenced before assignment Can't parse sequence for chromosome chr2. Can't parse sequence for chromosome chr3. Can't parse sequence for chromosome chr4. Can't parse sequence for chromosome chr5. Can't parse sequence for chromosome chr6. Can't parse sequence for chromosome chr7. Can't parse sequence for chromosome chr8. Can't parse sequence for chromosome chr9. Can't parse sequence for chromosome chr10. Can't parse sequence for chromosome chr11. Can't parse sequence for chromosome chr12. Can't parse sequence for chromosome chr13. Can't parse sequence for chromosome chr14. Can't parse sequence for chromosome chr15. Can't parse sequence for chromosome chr16. Can't parse sequence for chromosome chr17. Can't parse sequence for chromosome chr18. Can't parse sequence for chromosome chr19. Can't parse sequence for chromosome chr20. Can't parse sequence for chromosome chr21. Can't parse sequence for chromosome chr22. Can't parse sequence for chromosome chrX. Can't parse sequence for chromosome chrY. Running SVTyper Running SVTyper on Breakdancer outputs grep: svtype_breakdancer: Is a directory Running SVTyper on CNVnator outputs grep: svtype_cnvnator: Is a directory Running SVTyper on Delly outputs grep: svtype_delly_0: Is a directory grep: svtype_delly_1: Is a directory grep: svtype_delly_2: Is a directory grep: svtype_delly_3: Is a directory

slzarate commented 5 years ago

Hi,

It looks like this is an issue with the vcf2bedpe.py file (https://github.com/dnanexus/parliament2/blob/master/resources/vcf2bedpe.py). I haven't seen this issue, nor have I looked into it beyond this. I am no longer at DNAnexus and as a result will not be developing Parliament2 as much as before, but you are welcome to submit a PR and I will try to review it in a timely manner.

Thanks, Samantha

gevro commented 5 years ago

I will see if I can figure out the issue, and will submit a PR if I can.

gevro commented 5 years ago

Hi, I pinpointed the issue and it looks like a complicated bug. I have no idea how to fix it.

vcf2bedpe.py is called by parliament2.sh using lumpy.vcf as the input. This is my lumpy.vcf file.

chr5    46270554    1   N   <DEL>   .   .   SVTYPE=DEL;STRANDS=+-:4;SVLEN=-5180;END=46275734;CIPOS=-10,9;CIEND=-10,9;CIPOS95=-1,1;CIEND95=-1,1;IMPRECISE;SU=4;PE=1;SR=3 GT:SU:PE:SR ./.:4:1:3
chr7    113776121   1   N   <DEL>   .   .   SVTYPE=DEL;STRANDS=+-:5;SVLEN=-6031;END=113782152;CIPOS=-7,9;CIEND=-10,9;CIPOS95=-1,1;CIEND95=-1,1;IMPRECISE;SU=5;PE=2;SR=3 GT:SU:PE:SR ./.:5:2:3
chr16   34587098    2   N   <DUP>   .   .   SVTYPE=DUP;STRANDS=-+:5;SVLEN=11806484;END=46393582;CIPOS=-528,9;CIEND=-10,577;CIPOS95=-155,1;CIEND95=-1,186;IMPRECISE;SU=5;PE=5;SR=0   GT:SU:PE:SR ./.:5:5:0
chr16   34588133    1   N   <DUP>   .   .   SVTYPE=DUP;STRANDS=-+:19;SVLEN=11803076;END=46391209;CIPOS=-321,7;CIEND=-10,249;CIPOS95=-122,0;CIEND95=-1,18;IMPRECISE;SU=19;PE=19;SR=0 GT:SU:PE:SR ./.:19:19:0
chr16   34588353    3   N   <DEL>   .   .   SVTYPE=DEL;STRANDS=+-:19;SVLEN=-11806104;END=46394457;CIPOS=-10,295;CIEND=-438,9;CIPOS95=-2,82;CIEND95=-99,0;IMPRECISE;SU=19;PE=19;SR=0 GT:SU:PE:SR ./.:19:19:0
chr20   31060353    1   N   <DUP>   .   .   SVTYPE=DUP;STRANDS=-+:4;SVLEN=8945;END=31069298;CIPOS=-452,9;CIEND=-10,459;CIPOS95=-141,2;CIEND95=-2,141;IMPRECISE;SU=4;PE=4;SR=0   GT:SU:PE:SR ./.:4:4:0

The error is coming from vcf2bedpe.py from this for loop:

for line in vcf_file:
    if in_header:
        if line[0] == '#':
            header.append(line)
            if line[1] != '#':
                sample_list = line.rstrip().split('\t')[9:]
            continue
        else:
            # print header
            bedpe_out.write('\t'.join(['#CHROM_A',
                                       'START_A',
                                       'END_A',
                                       'CHROM_B',
                                       'START_B',
                                       'END_B',
                                       'ID',
                                       'QUAL',
                                       'STRAND_A',
                                       'STRAND_B',
                                       'TYPE',
                                       'FILTER',
                                       'INFO',
                                       'FORMAT'] +
                                       sample_list
                                      ) + '\n')
            in_header = False
            vcf.add_header(header)

As you can see, my lumpy.vcf file does not have any header, i.e. no lines start with '#'. This is why the above code in vcf2bedpe.py ends up referencing sample_list before it was defined, which is the error.

I can't imagine I'm the only one experiencing this, since I'm running a standard analysis with standard samples.

Are you able to take a look?

Perhaps it is related to this issue that was dismissed as not needing to be addressed: https://github.com/dnanexus/parliament2/issues/25

Lumpy runs give these errors, which perhaps is preventing lumpy from making headers in the final output vcf:

samtools view: writing to standard output failed: Broken pipe
samtools view: error closing standard output: -1
samblaster: Version 0.1.24
samblaster: Inputting from stdin
samblaster: Outputting to stdout
samblaster: Opening lumpy.1.vcf.dBD5u3KfJisr/disc_pipe for write.
/usr/bin/samtools: /miniconda/lib/libcurl.so.4: no version information available (required by /usr/bin/samtools)
/usr/bin/samtools: /miniconda/lib/libcurl.so.4: no version information available (required by /usr/bin/samtools)
/usr/bin/samtools: /miniconda/lib/libcurl.so.4: no version information available (required by /usr/bin/samtools)
/usr/bin/samtools: /miniconda/lib/libcurl.so.4: no version information available (required by /usr/bin/samtools)
/usr/bin/samtools: /miniconda/lib/libcurl.so.4: no version information available (required by /usr/bin/samtools)
samblaster: Opening lumpy.1.vcf.dBD5u3KfJisr/spl_pipe for write.
samblaster: Loaded 3366 header sequence entries.
Warning: 2236 unmatched name groups
samblaster: Output 1982 discordant read pairs to lumpy.1.vcf.dBD5u3KfJisr/disc_pipe
samblaster: Output 630 split reads to lumpy.1.vcf.dBD5u3KfJisr/spl_pipe
samblaster: Marked 0 of 206953 (0.00%) read ids as duplicates using 2104k memory in 0.735S CPU seconds and 30S wall time.
Removed 431 outliers with isize >= 1116
/usr/bin/samtools: /miniconda/lib/libcurl.so.4: no version information available (required by /usr/bin/samtools)
/usr/bin/samtools: /miniconda/lib/libcurl.so.4: no version information available (required by /usr/bin/samtools)
chr1    1000000
chr1    2000000

Thanks very much.

gevro commented 5 years ago

Hi, I discovered that in addition to the above issue with vcf2bedpe.py, there is another issue:

--> Can't parse sequence for chromosome chr error is actually coming from cnvnator2VCF.pl from the parseChrom function, not from the vcf2bedpe.py script. Likely the cnvnator2VCF.pl script is not finding the "chr.fa" files are they are not being created. The cnvnator scripts are all giving this error: "cnvnator: /miniconda/lib/libcurl.so.4: no version information available (required by cnvnator)", which may or may not be related to the above error.

All these issues seem too complicated for someone who didn't create parliament2 to fix. My guess is these issues are probably due to parliament sourcing new versions of the external tools that are no longer compatible with the parliament2 docker.

Is there a plan for any of the developers to get parliament2 working again? Otherwise, I think parliament2 won't be usable by people anymore. Thanks again.

slzarate commented 5 years ago

Hi @gevro, thank you for discovering these issues and also filing ticket #78. Regarding your question:

Is there a plan for any of the developers to get parliament2 working again? Otherwise, I think parliament2 won't be usable by people anymore.

As I am no longer at DNAnexus, I have a very limited amount of time with which I can work on Parliament2, even though I would really like to help out. I am tagging @yihhwang and @Damien-Black to let the DNAnexus science team know that active development is needed. On my end, development is currently on hold.

Samantha

gevro commented 5 years ago

Thanks. There are many other bugs I have found, almost too many to list. I'm trying to hack away at them one by one. CNVnator no longer works, and I got it to work by installing its new version. Looks like SVtyper isn't working either for delly, lumpy, and breakdancer, and there is another bug in conversion of breakdancer output to vcf where not all the variants are being put into the vcf. There is only so much I can do, so if the others who have contributed in the past have time, it would be great to get this up and running again. I'll keep working on it too.