Gig77 / CooVar

Co-occurring variant analyzer

How to build .index for Coovar? #2

Open moldach opened 4 years ago

moldach commented 4 years ago

I'm getting errors when trying to run CooVar, and I'm not sure whether it expects a .fa file or a .fa.gz file.

I've tried both, but I get the following errors:

Unzipped (.fa):

(snakemake) [foo]$ perl /home/projects/common/tools/CooVar/coovar.pl -e /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.annotations.gff3 -r /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa -t VARIANT_CALLING/varscan/BC1217_sorted_dedupped_snp_varscan.tsv -o ANNOTATION/coovar/BC1217 --circos
[coovar.pl] Start executing script on Thu Jun  4 09:05:30 2020
[coovar.pl] Operating system: linux x86_64-linux-thread-multi
[coovar.pl] Program directory: /home/projects/common/tools/CooVar
[coovar.pl] Program version: 0.07
[coovar.pl] Command line: /home/projects/common/tools/CooVar/coovar.pl -e /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.annotations.gff3 -r /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa -t VARIANT_CALLING/varscan/BC1217_sorted_dedupped_snp_varscan.tsv -o ANNOTATION/coovar/BC1217 --circos
[coovar.pl]   REFERENCE: /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa
[coovar.pl]   CODING EXONS: /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.annotations.gff3
[coovar.pl]   GVS_TAB_FORMAT: VARIANT_CALLING/varscan/BC1217_sorted_dedupped_snp_varscan.tsv
[coovar.pl]   GVS_VCF_FORMAT:
[coovar.pl]   OUTPUT DIRECTORY: ANNOTATION/coovar/BC1217
[coovar.pl]   CIRCOS FLAG: 1
[coovar.pl]   FEATURE SOURCE:
[coovar.pl]   FEATURE TYPE: CDS
[coovar.pl] Indexing FASTA file /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa on Thu Jun  4 09:05:30 2020

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open index file /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa.index:  No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/perl5/lib/perl5/Bio/Root/Root.pm:449
STACK: Bio::DB::IndexedBase::_open_index /home/perl5/lib/perl5/Bio/DB/IndexedBase.pm:678
STACK: Bio::DB::IndexedBase::_index_files /home/perl5/lib/perl5/Bio/DB/IndexedBase.pm:655
STACK: Bio::DB::IndexedBase::index_file /home/perl5/lib/perl5/Bio/DB/IndexedBase.pm:488
STACK: Bio::DB::IndexedBase::new /home/perl5/lib/perl5/Bio/DB/IndexedBase.pm:365
STACK: /home/projects/common/tools/CooVar/coovar.pl:97
-----------------------------------------------------------

How does one create the .index file? I've tried indexing with both bwa and samtools faidx, without success.

Next, I tried the gzipped version:

(snakemake) []$ perl /home/projects/def-mtarailo/common/tools/CooVar/coovar.pl -e /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.annotations.gff3 -r /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa.gz -t VARIANT_CALLING/varscan/BC1217_sorted_dedupped_snp_varscan.tsv -o ANNOTATION/coovar/BC1217 --circos
[coovar.pl] Start executing script on Thu Jun  4 09:15:16 2020
[coovar.pl] Operating system: linux x86_64-linux-thread-multi
[coovar.pl] Program directory: /home/projects/common/tools/CooVar
[coovar.pl] Program version: 0.07
[coovar.pl] Command line: /home/projects/common/tools/CooVar/coovar.pl -e /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.annotations.gff3 -r /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa.gz -t VARIANT_CALLING/varscan/BC1217_sorted_dedupped_snp_varscan.tsv -o ANNOTATION/coovar/BC1217 --circos
[coovar.pl]   REFERENCE: /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa.gz
[coovar.pl]   CODING EXONS: /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.annotations.gff3
[coovar.pl]   GVS_TAB_FORMAT: VARIANT_CALLING/varscan/BC1217_sorted_dedupped_snp_varscan.tsv
[coovar.pl]   GVS_VCF_FORMAT:
[coovar.pl]   OUTPUT DIRECTORY: ANNOTATION/coovar/BC1217
[coovar.pl]   CIRCOS FLAG: 1
[coovar.pl]   FEATURE SOURCE:
[coovar.pl]   FEATURE TYPE: CDS
[coovar.pl] Indexing FASTA file /home/projects/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa.gz on Thu Jun  4 09:15:16 2020

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Each line of the fasta entry must be the same length except the last. Line above #3 '!vB▒▒▒▒L▒̕▒;▒^▒s▒_V..' is 321 != 993 chars.
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/perl5/lib/perl5/Bio/Root/Root.pm:449
STACK: Bio::DB::Fasta::_calculate_offsets /home/perl5/lib/perl5/Bio/DB/Fasta.pm:209
STACK: Bio::DB::IndexedBase::_index_files /home/perl5/lib/perl5/Bio/DB/IndexedBase.pm:660
STACK: Bio::DB::IndexedBase::index_file /home/perl5/lib/perl5/Bio/DB/IndexedBase.pm:488
STACK: Bio::DB::IndexedBase::new /home/perl5/lib/perl5/Bio/DB/IndexedBase.pm:365
STACK: /home/projects/common/tools/CooVar/coovar.pl:97
-----------------------------------------------------------
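The binary "line" quoted in this second traceback suggests BioPerl read the compressed bytes as if they were FASTA text, i.e. the indexer does not appear to decompress .fa.gz input on its own. A minimal shell sketch, assuming `head`, `od` and `gzip` are on the PATH, that checks the first two bytes for the gzip magic number (`1f 8b`) and, if found, writes a decompressed copy next to the original. `ensure_plain_fasta` is a hypothetical helper, not part of CooVar or BioPerl.

```shell
# Hypothetical helper (not part of CooVar or BioPerl): print the path of a
# plain-text copy of the reference, decompressing it first if it is gzipped.
ensure_plain_fasta() {
    local ref="$1" magic plain
    # First two bytes as hex; gzip (and bgzip) files start with 1f 8b.
    magic=$(head -c 2 "$ref" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" != "1f8b" ]; then
        echo "$ref"          # already plain text, use as-is
        return 0
    fi
    case "$ref" in
        *.gz) plain="${ref%.gz}" ;;            # foo.fa.gz -> foo.fa
        *)    plain="$ref.decompressed" ;;     # never overwrite the input
    esac
    # -d decompresses, -c writes to stdout so the .gz itself is kept.
    gzip -dc "$ref" > "$plain" || return 1
    echo "$plain"
}
```

For example, `ref=$(ensure_plain_fasta c_elegans.PRJNA13758.WS265.genomic.fa.gz)` followed by `perl coovar.pl -e <annotations.gff3> -r "$ref" -t <variants.tsv> -o <outdir>`. The decompressed copy still has to sit in a directory where the index file can be written.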
moldach commented 4 years ago

It's possible another user on our system corrupted the existing index, so I downloaded the references and rebuilt the indexes:

mkdir REF
cd REF
wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS265/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS265.genomic.fa.gz
wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS265/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS265.annotations.gff3.gz
gzip -d c_elegans.PRJNA13758.WS265.annotations.gff3.gz
gzip -d c_elegans.PRJNA13758.WS265.genomic.fa.gz
bwa index -a bwtsw c_elegans.PRJNA13758.WS265.genomic.fa
chmod 777 ~/REF
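The error from the gzip attempt quotes the layout rule Bio::DB::Fasta enforces while indexing: each line of a FASTA entry must be the same length except the last. A minimal sketch, assuming POSIX awk, for checking a decompressed reference against that rule before rerunning CooVar. `check_fasta_line_lengths` is a hypothetical name, and it checks only the rule as quoted in the error message; BioPerl may apply other checks (for example around blank lines) that this does not replicate.

```shell
# Hypothetical pre-flight check (not part of CooVar or BioPerl): report
# sequence lines that break the "same length except the last" rule.
# Exits non-zero if any entry violates it.
check_fasta_line_lengths() {
    awk '
        BEGIN { bad = 0 }
        # Header line: start a new entry and reset the per-entry counter.
        /^>/ { name = substr($0, 2); n = 0; next }
        {
            # Another sequence line means the previous one was not the
            # last line of the entry, so it must match the entry width.
            if (n > 0 && prev != width) {
                printf "%s: line %d is %d chars, expected %d\n", name, NR - 1, prev, width
                bad = 1
            }
            if (n == 0) width = length($0)   # first line sets the width
            prev = length($0)
            n++
        }
        END { exit bad }
    ' "$1"
}
```

For example, `check_fasta_line_lengths c_elegans.PRJNA13758.WS265.genomic.fa` prints nothing and succeeds for a well-formed file.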

This is what the REF directory looks like with the indexes:

-rwxr-x--- 1 moldach moldach 101957874 Jun  4 11:37 c_elegans.PRJNA13758.WS265.genomic.fa
-rwxr-x--- 1 moldach moldach        14 Jun  4 12:42 c_elegans.PRJNA13758.WS265.genomic.fa.amb
-rwxr-x--- 1 moldach moldach       231 Jun  4 12:42 c_elegans.PRJNA13758.WS265.genomic.fa.ann
-rwxr-x--- 1 moldach moldach 100286508 Jun  4 12:42 c_elegans.PRJNA13758.WS265.genomic.fa.bwt
-rwxr-x--- 1 moldach moldach       181 Jun  4 11:53 c_elegans.PRJNA13758.WS265.genomic.fa.fai
-rwxr-x--- 1 moldach moldach  30314865 Jun  4 12:03 c_elegans.PRJNA13758.WS265.genomic.fa.gz
-rwxr-x--- 1 moldach moldach        14 Jun  4 12:06 c_elegans.PRJNA13758.WS265.genomic.fa.gz.amb
-rwxr-x--- 1 moldach moldach       231 Jun  4 12:06 c_elegans.PRJNA13758.WS265.genomic.fa.gz.ann
-rwxr-x--- 1 moldach moldach 100286508 Jun  4 12:06 c_elegans.PRJNA13758.WS265.genomic.fa.gz.bwt
-rwxr-x--- 1 moldach moldach  25071602 Jun  4 12:06 c_elegans.PRJNA13758.WS265.genomic.fa.gz.pac
-rwxr-x--- 1 moldach moldach  50143256 Jun  4 12:07 c_elegans.PRJNA13758.WS265.genomic.fa.gz.sa
-rwxr-x--- 1 moldach moldach         0 Jun  4 12:59 c_elegans.PRJNA13758.WS265.genomic.fa.index.dir
-rwxr-x--- 1 moldach moldach      1024 Jun  4 13:06 c_elegans.PRJNA13758.WS265.genomic.fa.index.pag
-rwxr-x--- 1 moldach moldach  25071602 Jun  4 12:42 c_elegans.PRJNA13758.WS265.genomic.fa.pac
-rwxr-x--- 1 moldach moldach  50143256 Jun  4 12:43 c_elegans.PRJNA13758.WS265.genomic.fa.sa

Then I ran CooVar again:

perl coovar.pl -e ~/REF/c_elegans.PRJNA13758.WS265.annotations.gff3 -r ~/REF/c_elegans.PRJNA13758.WS265.genomic.fa -t BC1217_sorted_dedupped_snp_varscan.tsv -o TEST --circos
[coovar.pl] Start executing script on Thu Jun  4 13:11:36 2020
[coovar.pl] Operating system: linux x86_64-linux-thread-multi
[coovar.pl] Program directory: /common/tools/CooVar
[coovar.pl] Program version: 0.07
[coovar.pl] Command line: coovar.pl -e /REF/c_elegans.PRJNA13758.WS265.annotations.gff3 -r /REF/c_elegans.PRJNA13758.WS265.genomic.fa -t BC1217_sorted_dedupped_snp_varscan.tsv -o TEST --circos
[coovar.pl]   REFERENCE: /REF/c_elegans.PRJNA13758.WS265.genomic.fa
[coovar.pl]   CODING EXONS: /REF/c_elegans.PRJNA13758.WS265.annotations.gff3
[coovar.pl]   GVS_TAB_FORMAT: BC1217_sorted_dedupped_snp_varscan.tsv
[coovar.pl]   GVS_VCF_FORMAT:
[coovar.pl]   OUTPUT DIRECTORY: TEST
[coovar.pl]   CIRCOS FLAG: 1
[coovar.pl]   FEATURE SOURCE:
[coovar.pl]   FEATURE TYPE: CDS
[coovar.pl] Indexing FASTA file /REF/c_elegans.PRJNA13758.WS265.genomic.fa on Thu Jun  4 13:11:36 2020
[coovar.pl]   ERROR: Could not index FASTA file. Do you have write permissions to the directory containing the FASTA file?
christiansbg commented 4 years ago

CooVar looks for the file c_elegans.PRJNA13758.WS265.genomic.fa.index, which BioPerl should create automatically if it does not exist. The error you get suggests there are still write-permission issues in the target directory, although from everything you have posted here I don't fully understand why.
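Since both the BioPerl exception and CooVar's own error point at creating the .index file next to the FASTA, one quick way to narrow this down is to check, as the same user that runs CooVar, whether that directory actually accepts new files. A minimal sketch; `check_index_dir_writable` is a hypothetical helper, not part of CooVar or BioPerl.

```shell
# Hypothetical diagnostic (not part of CooVar or BioPerl): check that the
# reference is readable and that its directory accepts new files, which is
# where the <reference>.index file is expected to be created.
check_index_dir_writable() {
    local ref="$1" dir probe
    dir=$(dirname "$ref")
    if [ ! -r "$ref" ]; then
        echo "cannot read $ref" >&2
        return 1
    fi
    # Creating (and removing) a probe file tests the same operation the
    # indexer needs, rather than inferring it from permission bits.
    probe=$(mktemp "$dir/.coovar_write_test.XXXXXX" 2>/dev/null) || {
        echo "cannot create files in $dir" >&2
        return 1
    }
    rm -f "$probe"
    echo "ok: $dir is writable"
}
```

For example, `check_index_dir_writable ~/REF/c_elegans.PRJNA13758.WS265.genomic.fa`, run from the same shell and account used for coovar.pl.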