vigneshravi commented 7 years ago

I ran into this error

-----------------------------------------------------------------------
CAVA (Clinical Annotation of VAriants) v1.2.0 is now running.
Started:  2016-12-19 12:53:49.709035

Configuration file:  cava-v1.2.0/CAVA-master/config.txt
Input file (VCF):    sample.vcf
Output file (VCF):   sample.cava.vcf

Input file contains 1105851 records to annotate.

Annotating variants ... 0.3%Process SingleJob-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/cava.py", line 307, in run
    record.annotate(self.ensembl, self.dbsnp, self.reference, self.impactdir)
  File "/opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/core.py", line 261, in annotate
    variant.annotate(ensembl, dbsnp, reference, impactdir)
  File "/opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/core.py", line 86, in annotate
    if not ensembl is None: self = ensembl.annotate(self, reference, impactdir)
  File "/opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/data.py", line 329, in annotate
    csn_plus, protchange_plus = csn.getAnnotation(variant_plus, transcript, reference, protein, mutprotein_plus)
  File "/opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/csn.py", line 63, in getAnnotation
    dna, dna_ins = makeDNAannotation(variant, transcript, reference)
TypeError: 'NoneType' object is not iterable

(Size of output file: 21000.7 Kbyte)

CAVA (Clinical Annotation of VAriants) successfully finished.
Ended:  2016-12-19 12:54:26.628074
Total runtime: 0:00:36.919039
-----------------------------------------------------------------------

This is the code I used

python cava-v1.2.0/CAVA-master/cava.py -c cava-v1.2.0/CAVA-master/config.txt -i sample.vcf -o sample.cava

I reran the above command and the error occurred at a different line. It doesn't seem to be caused by a single variant entry in the VCF file. Any help in fixing this error is much appreciated.

marton-munz commented 7 years ago

Dear Vignesh,

As you write, the error is probably not related to the particular variant you are reporting here. I have tried it and CAVA v1.2.0 returned no error for this input line, the variant was annotated successfully.

Could you please send your full input VCF file, configuration file as well as information on which reference genome and transcript database you were using to cava-user-group@googlegroups.com? I could then try to reproduce the error and find out what the real issue is.

Thanks! Best, Márton

vigneshravi commented 7 years ago

Hello Marton,

I am attaching a file with the command and log, as well as FileForCAVA.vcf and the config file. I use the human decoy genome hs37d5.fa and the transcript file ensembl75s.gz, which comes with the CAVA package.

Thank you for looking into this! Vignesh Ravichandran

This is a template of the configuration file for CAVA v1.2.0

Input file format

Possible values: VCF or TXT | Optional: yes | Default value: VCF

@inputformat = VCF

Output file format

Possible values: VCF or TSV | Optional: yes | Default value: VCF

@outputformat = VCF

Absolute path to reference genome file

Possible values: string | Optional: no

@reference = /rawdata/data/REF/Human_Decoy_REF/hs37d5.fa

Absolute path to Ensembl transcript database file

Possible values: string | Optional: yes (if not specified, default transcript database will be used)

@ensembl = /opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/defaultdb/ensembl75s.gz

Absolute path to dbSNP database file

Possible values: string | Optional: yes

@dbsnp=/opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/dbSNP149.gz

Are variants with neither transcript nor dbSNP annotation to be included in the output?

Possible values: TRUE or FALSE | Optional: yes | Default value: TRUE

@nonannot = TRUE

Are only records with PASS filter value included in the output?

Possible values: TRUE or FALSE | Optional: yes | Default value: FALSE

@filter = FALSE

Types of variants to be annotated and outputted

Possible values: ALL, SUBSTITUTION, INDEL, INSERTION, DELETION or COMPLEX | Optional: yes | Default value: ALL

@type = ALL

Name of compressed BED file specifying genomic regions variant annotation is restricted to

Possible values: string | Optional: yes

@target = .

Name of file providing a list of the gene identifiers variant annotation is restricted to

Note: gene identifiers need to be given on separate lines in the file

Possible values: string | Optional: yes

@genelist = .

Name of file providing a list of transcript identifiers variant annotation is restricted to

Note: transcript identifiers need to be given on separate lines in the file

Possible values: string | Optional: yes

@transcriptlist = .

Name of file providing a list of the dbSNP identifiers variant annotation is restricted to

Note: dbSNP identifiers need to be given on separate lines in the file

Possible values: string | Optional: yes

@snplist = .

Is a log file to be written?

Possible values: TRUE or FALSE | Optional: yes | Default value: FALSE

@logfile = FALSE

Which ontology is used for reporting consequence type?

Possible values: CLASS, SO or BOTH | Optional: yes | Default value: BOTH

@ontology = BOTH

Definition of variant impact levels (reported by the IMPACT annotation flag)

Different impact levels are separated by | and a comma-separated list of CLASS terms must be given for each level

Possible values: string | Optional: yes

Default value: SG,ESS,FS | SS5,IM,SL,EE,IF,NSY | SY,SS,INT,5PU,3PU

@impactdef = SG,ESS,FS | SS5,IM,SL,EE,IF,NSY | SY,SS,INT,5PU,3PU

Are alternative annotations outputted?

If TRUE, the ALTANN and ALTCLASS/ALTSO annotation flags are reported instead of the ALTFLAG flag

Possible values: TRUE or FALSE | Optional: yes | Default value: TRUE

@givealt = TRUE

Number of bases into the intron used as the splice site region

Possible values: integer >= 6 | Optional: yes | Default value: 8

@ssrange = 8

Is the prefix CAVA_ added to annotation flag names in VCF output?

Possible values: TRUE or FALSE | Optional: yes | Default value: FALSE

@prefix = FALSE

COMMAND:

python /opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/cava.py -c /opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/config.txt -i FileForCAVAgroup.vcf -o FileForCAVAgroup.cava

LOG:

CAVA (Clinical Annotation of VAriants) v1.2.0 is now running.
Started:  2016-12-21 12:51:18.514801

Configuration file:  /opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/config.txt
Input file (VCF):    FileForCAVAgroup.vcf
Output file (VCF):   FileForCAVAgroup.cava.vcf

Input file contains 1105851 records to annotate.

Annotating variants ... 0.3%Process SingleJob-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/cava.py", line 307, in run
    record.annotate(self.ensembl, self.dbsnp, self.reference, self.impactdir)
  File "/opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/core.py", line 261, in annotate
    variant.annotate(ensembl, dbsnp, reference, impactdir)
  File "/opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/core.py", line 86, in annotate
    if not ensembl is None: self = ensembl.annotate(self, reference, impactdir)
  File "/opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/data.py", line 329, in annotate
    csn_plus, protchange_plus = csn.getAnnotation(variant_plus, transcript, reference, protein, mutprotein_plus)
  File "/opt/ngstools/annotations/cava/cava-v1.2.0/CAVA-master/csn.py", line 63, in getAnnotation
    dna, dna_ins = makeDNAannotation(variant, transcript, reference)
TypeError: 'NoneType' object is not iterable

(Size of output file: 531.2 Kbyte)

CAVA (Clinical Annotation of VAriants) successfully finished.
Ended:  2016-12-21 12:51:39.725113
Total runtime: 0:00:21.210312

vigneshravi commented 7 years ago

Hello Marton,

Following up on this.

Thank you! Vignesh

marton-munz commented 7 years ago

Hi Vignesh,

Could you please attach your FileForCAVA.vcf input file as well, or send it to cava-user-group@googlegroups.com? Although you mentioned it, I can't see it here. I have tested your config settings and the hs37d5.fa genome, and everything worked correctly with my test input, so I would really need your original input file to be able to reproduce the problem.

Many thanks, Márton

vigneshravi commented 7 years ago

I sent it with my last email as a Google Drive link, since it's a huge file.

marton-munz commented 7 years ago

Hi Vignesh,

Thank you for the input file. I was able to reproduce the error. It is caused by the fact that your input file contains VCF records with identical REF and ALT alleles. The first such input record which makes CAVA stop is the following: 1 1573139 . G G . . .

There are a total of 400 such records in your input file.
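The traceback fits this diagnosis: the failing line in csn.py tuple-unpacks the return value of makeDNAannotation, and tuple-unpacking None raises exactly this TypeError. The sketch below is a hypothetical illustration of that mechanism only, not CAVA's actual code; make_dna_annotation and get_annotation are made-up stand-ins that assume the real function ends up returning nothing when REF equals ALT.

```python
def make_dna_annotation(ref, alt):
    """Hypothetical stand-in for makeDNAannotation.

    Returns a (dna, dna_ins) pair for a real change, but has no return
    statement for REF == ALT, so Python implicitly returns None.
    """
    if ref != alt:
        return ref + ">" + alt, ""


def get_annotation(ref, alt):
    # Mirrors the failing line in csn.py: the result is tuple-unpacked.
    dna, dna_ins = make_dna_annotation(ref, alt)
    return dna


print(get_annotation("G", "T"))  # prints G>T
try:
    get_annotation("G", "G")  # like the record 1 1573139 . G G
except TypeError as err:
    # Python 2.7 reports "'NoneType' object is not iterable", as in the log;
    # newer Python 3 releases word it "cannot unpack non-iterable NoneType object".
    print(err)
```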

CAVA does not support VCF records with identical REF and ALT alleles, as these do not describe real variants, so one option is to remove these lines from your input.
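One way to remove them is a small pre-filter run before CAVA. The script below is a minimal sketch and is not part of CAVA: the script name filter_ref_eq_alt.py and its two arguments are hypothetical, it assumes an uncompressed, tab-separated VCF (REF in column 4, ALT in column 5), and it conservatively drops a record if any comma-separated ALT allele matches REF, compared case-insensitively.

```python
import sys


def is_ref_equals_alt(line):
    """Return True for a VCF data line where any ALT allele equals REF.

    Assumes tab-separated fields with REF in column 4 and ALT in column 5;
    multi-allelic ALT fields are split on commas.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return False
    ref = fields[3].upper()
    alts = [a.upper() for a in fields[4].split(",")]
    return ref in alts


def filter_vcf(in_path, out_path):
    """Copy in_path to out_path, dropping records where REF == ALT.

    Header lines (starting with '#') are copied unchanged.
    Returns the number of records removed.
    """
    removed = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not line.startswith("#") and is_ref_equals_alt(line):
                removed += 1
                continue
            dst.write(line)
    return removed


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.stderr.write("usage: filter_ref_eq_alt.py IN.vcf OUT.vcf\n")
    else:
        n = filter_vcf(sys.argv[1], sys.argv[2])
        sys.stderr.write("Removed {} record(s) with REF == ALT\n".format(n))
```

For example, python filter_ref_eq_alt.py FileForCAVAgroup.vcf FileForCAVAgroup.clean.vcf writes the cleaned file, which can then be passed to cava.py with -i. The printed count is a quick sanity check against the number of such records reported for the input.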

Importantly, CAVA should give a more informative warning message in these cases instead of just failing at these lines, so we are definitely going to add this feature in the next release. Thank you for drawing our attention to it!
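As an illustration of the kind of check that could replace the crash, here is a hypothetical wrapper, not the fix that was actually released: annotate_or_skip and its annotate callback are made-up names, and the sketch simply logs a warning and skips records whose REF and ALT are identical instead of passing them on to annotation.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("ref_alt_check")


def annotate_or_skip(chrom, pos, ref, alt, annotate):
    """Call annotate(ref, alt) only for real variants.

    Hypothetical wrapper: a record whose REF and ALT are identical is
    reported with a warning and skipped (returns None) instead of crashing.
    """
    if ref.upper() == alt.upper():
        log.warning("Skipping %s:%s: REF and ALT are identical (%s)", chrom, pos, ref)
        return None
    return annotate(ref, alt)


# The record that stopped CAVA would now only produce a warning.
annotate_or_skip("1", 1573139, "G", "G", lambda r, a: r + ">" + a)
```

Warning and skipping keeps a long run going while still surfacing every problematic record in the log, which matches the behaviour the maintainer describes as desirable.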

I hope it helps. Let me know if you have any further questions. Best, Márton

vigneshravi commented 7 years ago

Thank you, Marton. I will start a run without these 400 variants and get back to you.

Thank you for the help again! Vignesh

vigneshravi commented 7 years ago

The run finished successfully after I removed the 400 entries. It worked!

Thank you, Vignesh

marton-munz commented 7 years ago

Hi Vignesh, Great - thank you for your feedback! I'm closing this Issue now. Best wishes, Márton

RahmanTeam / CAVA

makeDNAannotation - NoneType object #2
