asoltis / MutEnricher

Somatic coding and non-coding mutation enrichment analysis for tumor WGS data

ZeroDivisionError: division by zero #4

Closed alhafidzhamdan closed 3 years ago

alhafidzhamdan commented 3 years ago

Hi there,

I tried to create a covariate file from a processed BED file derived from GENCODE v38, using the command below, and I got this error. Could you please help me troubleshoot?

python ../utilities/get_region_covariates.py ../../Elements/Annotations/fiveprimeGENCODEv38.bed ../../Elements/Genome/hg38.fa --interval-files covariates/interval_files.txt -p 12 -o fiveprime_covariates.txt
Loading regions...
Loaded 91302 regions from input BED file.
Traceback (most recent call last):
  File "../utilities/get_region_covariates.py", line 322, in <module>
    if __name__ == '__main__': main()
  File "../utilities/get_region_covariates.py", line 104, in main
    r.get_seq_gc_cont()
  File "../utilities/get_region_covariates.py", line 289, in get_seq_gc_cont
    GCcont = (numC+numG) / tot
ZeroDivisionError: division by zero
head ../../Elements/Annotations/fiveprimeGENCODEv38.bed
chr1    65419   65433   OR4F5
chr1    450740  450742  OR4F29
chr1    685679  685718  OR4F16
chr1    686655  686673  OR4F16
chr1    923923  924431  SAMD11
chr1    923923  924431  SAMD11
chr1    925150  925189  SAMD11
chr1    925731  925800  SAMD11
chr1    959241  959256  NOC2L
chr1    960584  960693  KLHL17

I'm not sure what's causing this error. I'm happy to provide the BED file if you think it would be useful.

A

asoltis commented 3 years ago

Hello,

This error is most likely thrown because one of the regions in your BED file has zero length ("tot" = 0). From the head of your input file, I see one region that is only 2 bp long (the OR4F29 region on line 2, end minus start in BED coordinates), so I would check whether your input contains any zero-length regions. I would also consider removing very short regions from your input (e.g. shorter than 10-50 bp), as the statistical methods may be unreliable for such short windows.

Let me know if this helps. Feel free to send your input file to me as well if you are having trouble identifying the source of the issue.
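
A minimal sketch of the check suggested above, assuming a plain tab- or space-delimited BED file with the start and end in columns 2 and 3. The function name split_short_regions and the min_len default are hypothetical and are not part of MutEnricher. The example records are copied from the head output earlier in the thread, plus one made-up zero-length record for illustration.

```python
import io


def split_short_regions(bed_lines, min_len=50):
    """Split BED records into (kept, dropped) lists by region length.

    BED intervals are 0-based and half-open, so length = end - start.
    Hypothetical helper for illustration; not a MutEnricher function.
    """
    kept, dropped = [], []
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines and the usual BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        if end - start >= min_len:
            kept.append(line)
        else:
            dropped.append(line)
    return kept, dropped


# The last record is an invented zero-length region, included only
# to show the case that would make "tot" equal to zero.
example = (
    "chr1\t65419\t65433\tOR4F5\n"
    "chr1\t450740\t450742\tOR4F29\n"
    "chr1\t923923\t924431\tSAMD11\n"
    "chr1\t1000\t1000\tZERO_LEN_EXAMPLE\n"
)
kept, dropped = split_short_regions(io.StringIO(example))
```

To apply this to a real file, pass an open file handle instead of the io.StringIO object and write the kept lines to a new BED file. Setting min_len=1 drops only the zero-length regions.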

alhafidzhamdan commented 3 years ago

Hi there, thanks for getting back! Yes, it seems to work; I've excluded regions <50 bp. I have encountered another error while trying to use a blacklist file. Here are my command and the error:

python ../mutEnricher.py coding ../../Elements/Annotations/cds_mutenricher_hg38.gtf vcf_files.txt --use-local -c covariates/gene_covariates.txt --anno-type SnpEff -p 12 --gene-field gene_name -o coding --blacklist ../../Elements/Annotations/Blacklisted_SSMs.tsv 

--------------------------MUTENRICHER CODING--------------------------
MutEnricher version: 1.3.2

----------------------------INITIALIZATION----------------------------
Output directory for results: coding
Analysis prefix: mutation_enrichment_
Statistical testing type: nsamples
Considering all variants in background rate calculations.
Annotation type: SnpEff
Considering both SNPs and indels in analysis.
  --use-local selected with covariates provided. Local backgrounds will be considered in covariate cluster rate calculations.
Set pool with 12 processors

-----------------------------LOADING GENES----------------------------
Loading GTF...
  Deleting 31 genes annotated to multiple chromosomes.
GTF loaded.
Loading genes...
Loaded 20265 genes from input GTF file.

Loading blacklist variants file...
Traceback (most recent call last):
  File "../mutEnricher.py", line 222, in <module>
    if __name__ == '__main__': main()
  File "../mutEnricher.py", line 46, in main
    run(parser, args, version)
  File "/Non-coding/MutEnricher/coding_enrichment.py", line 258, in run
    blacklist = load_blacklist(bl_fn)
NameError: name 'load_blacklist' is not defined

I could not find a def line for load_blacklist in coding_enrichment.py.

A

asoltis commented 3 years ago

Glad to hear the first issue is resolved.

Thank you for pointing out the issue with blacklist file parsing. I had not included this function in the coding enrichment code, as I rarely use this option. I have now added it in the updated version of the code (version 1.3.3) and verified that it works on a test set. Please give the update a try and let me know if you encounter any other issues.
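
For readers wondering what such a loader might look like, here is a rough sketch. It is not the function shipped in version 1.3.3. It assumes a tab-delimited blacklist with four columns (contig, position, reference allele, alternate allele) and no header row. That layout is an assumption and should be checked against the MutEnricher documentation. The name load_blacklist_sketch is hypothetical.

```python
def load_blacklist_sketch(lines):
    """Parse blacklist variants into a set of (contig, pos, ref, alt).

    ASSUMPTION: each line has four tab-separated columns
    (contig, position, ref, alt) and there is no header row.
    Illustrative only; not the actual MutEnricher implementation.
    """
    blacklist = set()
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError(
                "line %d: expected 4 columns, got %d" % (line_no, len(fields))
            )
        contig, pos, ref, alt = fields[:4]
        blacklist.add((contig, int(pos), ref, alt))
    return blacklist


# Made-up variants, for illustration only.
example_lines = [
    "chr1\t12345\tA\tT\n",
    "# a comment line\n",
    "chr2\t500\tG\tC\n",
]
blacklist = load_blacklist_sketch(example_lines)
```

Storing the variants as a set of tuples makes the later "is this variant blacklisted?" lookup a constant-time membership test.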

alhafidzhamdan commented 3 years ago

That works! Thanks for your support! Very much appreciate it! A