Genotek / ClassifyCNV

ClassifyCNV: a tool for clinical annotation of copy-number variants

Division by zero error without duplicate rows #8

Closed: jordimaggi closed this issue 2 years ago

jordimaggi commented 2 years ago

Dear ClassifyCNV team members,

I am currently trying to annotate a set of 550'000 CNVs using ClassifyCNV. However, the analysis fails with the following error message:

Traceback (most recent call last):
  File "/media/analyst/Data/Scripts/ClassifyCNV-master/ClassifyCNV.py", line 834, in <module>
    analyze_pop_freqs()
  File "/media/analyst/Data/Scripts/ClassifyCNV-master/ClassifyCNV.py", line 737, in analyze_pop_freqs
    overlap_perc = int(fields[9]) * 100 / (int(fields[2]) - int(fields[1]))
ZeroDivisionError: division by zero

I double-checked the file for duplicate rows (same CHR, START, and END), but pandas reports no duplicates. Do you have any idea as to what may be going on?

tgurbich commented 2 years ago

Hi jordimaggi,

This error can happen if your input file contains CNVs whose start and end positions are the same, so that the CNV has zero length. In that case the denominator in the line shown in your traceback, int(fields[2]) - int(fields[1]), evaluates to zero. Unfortunately, ClassifyCNV cannot analyze such intervals: it expects the start and end coordinates of a duplication or deletion to mark the entire span of the duplicated or deleted region.
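To make the failure concrete, here is a minimal sketch that reproduces the arithmetic from the traceback outside of ClassifyCNV. The function name overlap_percent and the example coordinates are made up for illustration, and it assumes that fields[1] and fields[2] hold the start and end positions and fields[9] the overlap in base pairs, as the traceback line suggests.

```python
def overlap_percent(start: int, end: int, overlap_bp: int) -> float:
    """Same expression as in the traceback: overlap * 100 / (end - start)."""
    return overlap_bp * 100 / (end - start)


# A 1000 bp interval with 500 bp of overlap works as expected.
print(overlap_percent(1000, 2000, 500))  # 50.0

# An interval whose start equals its end has length 0, so the division fails.
try:
    overlap_percent(1000, 1000, 0)
except ZeroDivisionError as exc:
    print("zero-length interval:", exc)
```

This would also explain why a duplicate check finds nothing: duplicates are found by comparing rows with each other, whereas a zero-length interval is a property of a single row.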

Depending on your goals, I'd recommend either pre-filtering your input file to remove zero-length CNVs and rerunning the tool, or using a different tool that accepts zero-length CNVs.
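The pre-filtering step could look roughly like the sketch below. It is only an illustration, not part of ClassifyCNV: the helper split_zero_length is hypothetical, and it assumes a tab-separated BED-like input with the chromosome, start, and end in the first three columns. It also sets aside rows where the end is smaller than the start, since those cannot describe a valid span either.

```python
def split_zero_length(lines):
    """Separate BED-like lines into usable intervals and zero-length (or inverted) ones."""
    kept, dropped = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 3:
            continue  # skip comment, blank, or malformed lines
        start, end = int(fields[1]), int(fields[2])
        # Keep only intervals with a positive length.
        (kept if end > start else dropped).append(line)
    return kept, dropped


# Made-up example rows; the second one has start == end.
example = [
    "chr1\t1000\t2000\tDEL\n",
    "chr1\t5000\t5000\tDUP\n",
    "chr2\t300\t900\tDUP\n",
]
kept, dropped = split_zero_length(example)
print(len(kept), "kept;", len(dropped), "dropped")
```

To apply this to a real file, pass an open file handle to split_zero_length and write the kept lines to a new file. Reviewing the dropped lines first is worthwhile, because zero-length entries may point to a coordinate problem upstream.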

Hope this helps. Please let me know if you have any further questions.

Tanya

jordimaggi commented 2 years ago

Hi Tanya,

That's exactly what I thought it could be. I have started the classification again with the pre-filtered file and it just finished correctly. Thanks a lot!