abyzovlab / CNVpytor

a python extension of CNVnator -- a tool for CNV analysis from depth-of-coverage by mapped reads
MIT License
178 stars 26 forks source link

Error calculating dG in regions < 100. Potential Solution #217

Open tonynugroholic opened 5 months ago

tonynugroholic commented 5 months ago

I have encountered an error when attempting to genotype smaller regions (<100bp) obtained from another source which causes the program to stop. The error in question is as follows.

  File "/scale_wlg_nobackup/filesets/nobackup/nesi00187/tonug0/cnv_detection/CNVpytor/venv/bin/cnvpytor", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/scale_wlg_nobackup/filesets/nobackup/nesi00187/tonug0/cnv_detection/CNVpytor/venv/lib/python3.11/site-packages/cnvpytor/__main__.py", line 242, in main
    view.genotype_prompt(list(map(binsize_type, args.genotype)), all=args.all)
  File "/scale_wlg_nobackup/filesets/nobackup/nesi00187/tonug0/cnv_detection/CNVpytor/venv/lib/python3.11/site-packages/cnvpytor/viewer.py", line 4290, in genotype_prompt
    self.genotype_all(bin_sizes, [line], interactive=True)
  File "/scale_wlg_nobackup/filesets/nobackup/nesi00187/tonug0/cnv_detection/CNVpytor/venv/lib/python3.11/site-packages/cnvpytor/viewer.py", line 4153, in genotype_all
    dG = np.min(distN[sbin1:sbin2])
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scale_wlg_nobackup/filesets/nobackup/nesi00187/tonug0/cnv_detection/CNVpytor/venv/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 2953, in min
    return _wrapreduction(a, np.minimum, 'min', axis, None, out,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scale_wlg_nobackup/filesets/nobackup/nesi00187/tonug0/cnv_detection/CNVpytor/venv/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 88, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: zero-size array to reduction operation minimum which has no identity

I have determined that this error occurs when the start and end of the provided region are within the same 100bp (e.g. 120-178 but not 180-257). This is caused sbin1 and sbin2 (viewer.py line 4124) resulting in the same number, thus dG = np.min(distN[sbin1:sbin2]) (viewer.py line 4150) returns an empty list for the np.min function. A potential solution for this is to check whether sbin1==sbin2, if so get distN at that position.

Potential change (viewer.py line 4148):

                    pN = 1 - pN / (pos2 - pos1 + 1)
                    if (sbin1 == sbin2):
                        dG = distN[sbin1]
                    else:
                        dG = np.min(distN[sbin1:sbin2])

However, this made me wonder about the case in which the region only intersects 2 bins (e.g. 180-257 so that sbin2 == sbin1 +1). In this case, distN[sbin1:sbin2] would only return the distance for sbin1 which doesn't sound correct as we would want to know the minimum distance out of sbin1 and sbin2. Therefore, would it be more suitable to add 1 to sbin2 so that the last bin is always included in the minimum distance check? This would also fix the issue for regions within the same 100bp without needed to check sbin1 == sbin2.

Potential change (viewer.py line 4148):

                    pN = 1 - pN / (pos2 - pos1 + 1)
                    dG = np.min(distN[sbin1:sbin2+1])

What would be the best solution to deal with these regions?