Create exception for ZeroDivisionError

FleurGaBru commented 6 months ago

Hello!

We found some unexpected behaviour in the PyBSAseq script. When setting a window size that is too small, meaning that there are windows that do not contain any SNPs, PyBSAseq fails with the following error:

It seems like unexpected behaviour that the pipeline fails when windows contain no SNPs. Would it be possible for you to handle this error and, e.g., infer the value "0" for the ratio sSV / total SV in the affected windows?
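To illustrate the request, here is a minimal sketch of such a guard. It is not taken from the PyBSASeq source; the function name `ssv_ratio`, its parameters, and the sample counts are hypothetical.

```python
def ssv_ratio(ssv_count, total_sv_count, empty_value=0.0):
    """Return sSV / total SV, or empty_value when the window holds no SVs.

    Hypothetical helper: the names and the default of 0.0 are assumptions.
    """
    if total_sv_count == 0:
        # Empty window: avoid ZeroDivisionError and return the fallback.
        return empty_value
    return ssv_count / total_sv_count


# (sSV count, total SV count) per window; the middle window is empty.
windows = [(3, 10), (0, 0), (5, 20)]
ratios = [ssv_ratio(ssv, total) for ssv, total in windows]
print(ratios)
```

Whether 0 is the right fallback is a separate question; passing `empty_value=None` instead would mark the window as missing rather than as a real ratio of 0.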

Thanks a lot for your help in advance!

Kind regards, Fleur

dblhlx commented 6 months ago

I'll take a look and let you know soon. BTW, micromamba is much faster than conda.

dblhlx commented 6 months ago

Hi Fleur:

I will fix the script in a couple of days. Please let me know if you are in a hurry.

Jianbo

dblhlx commented 5 months ago

Hi Fleur:

I just modified the script. Please test it and let me know if you encounter any issues.

Thanks,

Jianbo

FleurGaBru commented 5 months ago

Thank you so much! I will test it asap and give you feedback.

FleurGaBru commented 3 months ago

The notification that you closed the issue reminded me that I promised feedback. Thank you! We tested pyBSAseq with several projects after you introduced the fix. For most projects it ran fine. However, we found the following issues:

In sliding_windows.csv of some projects, we see windows that get a deltaAF value above the significance threshold but do not contain any SNPs. This is confusing and causes issues in our downstream analysis.
We discovered a "blocky" peak pattern in the PyBSAseq plot that appears with smaller window sizes after you introduced the fix. We think this might be related to the way you impute the ratio value for windows without SNPs.

We tried to dive into your code and we can see that you introduced an if/else statement in lines 1172-1175. Could you please explain how you define sub_l[2] and whether this could cause the issues described above?

dblhlx commented 3 months ago

I'll take a look and get back to you soon.

Lines 1172-1175 are involved in calculating sliding window-specific thresholds; they shouldn't have other effects.

FleurGaBru commented 1 week ago

Any news on this issue? If you need any more information from me, please let me know. Happy to help.

dblhlx commented 1 week ago

Sorry for the late reply. I have been working on a grant proposal lately. We just submitted it today.

I can reproduce the "blocky peaks" issue. I removed SVs from a large genomic region to create many empty sliding windows in this region. The "blocky peaks" are caused by the way I deal with a stretch of empty sliding windows: I assume that adjacent sliding windows have very similar allele frequency, G-statistic value, and sSV/totalSV ratio, so I assign the values of these three parameters from the previous non-empty sliding window to the empty sliding window. I thought about other ways, e.g., gradually increasing or decreasing the values, but that is more complex to implement, so I used the simplest way of dealing with the situation.
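A minimal sketch of the two strategies, for illustration only. This is not the code in the script; the function names `forward_fill` and `linear_fill` and the sample ratios are hypothetical, and empty windows are represented here as `None`.

```python
def forward_fill(values):
    """Copy the previous non-empty value into each empty (None) window.

    Leading empty windows stay None because there is no previous value.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def linear_fill(values):
    """Interpolate linearly across each run of empty (None) windows.

    Only gaps bounded by non-empty windows on both sides are filled.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# sSV/totalSV ratio per window; three consecutive windows are empty.
ratios = [0.25, None, None, None, 0.75]
flat = forward_fill(ratios)  # plateau followed by a jump ("blocky")
ramp = linear_fill(ratios)   # gradual change across the gap
print(flat)
print(ramp)
```

With forward filling, the gap becomes a flat plateau that ends in a sudden step, which matches the blocky pattern in the plot. Linear interpolation removes the step, but the values in the gap are still imputed rather than measured.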

However, I couldn't reproduce your first issue. It would be much easier if I could use your data. A sub-dataset containing only a single chromosome or a chromosomal segment of ~10 megabases with the peak in the middle should work fine. Please let me know if that's okay for you.

You can reopen this issue if you like.

dblhlx / PyBSASeq

Create exception for ZeroDivisionError #14