bahlolab / superSTR

A lightweight, alignment-free utility for detecting repeat-containing reads in short-read WGS, WES and RNA-seq data.
GNU General Public License v2.0

RuntimeError in outliers.py #7

Closed wdecoster closed 2 years ago

wdecoster commented 3 years ago

Hi,

I cannot find a way to solve the error below:

Traceback (most recent call last):
  File "superSTR/Python/outliers.py", line 191, in <module>
    write_string = bootstrap_ci(df, info_score=args.iscore,
  File "superSTR/Python/outliers.py", line 44, in bootstrap_ci
    ci = bs.conf_int(get_quantile, extra_kwargs={"pc_val": user_pc},
  File "/home/wdecoster/miniconda3/envs/superstr_pp/lib/python3.8/site-packages/arch/bootstrap/base.py", line 888, in conf_int
    b = self._bca_bias()
  File "/home/wdecoster/miniconda3/envs/superstr_pp/lib/python3.8/site-packages/arch/bootstrap/base.py", line 953, in _bca_bias
    raise RuntimeError(
RuntimeError: Empirical probability used in bias correction is 0 or 1, and sobias cannot be corrected. This may occur in extremum statistics that are not well approximated by a normal in a finite sample.

My command is:

python superSTR/Python/outliers.py -i summary/ -o outliers.tsv -m manifest.tsv --bootstrapCI -is --max_motif 3 --min_len 75 --max_len 101 --controllab C

Please let me know how I can help to debug this.

Thanks, Wouter

lfearnley commented 3 years ago

Hi Wouter!

The error is occurring in the bias-correction step of the BCa bootstrap. The most likely fix is to fail over to a bootstrap method that doesn't use such a correction (e.g. the basic or percentile bootstrap) and record this change, but I'd like to have a look at the bootstrap distributions before making that kind of recommendation.
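To illustrate the failing step, here is a minimal standard-library sketch. It is not superSTR's or arch's code. It assumes the bias correction is computed as the normal quantile of the fraction of bootstrap replicates that fall strictly below the point estimate, so the correction is undefined when that fraction is exactly 0 or 1. The helper names and the fully tied control scores are hypothetical.

```python
import random
from statistics import NormalDist, StatisticsError


def nearest_rank_quantile(values, pc):
    """Return the empirical quantile of `values` at level pc (0..1)."""
    ordered = sorted(values)
    index = int(round(pc * (len(ordered) - 1)))
    return ordered[index]


def bias_probability(sample, pc, n_boot=500, seed=0):
    """Fraction of bootstrap replicates strictly below the point estimate."""
    rng = random.Random(seed)
    base = nearest_rank_quantile(sample, pc)
    below = 0
    for _ in range(n_boot):
        # Resample with replacement, same size as the original sample.
        resample = [rng.choice(sample) for _ in sample]
        if nearest_rank_quantile(resample, pc) < base:
            below += 1
    return below / n_boot


def bias_correction(p):
    """Map the empirical probability to z0; return None when p is 0 or 1."""
    try:
        return NormalDist().inv_cdf(p)
    except StatisticsError:
        # inv_cdf is only defined for 0 < p < 1.
        return None


# Hypothetical, fully tied control scores (illustrative only, not real data).
tied_controls = [0.0] * 22
p_tied = bias_probability(tied_controls, pc=0.95)
print(p_tied, bias_correction(p_tied))  # prints: 0.0 None
```

When every replicate of the statistic equals the point estimate, none falls strictly below it, so the probability is 0 and no finite correction exists. Heavily tied scores or extreme quantiles on small control sets can produce the same situation.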

I've encountered this only in fairly narrow circumstances (in fact, only on synthetic data), so I'd be very interested in looking at your data for the specific motif that's failing - would it be possible to share that file? If not - are you able to tell me how many controls and how many cases are in your data, and a bit about the data you're analysing? Is this WGS, WES, RNA-seq?

Best,

Liam

wdecoster commented 3 years ago

Hi Liam,

It is 100bp RNA-seq, with 22 controls and various subgroups of 158 cases. I have no idea on which motif it is failing though. Can I turn on logging or debugging somewhere? The data is confidential, but depending on what you need I can look into sharing a minimal file just with you.

Wouter

lfearnley commented 3 years ago

Try running the analysis with this (https://gist.github.com/lfearnley/f3c78f6d18db72869d852a5b40ff27e8) in place of outliers.py - it should exit with code 1 after printing the filename and stack trace.
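The general pattern is sketched below for readers who cannot access the gist. This is not the gist's content: `run_per_file` and the `analyse` callback are hypothetical names, and the only behaviour taken from this thread is printing the filename and stack trace and then exiting with code 1.

```python
import sys
import traceback


def run_per_file(paths, analyse):
    """Call analyse(path) for each file; report the first failing file and exit."""
    for path in paths:
        try:
            analyse(path)
        except RuntimeError:
            # Print the offending file first so it is easy to find in the log.
            print(path)
            print("Runtime error in detection library - " + traceback.format_exc())
            sys.exit(1)
```

Wrapping each per-motif call this way identifies which input file triggers the error without changing the analysis itself.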

I've separately implemented the fail-over to the basic/percentile bootstrap that I outlined last night, and will have it in testing over the weekend. I'd like to make sure that there isn't anything strange going on with the bootstrap distribution though, as that might complicate matters further. In terms of what I'd need - I'd be after the information scores for your controls for the motif causing the issue.

If sharing that's ok with you, email might be best - I'm fearnley.l@wehi.edu.au

Thanks again for the detailed bug report! It's really appreciated.

wdecoster commented 2 years ago

That raises this, so I'll email you the offending file.

summary//motifs/2mers/AT.csv
Runtime error in detection library - Traceback (most recent call last):
  File "outliers2.py", line 44, in bootstrap_ci
    ci = bs.conf_int(get_quantile, extra_kwargs={"pc_val": user_pc},
  File "/home/wdecoster/miniconda3/envs/superstr_pp/lib/python3.8/site-packages/arch/bootstrap/base.py", line 888, in conf_int
    b = self._bca_bias()
  File "/home/wdecoster/miniconda3/envs/superstr_pp/lib/python3.8/site-packages/arch/bootstrap/base.py", line 953, in _bca_bias
    raise RuntimeError(
RuntimeError: Empirical probability used in bias correction is 0 or 1, and sobias cannot be corrected. This may occur in extremum statistics that are not well approximated by a normal in a finite sample.

lfearnley commented 2 years ago

This issue should have been fixed with the recent updates to the software. Updates to the manuals will follow shortly.

This is a case where the bias-corrected bootstrap fails because it encounters a non-normal bootstrap distribution; we've implemented an option to have superSTR fail over to a percentile bootstrap, and left it up to the user how to proceed in such cases.
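A rough sketch of what such a fail-over can look like follows, using only the standard library. It is an assumption-laden illustration, not the superSTR implementation: the function names are hypothetical, the corrected interval omits the acceleration term of BCa (so it is the simpler bias-corrected interval), and the fail-over here is automatic, whereas superSTR exposes it as a user option whose name is not given in this thread.

```python
import random
from statistics import NormalDist, median


def pick(sorted_reps, q):
    """Nearest-rank lookup of quantile level q (0..1) in sorted replicates."""
    index = int(round(q * (len(sorted_reps) - 1)))
    return sorted_reps[index]


def ci_with_failover(sample, stat, alpha=0.05, n_boot=1000, seed=0):
    """Return (low, high, method); method records which interval was used."""
    rng = random.Random(seed)
    base = stat(sample)
    reps = sorted(
        stat([rng.choice(sample) for _ in sample]) for _ in range(n_boot)
    )
    # Empirical probability used by the bias correction.
    p = sum(r < base for r in reps) / n_boot
    # Default: plain percentile interval, which needs no correction.
    lo_q, hi_q, method = alpha / 2, 1 - alpha / 2, "percentile"
    if 0.0 < p < 1.0:
        # Correction is defined: shift the quantile levels by 2 * z0.
        nd = NormalDist()
        z0 = nd.inv_cdf(p)
        lo_q = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
        hi_q = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
        method = "bias-corrected"
    return pick(reps, lo_q), pick(reps, hi_q), method


# Hypothetical, fully tied scores: the correction is undefined here.
low, high, method = ci_with_failover([0.0] * 22, median)
print(low, high, method)  # prints: 0.0 0.0 percentile
```

Returning the method name alongside the interval keeps a record of which motifs were handled by the fallback, so they can be reviewed separately.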

Oh, and thanks Wouter for the lovely report, and for testing some of the interim fixes!