StingraySoftware / stingray

Anything can happen in the next half hour (including spectral timing made easy)!
https://stingray.science/stingray
MIT License
172 stars 141 forks source link

Fill bad time intervals with fake data #782

Closed matteobachetti closed 8 months ago

matteobachetti commented 10 months ago

~Depends on #754~

Changes can be seen in action in https://github.com/StingraySoftware/notebooks/pull/76

Also, resolve #612

pep8speaks commented 10 months ago

Hello @matteobachetti! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 2254:81: E203 whitespace before ':' Line 2255:60: E203 whitespace before ':' Line 2266:81: E203 whitespace before ':' Line 2267:60: E203 whitespace before ':'

Comment last updated at 2024-01-11 09:46:05 UTC
codecov[bot] commented 10 months ago

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (490a7c7) 96.31% compared to head (de2ee3e) 96.33%. Report is 5 commits behind head on main.

Files Patch % Lines
stingray/lightcurve.py 83.33% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #782 +/- ## ========================================== + Coverage 96.31% 96.33% +0.02% ========================================== Files 43 43 Lines 8497 8548 +51 ========================================== + Hits 8184 8235 +51 Misses 313 313 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

matteobachetti commented 9 months ago

@mgullik thanks for the thorough review, which I tried to cover in my new changes. There are a few remaining questions which I haven't answered yet, that probably need some discussion:

a lightcurve-like object, the filling-the-gap routine is easier to understand. We know the time resolution of the lightcurve and the routine creates time bins with a value of count / countrate.

In a StingrayTimeseries-like object, it is more complicated, how do you choose the arrival times that are used to fill the gap? From the docstring I read "Random data are extracted by randomly repeating the values of nearby good data" This is not extremely clear. If you repeat the values of the arrival time, you don't fill the empty gaps, do you?

I improved the docstring, explaining how uniformly and non-uniformly sampled data are treated differently. Basically, the only change is that times are assigned on a fixed grid for uniformly sampled, and randomized with the same countrate as in the buffer for non-uniformly sampled.

More importantly, we should find a way to test that using the function fill_bad_time_intervals on a StingrayTimeseries and then making a lightcurve is equivalent to using the function fill_bad_time_intervals on the lightcurve made from the original StingrayTimeseries (with the gaps). I guess the two cases can't lead to the same numbers, because of the random function, but they should be close enough.

Why do you think this is important? A uniformly sampled time series should behave just like a light curve, and tests in both time series and light curves all pass independently.

In a lightcurve-like object

  • what happens if the gaps are smaller than the dt?

The light curve machinery should cover this: bins partially outside GTIs are just treated as if they were outside GTIs.

  • can I fill only certain gaps and not others? For example, I want to fill the gaps in the first half of the light curve and not in the second half

I don't see a use case for this, what would be the application? I guess one could split the light curve, fill the GTIs in the first half, and then join the two chunks back together.

  • can we include the possibility of filling the gap with the linear interpolation between the two edges of GTIs adjacent to the BTI? Does it make sense?

We could. Again, what would be the use case? While using random data tries to preserve the statistical properties of the data set, the linear interpolation would knowingly alter that.

Additional tests:

  • test multiple gaps and not only one gap to fill

Done

  • some specific comments in the code

I think I addressed those one by one