StingraySoftware / stingray

Anything can happen in the next half hour (including spectral timing made easy)!
https://stingray.science/stingray
MIT License

Epoch folding with large datasets #805

Open rodrigruiz opened 7 months ago

rodrigruiz commented 7 months ago

Hi, we are using Stingray to analyse timing data, and we are interested in performing an epoch folding search on a large dataset (much larger than what fits in the computer's RAM). Our strategy would be to split the dataset into smaller subsets, fold each of them using the fold_events function, sum the folded profiles, and then perform the analysis on the summed profile, as is done in the epoch_folding_search function. However, we think this is not possible, because some of the functions called from within epoch_folding_search are not public (for example, _folding_search or _profile_fast).
One solution would be to make these functions public, but maybe there is a better strategy for this analysis, one that doesn't require modifying Stingray, that we haven't thought of. Any advice would be much appreciated. Thanks!
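The split/fold/sum strategy works because folded profiles are plain histograms of event counts per phase bin, so they are additive across chunks, provided every chunk is folded with the same reference time (otherwise the phase zero shifts between chunks). Below is a minimal pure-NumPy sketch of that idea. It does not use Stingray's own functions; `fold_chunk`, `ef_statistic` and `chunked_epoch_folding` are hypothetical helpers. It assumes unweighted events, a single trial frequency per fold (no fdot), no GTIs, no exposure correction, and the usual epoch-folding chi-squared of the profile against a constant.

```python
import numpy as np


def fold_chunk(times, freq, nbin, ref_time):
    """Histogram of pulse phases for one chunk of event times (hypothetical helper).

    Phases are measured from a fixed ref_time shared by all chunks,
    so profiles from different chunks line up and can be summed.
    """
    phases = ((times - ref_time) * freq) % 1.0
    counts, _ = np.histogram(phases, bins=nbin, range=(0.0, 1.0))
    return counts


def ef_statistic(profile):
    """Epoch-folding chi-squared of a profile against a flat (constant) model."""
    profile = np.asarray(profile, dtype=float)
    mean = profile.mean()
    return np.sum((profile - mean) ** 2) / mean


def chunked_epoch_folding(chunks, frequencies, nbin=16, ref_time=0.0):
    """Sum folded profiles over chunks, then compute the statistic per frequency.

    The outer loop runs over chunks, so each chunk only has to be loaded
    (e.g. read from disk) once, and only one chunk is in memory at a time.
    """
    profiles = np.zeros((len(frequencies), nbin), dtype=np.int64)
    for times in chunks:
        for i, freq in enumerate(frequencies):
            profiles[i] += fold_chunk(times, freq, nbin, ref_time)
    stats = np.array([ef_statistic(p) for p in profiles])
    return stats, profiles
```

`chunks` can be any iterable of event-time arrays, including a generator that reads one file at a time. Because the summed counts are identical to folding the full dataset in one go, the statistic computed on the summed profiles is the same as the single-pass result.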

matteobachetti commented 7 months ago

Hi @rodrigruiz, thanks for the issue. First of all... having private functions doesn't mean they can't be used. You can import them and use them without issue! Being private usually means that they serve a very specific role (e.g., optimizing a minor part of a computation that would make the user-facing function unreadable if it were written inline there), and that they are less documented and more prone to interface changes over time. But if you know what you're doing, importing private functions to solve specific tasks is perfectly fine. I do that all the time :).

Another thing you might try for your specific need is passing the data as memory-mapped arrays. I never tried those particular functions with memory maps, but they should work.
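As a rough illustration of the memory-mapping idea, the sketch below stores event times in a `.npy` file and opens it with `np.load(..., mmap_mode="r")`, so the operating system pages data in only as slices are accessed. It folds the data in slices with plain NumPy rather than Stingray's functions (whether those accept memory maps directly is, as noted above, untested). `fold_memmapped` is a hypothetical helper, and the same simplifying assumptions apply: unweighted events, a single trial frequency, no GTIs.

```python
import os
import tempfile

import numpy as np


def fold_memmapped(path, freq, nbin=16, ref_time=0.0, chunk_size=100_000):
    """Fold event times stored in a .npy file without loading it all into RAM."""
    times = np.load(path, mmap_mode="r")  # read-only memory map, paged in on demand
    profile = np.zeros(nbin, dtype=np.int64)
    for start in range(0, times.shape[0], chunk_size):
        # Only the pages backing this slice need to be read from disk.
        block = np.asarray(times[start:start + chunk_size])
        phases = ((block - ref_time) * freq) % 1.0
        counts, _ = np.histogram(phases, bins=nbin, range=(0.0, 1.0))
        profile += counts
    return profile


if __name__ == "__main__":
    # Small demo: write simulated event times to disk, then fold them from the memmap.
    rng = np.random.default_rng(42)
    demo_path = os.path.join(tempfile.mkdtemp(), "times.npy")
    np.save(demo_path, np.sort(rng.uniform(0.0, 100.0, 50_000)))
    print(fold_memmapped(demo_path, freq=0.5, nbin=8, chunk_size=5_000))
```

The slice size only trades memory for the number of passes through the loop; the resulting profile is the same for any `chunk_size`.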

Note also that HENDRICS has an algorithm for fast frequency/fdot searches (it's called the quasi fast-folding algorithm). It can be used through HENzsearch or directly from the API.

rodrigruiz commented 7 months ago

Hi @matteobachetti, thanks so much for the detailed answer and the advice! For now we will try using the private functions, as this is the easiest solution for us given our time constraints. The other solutions you propose sound interesting as well, and since we plan to continue doing time series analyses, I will definitely look into memory-mapped arrays and the alternative algorithms. But for now we will focus on understanding the details of epoch folding as it is implemented in Stingray.

matteobachetti commented 7 months ago

BTW, if you think that a private function should really be made public because it's useful on its own, feel free to let us know!

rodrigruiz commented 7 months ago

Sure, thanks! So far, the plan to use these private functions seems to be working well. I will come back to this in two or three weeks and let you know exactly what we ended up using and how it worked out.