UCBerkeleySETI / hyperseti

A SETI / technosignature search code to find intelligent life beyond Earth
https://hyperseti.readthedocs.io

find_et not in the same ballpark as turbo_seti with a Voyager 1 HDF5 file #51

Closed: texadactyl closed this issue 2 years ago.

texadactyl commented 2 years ago

It may be that the assumptions in my test program are dodgy.

```python
import os

from turbo_seti import FindDoppler
from hyperseti import find_et

# Change the following line to point at your local copy of the Voyager 1 file.
FILEPATH = "/home/texadactyl/hyperseti/test/test_data/Voyager1.single_coarse.fine_res.h5"

# Other parameters:
GULP_SIZE = 1048576     # hyperseti: number of channels per gulp
MAX_DRIFT_RATE = 4.0    # Hz/s
MIN_DRIFT_RATE = 0.01   # Hz/s
SNR_THRESHOLD = 30.0
N_BOXCAR = 6            # hyperseti: number of boxcar widths to try
GPU_ID = 0

# turbo_seti: GPU search; writes Voyager1.single_coarse.fine_res.dat to out_dir.
print("\nturbo_seti FindDoppler+search from file {} .....".format(FILEPATH))
fd = FindDoppler(datafile=FILEPATH,
                 max_drift=MAX_DRIFT_RATE,
                 min_drift=MIN_DRIFT_RATE,
                 snr=SNR_THRESHOLD,
                 n_coarse_chan=1,
                 gpu_backend=True,
                 gpu_id=GPU_ID,
                 out_dir=".")
fd.search()
os.system("echo; cat Voyager1.single_coarse.fine_res.dat")

# hyperseti: same search; hits are written to CSV and returned as a DataFrame.
print("\nhyperseti find_et from file {} .....".format(FILEPATH))
dframe = find_et(FILEPATH,
                 filename_out="./hyperseti_hits.csv",
                 gulp_size=GULP_SIZE,
                 max_dd=MAX_DRIFT_RATE,
                 min_dd=MIN_DRIFT_RATE,
                 n_boxcar=N_BOXCAR,
                 threshold=SNR_THRESHOLD)
print("Returned dataframe:", dframe)
```

turbo_seti output .dat file (run time about 3 seconds):

```
# -------------------------- o --------------------------
# File ID: Voyager1.single_coarse.fine_res.h5
# -------------------------- o --------------------------
# Source:Voyager1
# MJD: 57650.782094907408   RA: 17h10m03.984s   DEC: 12d10m58.8s
# DELTAT:  18.253611    DELTAF(Hz):  -2.793968  max_drift_rate:   4.000000  obs_length: 292.057776
# --------------------------
# Top_Hit_#     Drift_Rate  SNR     Uncorrected_Frequency   Corrected_Frequency     Index   freq_start  freq_end    SEFD    SEFD_freq   Coarse_Channel_Number   Full_number_of_hits
# --------------------------
001  -0.392226   30.612333     8419.319368     8419.319368  739933     8419.321003     8419.317740  0.0       0.000000  0   578 
002  -0.373093  245.709610     8419.297028     8419.297028  747929     8419.298662     8419.295399  0.0       0.000000  0   578 
003  -0.392226   31.220858     8419.274374     8419.274374  756037     8419.276009     8419.272745  0.0       0.000000  0   578 
```

hyperseti find_et output dataframe (run time about 24 seconds):

```
    drift_rate      f_start          snr  driftrate_idx  channel_idx  boxcar_size  beam_idx  n_integration
0     0.009566  8419.921874  1023.354614            0.0     524288.0          1.0       0.0           16.0
12    0.229596  8419.921812    90.550430           23.0     524310.0          2.0       0.0           16.0
21    0.516590  8419.921739    73.965897           53.0     524336.0          4.0       0.0           16.0
22    0.679221  8419.921698    64.074081           70.0     524351.0          4.0       0.0           16.0
13    0.382660  8419.921773    64.054428           39.0     524324.0          2.0       0.0           16.0
1     0.315694  8419.921829    64.050980           32.0     524304.0          1.0       0.0           16.0
3     0.286995  8419.921801    64.043777           29.0     524314.0          1.0       0.0           16.0
23    1.387141  8419.921488    45.335136          144.0     524426.0          4.0       0.0           16.0
14    0.784452  8419.921670    45.308163           81.0     524361.0          2.0       0.0           16.0
24    1.540205  8419.921446    42.750835          160.0     524441.0          4.0       0.0           16.0
15    0.927950  8419.921633    40.531353           96.0     524374.0          2.0       0.0           16.0
25    1.808067  8419.921368    38.674847          188.0     524469.0          4.0       0.0           16.0
26    1.961130  8419.921323    37.046047          204.0     524485.0          4.0       0.0           16.0
16    1.033181  8419.921603    37.006523          107.0     524385.0          2.0       0.0           16.0
8     0.621822  8419.921714    36.983360           64.0     524345.0          1.0       0.0           16.0
27    2.133327  8419.921276    35.595764          222.0     524502.0          4.0       0.0           16.0
17    1.205378  8419.921536    34.255932          125.0     524409.0          2.0       0.0           16.0
28    2.764716  8419.921100    31.124687          288.0     524565.0          4.0       0.0           16.0
```
texadactyl commented 2 years ago

I added some print statements in run_pipeline and dedoppler. It does not look like a simple case of a metadata mix-up (log attached: essai.log).

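Observations: every hit sits at or just above channel_idx 524288, the midpoint of the 1,048,576-channel gulp, i.e. on the DC spike, and none is near the Voyager signal at about 8419.3 MHz that turbo_seti reports.

Does hyperseti need a "blank_dc" function?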
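One way to answer the "blank_dc" question is to overwrite the centre channel of each coarse channel with the mean of its two neighbours before the dedoppler search. The sketch below is hypothetical: `blank_dc` and its arguments are not part of hyperseti's API, and it assumes the data are a NumPy array shaped (time, beam, channel) with the DC bin at index `n_chan_per_coarse // 2` of each coarse channel (524288 for the 1,048,576-channel gulp above).

```python
import numpy as np


def blank_dc(data: np.ndarray, n_chan_per_coarse: int) -> np.ndarray:
    """Return a copy of `data` with the DC bin of every coarse channel replaced.

    Hypothetical helper. Assumes `data` has shape (n_time, n_beam, n_chan),
    that n_chan is a whole number of coarse channels, and that the DC bin is
    channel n_chan_per_coarse // 2 within each coarse channel.
    """
    n_chan = data.shape[-1]
    if n_chan_per_coarse < 4 or n_chan % n_chan_per_coarse != 0:
        raise ValueError("n_chan must be a multiple of n_chan_per_coarse (>= 4)")

    n_coarse = n_chan // n_chan_per_coarse
    # Index of the DC bin in each coarse channel.
    dc_idx = np.arange(n_coarse) * n_chan_per_coarse + n_chan_per_coarse // 2

    out = data.copy()
    # Replace each spike with the mean of the channels on either side of it.
    out[..., dc_idx] = 0.5 * (data[..., dc_idx - 1] + data[..., dc_idx + 1])
    return out


# Toy example: two coarse channels of 8 channels each, with a spike at each centre.
spectra = np.ones((2, 1, 16), dtype=np.float32)
spectra[..., [4, 12]] = 100.0
cleaned = blank_dc(spectra, n_chan_per_coarse=8)
print(cleaned.max())  # 1.0
```

Averaging the neighbours, rather than zeroing the bin, keeps the local noise level roughly intact, so a zeroed channel does not itself show up as a feature in later normalisation.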


texadactyl commented 2 years ago

So I used watutil to extract a subset of the Voyager .h5 file containing only the frequencies around the true signal:

```
--- File Info ---
DIMENSION_LABELS : [b'frequency' b'feed_id' b'time']
        az_start :                              0.0
       data_type :                                1
            fch1 :            8419.500001240522 MHz
            foff :      -2.7939677238464355e-06 MHz
           ibeam :                                1
      machine_id :                               20
          nbeams :                                1
           nbits :                               32
          nchans :                           107375
            nifs :                                1
     rawdatafile : guppi_57650_67573_Voyager1_0002.0000.raw
     source_name :                         Voyager1
         src_dej :                       12:10:58.8
         src_raj :                     17:10:03.984
    telescope_id :                                6
           tsamp :                     18.253611008
   tstart (ISOT) :          2016-09-19T18:46:13.000
    tstart (MJD) :                57650.78209490741
        za_start :                              0.0

Num ints in file :                               16
      File shape :                  (16, 1, 107375)
--- Selection Info ---
Data selection shape :                  (16, 1, 107375)
Minimum freq (MHz) :                8419.200001750141
Maximum freq (MHz) :                8419.500001240522
```
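The Minimum and Maximum freq values in the selection info follow from the header: with a negative foff, channel 0 is the highest frequency (fch1) and channel nchans - 1 is the lowest, at fch1 + (nchans - 1) * foff. A minimal sketch of that arithmetic, plus the inverse mapping from frequency to channel index, using the header values of the extracted file; the helper names are illustrative only.

```python
# Header values from the watutil extract (frequencies in MHz).
FCH1 = 8419.500001240522
FOFF = -2.7939677238464355e-06
NCHANS = 107375


def chan_to_freq(chan: int) -> float:
    """Frequency (MHz) of 0-based channel `chan`."""
    return FCH1 + chan * FOFF


def freq_to_chan(freq_mhz: float) -> int:
    """Nearest 0-based channel index for a frequency in MHz."""
    return round((freq_mhz - FCH1) / FOFF)


f_max = chan_to_freq(0)
f_min = chan_to_freq(NCHANS - 1)
print(f"{f_min:.6f} {f_max:.6f}")  # 8419.200002 8419.500001
```

The same mapping is handy for checking which channel a reported hit frequency should correspond to in each file.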

Better performance without the huge DC spike (log attached: essai.log).

Returned dataframe:

```
 drift_rate      f_start        snr  ...  boxcar_size  beam_idx  n_integration
10    0.009566  8419.296983  64.941223  ...         32.0       0.0           16.0
11    1.090580  8419.274327  58.821793  ...         32.0       0.0           16.0
12    1.090580  8419.319321  58.065971  ...         32.0       0.0           16.0
9     2.621218  8419.273927  34.954056  ...         16.0       0.0           16.0
```

It is also faster: TOTAL TIME: 3.42 s, roughly the same as turbo_seti on the GPU.

The frequencies seem to line up with turbo_seti's, and find_et even found a candidate that turbo_seti missed. However, the SNR values and drift rates still don't look right. For comparison, the turbo_seti hits:

```
# Top_Hit_#     Drift_Rate  SNR     Uncorrected_Frequency   Corrected_Frequency     Index   freq_start  freq_end    SEFD    SEFD_freq   Coarse_Channel_Number   Full_number_of_hits
# --------------------------
001  -0.392226   30.612333     8419.319368     8419.319368  739933     8419.321003     8419.317740  0.0       0.000000  0   578 
002  -0.373093  245.709610     8419.297028     8419.297028  747929     8419.298662     8419.295399  0.0       0.000000  0   578 
003  -0.392226   31.220858     8419.274374     8419.274374  756037     8419.276009     8419.272745  0.0       0.000000  0   578 
```
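One way to check the drift rates is against the drift-rate resolution implied by the header: one channel of frequency drift over the whole observation, |foff| / (n_ints * tsamp). With |foff| = 2.793968 Hz and obs_length = 16 * 18.253611 s = 292.057776 s, as in the turbo_seti .dat header, that step is about 0.009566 Hz/s, which is hyperseti's smallest reported drift_rate. The sketch below uses only numbers quoted in this thread. It shows that turbo_seti's -0.392226 and -0.373093 Hz/s are -41 and -39 steps, while hyperseti's first-run drift_rate values equal (driftrate_idx + 1) steps and are positive; whether that offset and sign are intended is left open.

```python
# Values from the turbo_seti .dat header and the file info in this thread.
FOFF_HZ = 2.7939677238464355  # |foff| in Hz
N_INTS = 16
TSAMP_S = 18.253611008

obs_length_s = N_INTS * TSAMP_S      # about 292.057776 s, as in the .dat header
drift_step = FOFF_HZ / obs_length_s  # Hz/s for one channel of drift over the observation

print(f"drift step = {drift_step:.6f} Hz/s")  # drift step = 0.009566 Hz/s

# turbo_seti drift rates (Hz/s) expressed as a whole number of drift steps.
for rate in (-0.392226, -0.373093):
    print(rate, round(rate / drift_step))  # prints -41 steps, then -39 steps

# hyperseti first-run rows as (driftrate_idx, drift_rate in Hz/s).
for idx, rate in ((0, 0.009566), (23, 0.229596), (53, 0.516590), (70, 0.679221)):
    steps = round(rate / drift_step)
    print(idx, rate, steps, steps == idx + 1)  # last column is True for every row
```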

Comparison (spreadsheet attached): comparison.xlsx
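To line up the two hit lists programmatically rather than in a spreadsheet, pandas' `merge_asof` can pair each hyperseti hit with the nearest turbo_seti hit in frequency. The sketch below uses the values quoted above (turbo_seti Uncorrected_Frequency, hyperseti f_start from the run on the extracted file); the 100 Hz (1e-4 MHz) matching tolerance is an arbitrary assumption, not something either tool defines.

```python
import pandas as pd

# turbo_seti hits (frequencies in MHz, drift rates in Hz/s).
turbo = pd.DataFrame({
    "ts_freq": [8419.319368, 8419.297028, 8419.274374],
    "ts_drift": [-0.392226, -0.373093, -0.392226],
    "ts_snr": [30.612333, 245.709610, 31.220858],
})

# hyperseti hits from the run on the extracted file.
hyper = pd.DataFrame({
    "hs_freq": [8419.296983, 8419.274327, 8419.319321, 8419.273927],
    "hs_drift": [0.009566, 1.090580, 1.090580, 2.621218],
    "hs_snr": [64.941223, 58.821793, 58.065971, 34.954056],
})

# merge_asof requires both frames to be sorted on their join keys.
matched = pd.merge_asof(
    hyper.sort_values("hs_freq"),
    turbo.sort_values("ts_freq"),
    left_on="hs_freq",
    right_on="ts_freq",
    direction="nearest",
    tolerance=1e-4,  # 100 Hz, an assumed matching window
)
matched["offset_hz"] = (matched["hs_freq"] - matched["ts_freq"]) * 1e6
print(matched[["hs_freq", "ts_freq", "offset_hz", "hs_drift", "ts_drift"]])
```

With this window, three hyperseti hits land within about 50 Hz of a turbo_seti hit, while the one at 8419.273927 MHz gets NaN in the turbo_seti columns; widening the tolerance would pair it with the 8419.274374 MHz hit instead.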

texadactyl commented 2 years ago

@telegraphic: I saw your fixes, which make sense to me, and just tried out the new repo image (results attached: df.pdf).

hyperseti now needs some way to throw out the hit entries that are not highlighted in green.
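One possible way to throw out redundant entries is greedy de-duplication (a form of non-maximum suppression): sort the hits by SNR, keep the strongest, and drop any weaker hit within a given number of channels of one already kept. This is an assumption about what the filtering should do, not hyperseti's method; `dedupe_hits` and `channel_window` are hypothetical. The example reuses rows from the first find_et output above.

```python
import pandas as pd


def dedupe_hits(hits: pd.DataFrame, channel_window: int) -> pd.DataFrame:
    """Keep the highest-SNR hit in each cluster of nearby channels.

    Hypothetical helper: walks the hits in descending SNR order and keeps a hit
    only if its channel_idx is more than `channel_window` channels away from
    every hit kept so far.
    """
    kept_labels = []
    kept_channels = []
    for label, row in hits.sort_values("snr", ascending=False).iterrows():
        chan = row["channel_idx"]
        if all(abs(chan - c) > channel_window for c in kept_channels):
            kept_labels.append(label)
            kept_channels.append(chan)
    return hits.loc[kept_labels]


# A few rows from the first find_et output, keeping their original index labels.
hits = pd.DataFrame(
    {
        "drift_rate": [0.009566, 0.229596, 0.516590, 0.679221, 0.382660, 0.315694, 1.387141],
        "snr": [1023.354614, 90.550430, 73.965897, 64.074081, 64.054428, 64.050980, 45.335136],
        "channel_idx": [524288.0, 524310.0, 524336.0, 524351.0, 524324.0, 524304.0, 524426.0],
    },
    index=[0, 12, 21, 22, 13, 1, 23],
)

result = dedupe_hits(hits, channel_window=100)
print(result)
```

With `channel_window=100`, only the rows labelled 0 and 23 survive. Note that this only collapses near-duplicate hits; it cannot tell a DC-spike artifact from a real signal, which is what DC blanking addresses.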