tcassanelli commented 2 years ago

The purpose of this feature is to extract the width per frequency channel, giving an start and stop timestamp. Then this information will be used within the VLBI pipeline and applied to the outrigger sites. Applying this custom mask to each data set will reduce the noise and improve the correlation strength. The idea is to run fitburst instantly once an event has been triggered and received by the outrigger sites, optimize DM and share these set of parameters to the outriggers sites (or perhaps a common site?). Then apply mask and correlate.

[x] Run fitburst with baseband data (@ketansand branch)
[x] Currently the software only returns a single width parameter, find and extract the full matrix of widths params per channel
[ ] Define width per frequency given a central timestamp (per frequency channel) and apply the geometric delay correction for the outrigger sites
[ ] Test first with PSR B0531+21 and older data sets, check whether the correlation strength improves
[ ] Test with the VLBI paper FRB, check whether the correlation strength improves
[ ] Integrate this feature in the main fitburst package
[ ] Integrate this feature in the future VLBI pipeline and automate the process

SebastianManosalva commented 1 year ago

Right now I am using the crab pulsar to test fitburst. The code for the individual plots is: `import matplotlib.pyplot as plt %matplotlib inline

plt.imshow( spectrum_downsampled[idx_good_freq], origin = "lower", aspect="auto", interpolation="nearest", extent=[min_time, max_time, min_freq, max_freq]) plt.xlabel("Time [ms]") plt.ylabel("frequency [MHz]")`

The original signal: crab pulse original

The bestfit model: bestfit model crab pulse

SebastianManosalva commented 1 year ago

To extract the initial time from the h5 file I am using:

`from astropy.time import Time data = BBData.from_file(path) data_time = data["time0"]

start_time = Time( val=data_time[:]['ctime'], val2=data_time[:]['ctime_offset'], format="unix", precision=9 )`

For the final time an I_cut value is used to know where the pulse ends.

`from sklearn import preprocessing import numpy as np

fit_copy = np.copy(bestfit) #where bestfit is the model obtained from fitburst

normalize

fit_copy = preprocessing.normalize(fit_copy)

I_cut = 1e-4 #this value was used as an example pulse_time =[] pulso = np.greater(fit_copy, I_cut) final_time = [] for j in range(bestfit.shape[0]):

for i in range(len(bestfit[j])):
    if pulso[j][i] == True:
        pulse_time.append(times_windowed[i])
final_time.append(pulse_time[-1])

now with the start and final time the width can be obtained `

SebastianManosalva commented 1 year ago

Here I show the functions that allow us to calculate the width, start time of the pulse and final time of the pulse per frequency channel:

`def t_dd(DM, t_0, freq_0, freq_1): k = 4.148e3 t_delta = k DM (1/(freq_0)-1/(freq_1)) t_1 = t_delta + t_0 return t_1

def get_initial_and_final_time(bestfit, fname, path, ratio): """ input:

bestfit The bestfit_model from run_fitburst
fname name of the .npz file from make_input
path path of the .npz file from make_input
ratio percentage that decides where to cut the signal, for now its a float value output:
initial_time in seconds
final_time in seconds """

get the data from the npz file

fname = "fitburst_crabpulse.npz" path = "/home/smanosal/documentos" data = DataReader(fname, data_location=path) data.load_data()

fit_copy = np.copy(bestfit) fit_copy = preprocessing.normalize(fit_copy)

Get the time frames where the pulse ends and start

frames = data.times/data.res_time # time frames pulso = np.greater(fit_copy, ratio) # Bool values that indicate where is the pulse acceptable_pulse = []
n_initial = np.zeros(bestfit.shape[0]) # number of frames until the first point of the pulse (x 1024 feeds) n_final = np.zeros(bestfit.shape[0]) # number of frames until the last point of the pulse (x 1024 feeds)

for j in range(bestfit.shape[0]): #1024 freq channels
```
acceptable_pulse.append(frames[pulso[j]])  
n_initial[j] = acceptable_pulse[j][0]     # First value of the pulse (in frames)
n_final[j] = acceptable_pulse[j][-1]      # Last value of the pulse (in frames)
```
initial_time = n_initialdata.res_time # Initial time in seconds final_time = n_finaldata.res_time # Final time in seconds

return initial_time, final_time

def get_start_time(h5_file, initial_time, final_time, DM): """ input:

h5_file the path of the h5 file
initial_time from get_initial_and_final_time
final_time from get_initial_and_final_time
DM of the event output:
startTime astropy.Time object with the date and time of the start of the pulse
finalTime astropy.Time object with the date and time of the final of the pulse
width of the pulse per frequency channel """ data_h5 = h5py.File(h5_file, 'r')

Generate the astropy.Time object

data_time = data_h5["time0"] start_time = Time( #start_time form each freq channel val=data_time[:]['ctime'], val2=data_time[:]['ctime_offset'], format="unix", precision=9 ) freq_idx = data_h5["index_map"]["freq"][:]["id"].astype(int) frequencies = data_h5["index_map"]["freq"][:]["centre"].astype(float) #* apu.MHz

zero1 = np.zeros(len(initial_time)) #1024 zero2 = np.zeros(len(initial_time))

for i in range(len(start_time.jd)): zero1[freq_idx-1] = start_time.jd1[i] #REVISAR los indices! zero2[freq_idx-1] = start_time.jd2[i] complete_start_time = zero1 + zero2 #complete with zeros start_time

F_channel = np.linspace(min(frequencies), max(frequencies), 1024)

for i in range(1024): if complete_start_time[i] == 0: complete_start_time[i] = t_dd(DM, complete_start_time[i-1], F_channel[i-1], F_channel[i]) #start time for all the 1024 feeds

t = Time(complete_start_time, format='jd', precision=9) #to astropy.time format finalform_isot = t.isot #In order to work with initial_time we should work with isot finalform_seconds = np.zeros(1024) finalform_date = [] for i in range(1024): finalform_seconds[i] = float(finalform_isot[i][17:]) #seconds in float finalform_date.append(finalform_isot[i][:17]) #date and time in str

t_0 = initial_time + finalform_seconds # 1024x1 in seconds numpy array

t_f = final_time + finalform_seconds # 1024x1 in seconds

width = t_f - t_0 #The width of the pulse per freq channel

start_isot =[] finish_isot =[] for i in range(1024): start_isot.append(finalform_date[i] + str(t_0[i]))
finish_isot.append(finalform_date[i] + str(t_f[i]))

startTime = Time(np.array(start_isot), format = "isot", precision=9) #initial_time of the pulse in astropy.Time format finishTime = Time(np.array(finish_isot), format = "isot", precision=9) #final_time of the pulse in astropy.Time format

return width, startTime, finishTime

def get_widths(bestfit, fname, path, ratio, h5_file, DM): initial_time, final_time = get_initial_and_final_time(bestfit, fname, path, ratio) width, startTime, finishTime = get_start_time(h5_file, initial_time, final_time, DM)

#maybe I can save some files in this function or some plots
return width, startTime, finishTime`

SebastianManosalva commented 1 year ago

Here is how I use the get_widths function:

`import numpy as np import h5py import matplotlib.pyplot as plt %matplotlib inline import h5py from astropy.time import Time from astropy import units as apu from sklearn import preprocessing from fitburst.backend.generic import DataReader

fit = np.load("bestfit_fitburst_crabpulse.npz")

fit.files

bestfit = fit["arr_0"] fname = "fitburst_crabpulse.npz" path = "/home/smanosal/documentos" ratio = 5*10e-3 h5_file = "/data/user-data/cassanelli/Cs/chime/singlebeam_172516423_gain_20210603T130623.h5" DM = 56.755

width, startTime, finishTime = get_widths(bestfit, fname, path, ratio, h5_file, DM)`

SebastianManosalva commented 1 year ago

And this is a plot using the initial_time and final_time from get_initial_and_final_time

Width crab pulse

tcassanelli commented 1 year ago

@SebastianManosalva good result. First thing is, I'd like to know the start_time of the file and the number of frames for the position of the red and orange lines, if we have that then we are good. Can you test a bit more with the algorithm to select more of the pulse at lower frequencies? I see that I good portion of the signal is being lost at the end. Also can you plot those lines in the actual data? I think that will tell us whether your method is working or not.

SebastianManosalva commented 1 year ago

Here is what I have done since the last update: I created a function that encapsulates all the processes of fitburst and my own code, the function receive the .npz file, a ratio and a save path. It returns the right and left limits of the pulse, the start_time per frequency channel (astropy.Time), the DM and the TOA. But I am not sure if the TOA 's are correct, because they are far away from the pulse: In green is the TOA's and orange/red are the limits. I am looking what could be the issue.

tcassanelli commented 1 year ago

hmm I doubt that all the top section was taken out. I think it is a plotting error, since those missing channels should be spread out across the entire band and not just at the top!

SebastianManosalva commented 1 year ago

The first funtion of make_input is called cut_profile it returns the frequency channels with the name freq and this frequencies goes from 400.390625 to 656.25, the other channels are then filled with zeros. So it isn't a plotting error.

SebastianManosalva commented 1 year ago

I had been testing the code with a new event singlebeam_173568547 and with different arrival times:

First the TOA that I get from make_input
Second the average time between the two limits:
Third the TOA from baseband_analysis/analysis/toa/get_TOA
Fourth the fixed TOA from run_fitburst (this one has two values)

As I confirmed with Ziggy and Ketan the arrival time is always calculated for the 400.1953125 MHz band and between all the arrival times the last one looks really good! But the code changed just a bit compared to the last time, so maybe the singlebeam_173098290 fit had something strange ...

SebastianManosalva commented 1 year ago

I have been trying different codes to get make_input to work as intended. The next function improves the data (coherent and incoherent dedispersion) and refines the DM.

` from baseband_analysis.core.bbdata import BBData from baseband_analysis.core.signal import _get_profile, get_floor, get_main_peak_lim, tiedbeam_baseband_to_power from baseband_analysis.analysis.snr import get_snr from fitburst.routines.profile import get_signal, count_components

def simplified_cut_profile(path, DM, t_res, DM_range = 1, w = None, downsample = 1, refine_RFI = True, fill_missing_time = None, fill_nan = True, spectrum_lim = False, diagnostic_plots = False, valid_channels=None):

data = BBData.from_file(path)
#This function gets the power data in case the original data doesn't have it.
tiedbeam_baseband_to_power(data=data, dm=DM, time_downsample_factor=1, dedisperse=True, time_shift=False) 
#Optimize the DM, incoherent dedisp
freq_id, freq, power, offset, weight, valid_channels, time_range, DM, downsampling_factor = get_snr(data=data, DM=DM, diagnostic_plots=diagnostic_plots, w=w, valid_channels=valid_channels, spectrum_lim=spectrum_lim, DM_range=DM_range, downsample=downsample, refine_RFI=refine_RFI, fill_missing_time=fill_missing_time, fill_nan=fill_nan,return_full=True)
# get the two dimensional profile and cut the data 
profile = _get_profile(power)
start, end = get_signal(profile, ds = downsampling_factor)
# Find peaks in the profile
peaks = count_components(profile, t_res, ds = downsampling_factor, diagnostic_plots=diagnostic_plots)
peak_times = peaks * t_res
DM_err = 0.05/2

return data, freq_id, freq, power[...,start:end], valid_channels, DM, DM_err, downsampling_factor, profile, t_res, peak_times, start, power

`

tcassanelli commented 1 year ago

ok, can you please post and update with a figure as well.

CHIMEFRB / fitburst

Extracting width per frequency channel (VLBI) #64

normalize

now with the start and final time the width can be obtained `

get the data from the npz file

Get the time frames where the pulse ends and start

Generate the astropy.Time object

fit.files