chime-frb-open-data / community

CHIME/FRB Community
MIT License
13 stars 1 forks source link

Unable to see bursts in data for "Periodic activity from a fast radio burst source" and "A Second Repeating Fast Radio Burst" #5

Closed mef51 closed 3 years ago

mef51 commented 4 years ago

Hi, I've downloaded the data for FRB 180916.J0158+65 from here but I'm unable to see the bursts as shown in Extended Data figure 1 of the paper following the code that is provided in the README of the data release. In addition the zapped channels due to RFI appear differently than those published.

Code provided by README:

import matplotlib.pyplot as plt
import numpy as np
fname = "burst_9_bb_1k_wfall.npz"
data = np.load(fname)
wfall = data["wfall"]
dt_s = data["dt_s"]
center_freq_mhz = data["center_freq_mhz"]
df_mhz = center_freq_mhz[1] - center_freq_mhz[0]
plt.imshow(wfall, origin="lower", aspect="auto", interpolation="nearest", 
           extent=(0, dt_s*wfall.shape[0], center_freq_mhz[0]-df/2., 
           center_freq_mhz[-1]+df/2.))
plt.xlabel("Time [s]")
plt.ylabel("Frequency [MHz]")

There are some small typos in the code provided (df should be df_mhz, and the time axis extent should be 0, dt_s*wfall.shape[1] not wfall.shape[0]) but with those fixed, the 25 waterfall plots i see are shown below. As you can see for example, Bursts 18 and 19, which are some of the more obvious bursts in the paper are completely obscured by zapped channels:

bursts

For reference the code for the above figure is:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
import os
from tqdm import tqdm

folder = 'data/CHIME_180916.J0158+65'
burstfiles = [s for s in os.listdir(folder) if '16k' in s]

plt.figure(figsize=(24,24))
cmap = plt.get_cmap('gray_r')
cmap.set_bad(color = 'w', alpha = 1.) # potentially hides important errors in the data!

for fname in burstfiles:
    data = np.load("{}/{}".format(folder, fname))
    wfall = data["wfall"]

    dt_s = data["dt_s"]
    center_freq_mhz = data["center_freq_mhz"]
    df_mhz = center_freq_mhz[1] - center_freq_mhz[0]

    plt.subplot(5,5,burstfiles.index(fname)+1)

    plt.title(fname)
    plt.imshow(wfall, origin="lower", aspect="auto", interpolation="nearest", cmap=cmap,
               extent=(0, 1000*dt_s*wfall.shape[1], center_freq_mhz[0]-df_mhz/2., center_freq_mhz[-1]+df_mhz/2.))
    plt.colorbar()
    plt.xlabel("Time [ms]")
    plt.ylabel("Frequency [MHz]")

plt.tight_layout()

Any help would be appreciated, thanks.

mef51 commented 4 years ago

As a workaround I've been extracting data from the pdf figure source that is on the arxiv listing with some success but this is not ideal..

zpleunis commented 4 years ago

Hi! Thank you for catching those typos! I will fix them in the README.

The bursts are too dim to see in individual narrow channels, that are 400/16,384 MHz wide. In the paper we downsampled the data in frequency by a factor 256 to only 64 frequency channels with 400/64 MHz bandwidth each so that the bursts are visible by eye in the dynamic spectra. We uploaded the waterfalls at the full resolution so that you can decide yourself what resolution works best for the analysis that you are interested in. I hope that helps!

-- Ziggy

mef51 commented 4 years ago

Hi Ziggy, thanks! I had tried downsampling but I realized after your message I was just decimating, so I tried it again and got it to work. Would it be possible to add an example of the downsampling to the README? I had to do some other minor gymnastics to handle the masked values when downsampling so I'll share what I did in case someone else has similar trouble:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
import os
import skimage.measure

folder = 'data/CHIME_180916.J0158+65'
burstfiles = [s for s in os.listdir(folder) if '16k' in s]

plt.figure(figsize=(24,24))
cmap = plt.get_cmap('gray_r')
cmap.set_bad(color = 'w', alpha = 1.)

for fname in burstfiles:
    data = np.load("{}/{}".format(folder, fname))
    wfall = data["wfall"]
    downsample = 256
    # nan_to_num before downsampling. note that this produces some streaks in the bursts..
    wfall_down = skimage.measure.block_reduce(np.nan_to_num(wfall), block_size=(downsample, 1), func=np.mean)
    wfall_masked = np.ma.masked_where(wfall_down == 0, wfall_down)
    # print(wfall_down.shape, wfall.shape)

    dt_s = data["dt_s"]
    center_freq_mhz = data["center_freq_mhz"]
    df_mhz = center_freq_mhz[1] - center_freq_mhz[0]
    # print(dt_s, center_freq_mhz[0], center_freq_mhz[-1], df_mhz)

    plt.subplot(5,5,burstfiles.index(fname)+1)
    plt.title(fname)
    plt.imshow(wfall_masked, origin="lower", aspect="auto", interpolation="nearest", cmap=cmap,
               extent=(0, 1000*dt_s*wfall.shape[1], center_freq_mhz[0]-df_mhz/2., center_freq_mhz[-1]+df_mhz/2.))
    plt.colorbar()
    plt.xlabel("Time [ms]")
    plt.ylabel("Frequency [MHz]")

plt.tight_layout()

which produces

bursts

shinybrar commented 4 years ago

@mef51

I would encourage you to add your script to the cfod package here via a Pull Request.

mef51 commented 4 years ago

Hello! Sorry to open this issue again, but I'm having trouble seeing the bursts from FRB180814.J0422+73 (from "A Second Repeating Fast Radio Burst", Figure 2), even after downsampling and using the weights read from cfod.

I've tried saturating the intensity scale, zooming in the time-axis, averaging the beams, etc..

This is what I'm seeing:

import numpy as np
import matplotlib.pyplot as plt
import os
import skimage.measure

folder = 'data/CHIME_FRB180814.J0422+73'
burstdirs = ['burst180814', 'burst180911', 'burst180919']
for burst in burstdirs:
    burstfiles = [s for s in os.listdir(folder +'/'+ burst) if '.npy' in s and 'weights' not in s]
    weightfiles = [s for s in os.listdir(folder +'/'+ burst) if '.npy' in s and 'weights' in s]

    plt.figure(figsize=(20,24))
    plt.suptitle(burst)
    cmap = plt.get_cmap('gray_r')
    cmap.set_bad(color = 'w', alpha = 1.)

    dataAveraged = []
    for fname, wfname in zip(burstfiles, weightfiles):
        try:
            del data
        except Exception:
            pass
        data    = np.load("{}/{}/{}".format(folder, burst, fname))
        weights = np.load("{}/{}/{}".format(folder, burst, wfname))

        tl, tr = 425, 575
        data    = data[:, :] 
        weights = weights[:, :]
        np.putmask(data, ~weights.astype(bool), np.nan)
        data = data - 1*data[:, 0:25].mean(axis=1)[:,None]

        downsample = 256
        data = skimage.measure.block_reduce(data, block_size=(downsample, 1), func=np.nanmean)

        if dataAveraged == []: 
            dataAveraged = data
        else:
            dataAveraged += data
        plt.subplot(4, 4, burstfiles.index(fname)+1)
        plt.title(burst + ": beam" + fname.split('.')[0].split('beam')[1])

        plt.imshow(data[:,:], origin="lower", aspect="auto", interpolation="nearest", cmap=cmap)
        plt.clim(-10, 10)

        plt.colorbar()
        plt.xlabel("Time [channel #]")
        plt.ylabel("Frequency [channel #]")

    plt.tight_layout()
    if burst == 'burst180814': plt.savefig('bursts180814.png')

bursts180814

Thanks, Mohammed

zpleunis commented 4 years ago

Hi Mohammed, it looks to me like there are a lot of frequency channels that are contaminated by radio frequency interference (RFI) still in your image. Those noisy channels also seem to dominate the color scale. The CHIME telescope for example sees the LTE cellular network between about 730 and 755 MHz very prominently (roughly your channels 52--56) and there was evidently also a significant source of RFI at the time of the observation around your channels 8--11. Can you try to mask all of those channels? In general, you get the best results (i.e., you save the most data) if you do the masking at the full 16,384-channel resolution, before averaging over channels. I hope that helps! Best, -- Ziggy

mef51 commented 4 years ago

Hi Ziggy, I've zapped a few more channels but this hasn't brought the burst forward. The data cutout spans 1 second, so I thought maybe I'm too zoomed out on the time axis to see it? But even zooming in on 128ms chunks of the data I can't find it, whether I look at the beams individually or a beam-averaged waterfall. The figure below is for the Aug. 14 burst but its the same problem with the other bursts. Ultimately I want to take autocorrelations of the bursts (like in Josephy et al. 2019 and Hessels et al. 2019), which will need a waterfall that has high enough SNR.. I think I'm out of ideas

Cheers, Moh

edit: i also have more noisy channels around 450MHz than shown in fig 1 of the paper, which is also where the burst lives

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
import os
import skimage.measure

folder = 'data/CHIME_FRB180814.J0422+73'
# burstdirs = ['burst180814', 'burst180911', 'burst180917', 'burst180919']
# burstdirs = ['burst180814', 'burst180911', 'burst180919']
burstdirs = ['burst180814']
timestep = 128

res = {    #(tres, fres)
    'burst180814': (0.983, 16384/400),
    'burst180911': (0.983, 16384/400),
    'burst180917': (1.966, 16384/400),
    'burst180919': (0.983, 16384/400),
}

for twindow in range(0, 7):
    for burst in burstdirs:
        burstfiles = [s for s in os.listdir(folder +'/'+ burst) if '.npy' in s and 'weights' not in s]
        weightfiles = [s for s in os.listdir(folder +'/'+ burst) if '.npy' in s and 'weights' in s]

        tl, tr = timestep*twindow, timestep*(twindow+1) - 1

        plt.figure(figsize=(20,24))
        plt.figure(figsize=(10,10))

        #plt.suptitle(burst)
        cmap = plt.get_cmap('gray_r')
        cmap.set_bad(color = 'w', alpha = 1.)

        dataAveraged = []
        for fname, wfname in zip(burstfiles, weightfiles):
            try:
                del data
            except Exception:
                pass
            data    = np.load("{}/{}/{}".format(folder, burst, fname))
            weights = np.load("{}/{}/{}".format(folder, burst, wfname))

            #tl, tr = 425, 575

            data    = data[:, :] 
            weights = weights[:, :]
            weights[13312:14592] = 0
            weights[11776:12032] = 0
            weights[2060:2850]   = 0
            weights[768:1024]    = 0

            np.putmask(data, ~weights.astype(bool), np.nan)
            data = data - 1*data[:, 0:200].mean(axis=1)[:,None]

            downsample = 256
            data = skimage.measure.block_reduce(data, block_size=(downsample, 1), func=np.nanmean)

            if dataAveraged == []: 
                dataAveraged = data
            else:
                dataAveraged += data

            plt.subplot(4, 4, burstfiles.index(fname)+1)
            plt.title(burst + ": beam" + fname.split('.')[0].split('beam')[1])
            plt.imshow(data[:,:], origin="lower", aspect="auto", interpolation="nearest", cmap=cmap, extent=(0, res[burst][0], 400, 800))
            plt.colorbar()
            plt.xlabel("Time [s]")
            plt.ylabel("Frequency [MHz]")

        # Beam averaged 
        # plt.title(burst + " beam averaged," + " time chunk {}:{}".format(tl, tr))
        # plt.imshow(dataAveraged[:,tl:tr]/len(burstfiles), origin="lower", aspect="auto", interpolation="nearest", cmap=cmap, extent=(0, res[burst][0], 400, 800))
        # plt.colorbar()

        plt.tight_layout()

bursts180814

zpleunis commented 4 years ago

Hi Mohammed,

Sorry for not picking up on this earlier, but unlike the data from "Periodic activity from a fast radio burst source" the bursts from this paper (stored in the .msgpack files) are not yet dedispersed (they are at DM=0 pc cm-3) and every .msgpack file is a separate chunk of 1024 time samples. To get the full chunk that encompasses the bursts you need to open all of the files for that burst at once, e.g.:

import glob
from cfod import chime_intensity as ci
fns = glob.glob("*.msgpack")
intensity, weights, fpga0s, fpgaNs, binning, rfi_mask, frame0_nanos = ci.unpack_datafiles(fns)

Then you need to dedisperse the intensity data to the DM of the burst/source (ideally at full frequency resolution before averaging frequency channels). Only in the dedispersed dynamic spectrum will the burst stand out.

Nice that you are planning to run an autocorrelation on the bursts. What information are you ultimately trying to get out of them, if I may ask?

Best, -- Ziggy

mef51 commented 4 years ago

Hi Ziggy, okay I see. So if I understand correctly the signal is dispersed over a 16 second interval? I was initially using unpack_datafiles will all the files together but I was getting the MemoryError below so I started going one by one. If I can't get around this memory error somehow with cfod then I'll try stacking them horizontally myself before dedispersing.

Traceback (most recent call last):
  File "numpyifier.py", line 15, in <module>
    intensity, weights, fpga0s, fpgaNs, binning, rfi_mask, frame0_nanos = ci.unpack_datafiles(fns)
  File "b:\dev\chime-frb-open-data\cfod\chime_intensity.py", line 138, in unpack_datafiles
    fn
  File "b:\dev\chime-frb-open-data\cfod\chime_intensity.py", line 81, in unpack_data
    intensity, weights = chunk.decode()
  File "b:\dev\chime-frb-open-data\cfod\assemble_chunk.py", line 102, in decode
    self.nt_per_packet, axis=1
MemoryError: Unable to allocate 64.0 MiB for an array with shape (16384, 1024) and data type float32

We were intrigued by the linear relationship between the burst drift rate and frequency that was shown in fig. 6 of Josephy et al. 2019 and have been exploring some ideas. Its easier to say something meaningful about the drift rates of bursts from repeater sources since the DM is better constrained, pulse trains are more common, etc..

Thanks, Moh

zpleunis commented 4 years ago

Hi Mohammed,

The DM delay from 800 to 400 MHz for DM=190 pc cm-3 (this source) is about 3.7 seconds, so my suspicion is that for all of the bursts from this source the full burst is in the last four msgpack files of the sets; so you could try reading in only the last four files for each burst. I hope that works!

I agree that the ~linear evolution of drift rate with frequency is very interesting! Feel free to email me if you want to discuss drift rates or drift rate measurements in more detail.

Best, -- Ziggy

mef51 commented 4 years ago

Thanks for your help! I'm still getting that memory error but it must be something weird with my system. And thanks for the offer, I will probably take you up on that :)

cheers, Mohammed