dlakaplan / LogParse

GBT file name convention (VEGAS) #25

Open swiggumj opened 4 years ago

swiggumj commented 4 years ago

Note the following convention holds for current/future files written with the VEGAS backend. We may need a slightly different prescription for older sessions using VEGAS+GUPPI/GUPPI alone.

VEGAS file naming convention is as follows:

vegas_[mjd]_[seconds after UTC midnight]_[source]_[scan #]_[file #].fits

...and cal files have cal between the scan/file numbers (see examples below).

vegas_58987_23250_J1713+0747_0001_cal_0001.fits 
vegas_58987_23374_J1713+0747_0002_0001.fits
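
As a minimal sketch of how the convention above could be parsed, the following uses a regular expression written from the two example names. The pattern, the `VegasName` container, and the `parse_vegas_name` helper are illustrative assumptions and not part of LogParse; in particular, the 4-digit widths for scan and file numbers are inferred from the examples only.

```python
import re
from typing import NamedTuple, Optional


class VegasName(NamedTuple):
    mjd: int
    seconds: int       # seconds after UTC midnight
    source: str
    scan: int
    is_cal: bool
    file_number: int


# Hypothetical pattern derived from the convention quoted above.
# Scan and file numbers are assumed to be zero-padded to 4 digits.
VEGAS_PATTERN = re.compile(
    r"^vegas_(?P<mjd>\d+)_(?P<seconds>\d+)_(?P<source>.+)"
    r"_(?P<scan>\d{4})(?P<cal>_cal)?_(?P<file>\d{4})\.fits$"
)


def parse_vegas_name(filename: str) -> Optional[VegasName]:
    """Return the parsed fields, or None if the name does not match."""
    match = VEGAS_PATTERN.match(filename)
    if match is None:
        return None
    return VegasName(
        mjd=int(match["mjd"]),
        seconds=int(match["seconds"]),
        source=match["source"],
        scan=int(match["scan"]),
        is_cal=match["cal"] is not None,
        file_number=int(match["file"]),
    )


print(parse_vegas_name("vegas_58987_23250_J1713+0747_0001_cal_0001.fits"))
print(parse_vegas_name("vegas_58987_23374_J1713+0747_0002_0001.fits"))
```

The source field is matched greedily and then backtracked, so a source name that itself contains underscores should still parse, as long as the trailing scan and file fields keep the assumed widths.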

dlakaplan commented 4 years ago

Why is this UTC if the other times are local?

dlakaplan commented 4 years ago

Where can we find the file #?

dlakaplan commented 4 years ago

Need to make sure we get the exact start time correct, in order for this to be accurate

dlakaplan commented 4 years ago

What about flux cal?

swiggumj commented 4 years ago

1/3. UTC rather than local because that is the file-naming convention...? Not sure I have a great answer for that. It looks to me like the exact start time is in the log filename (right?), but as you mentioned on Slack, there can be multiple logs in a single file, so I'm not sure how to handle that.

  2. For AO logs, we're just producing base names, right? I think we can do that here too. Theoretically, we should be able to calculate the number of files written based on puppi/guppi/vegas settings and the scan duration, but I can try to get that implemented later.

  4. Fluxcals are currently distinguished by filename (e.g. fcal-VEGAS_820). It would probably be better to get this information from the log itself, however.
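
A rough sketch of the filename-based fluxcal check described in the last point. The dot-separated layout assumed here (project, script name, timestamp, observer, then `log.txt`) is inferred from a single log name quoted in this thread, and `looks_like_fluxcal` is a hypothetical helper, so treat this as illustrative only.

```python
def looks_like_fluxcal(log_name: str) -> bool:
    """Guess whether a log belongs to a flux calibration session.

    Assumed layout (inferred from one example, not documented):
    <project>.<script name>.<timestamp>.<observer>.log.txt
    """
    fields = log_name.split(".")
    # The second field is assumed to be the script name, e.g. "fcal-VEGAS_820".
    return len(fields) > 1 and fields[1].startswith("fcal")


log = "AGBT18B_226.fcal-VEGAS_820.2020-05-04_03:31:00+00:00.OPERATOR.log.txt"
print(looks_like_fluxcal(log))
```

A check like this breaks as soon as a script is renamed, which is the reason to prefer information taken from the log contents when it is available.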

swiggumj commented 4 years ago

@ryanslynch, it looks like information contained in the logs themselves changed since you set up the cron job. Do you know why this happened? Different turtle query? I think @dlakaplan has the parser working with the new files, but it would be good to know what sort of info is available.

dlakaplan commented 4 years ago

for the time issue: is that the time of the log, or the time of the scan? I interpreted it as the scan start time but don't know

Basenames are fine - that's basically what I have so far

Do you have an example of a full fluxcal name? And there should be 2 (since it does on/off)?

swiggumj commented 4 years ago

Oh shoot, yes you're right -- the times needed are those of individual scans. I'll look into the fluxcal filename.

ryanslynch commented 4 years ago

Hi Guys,

I'll try to respond to a few things here:

Why seconds after UTC midnight instead of local? Because the MJD is also contained in the filenames and that is referenced to UTC. Why did we do this? To ensure that filenames are unique. This was not always the case with GUPPI if, for example, someone observed the same pulsars in the same order in the same MJD but in two different sessions, with the scan number being reset between sessions.

Is this file name structure different for VEGAS only vs VEGAS+GUPPI? No. The GUPPI file names are what they are and the VEGAS file names are what they are, and that doesn't change when you use them together.

Where can we find the file#? That starts with 0001 and increments during a scan. There is a maximum size for individual files and when that is exceeded a new file is opened with an incremented file number that is totally continuous with the previous file. You should be able to glob on basenames that don't include the file number and get all of them. PUPPI/GUPPI do the same basic thing.
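
The globbing idea could look something like the sketch below. The helper name `files_for_scan` is made up for illustration, and the 4-digit file number is an assumption taken from the example names earlier in the thread.

```python
import glob
import os
from typing import List


def files_for_scan(directory: str, basename: str) -> List[str]:
    """Return every file for one scan, given a basename without the file number.

    The file number is assumed to be exactly 4 digits, as in the examples.
    """
    pattern = os.path.join(
        glob.escape(directory),
        # escape() keeps any glob metacharacters in the basename literal
        glob.escape(basename) + "_[0-9][0-9][0-9][0-9].fits",
    )
    return sorted(glob.glob(pattern))
```

Matching on four digits rather than a bare `*` keeps a pulsar-scan basename from accidentally picking up a `_cal_` file that happens to share the same prefix.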

Fluxcal file: Any calibration scan (whether observing a flux calibrator or the noise diode scans that are paired with each pulsar) has a file name structure of 'vegas_[mjd]_[seconds after UTC midnight]_[source]_[scan#]_cal_[file#].fits'. Aside from the seconds after UTC midnight part, this is the same as GUPPI/PUPPI.

You are right that there should be at least two fluxcal observations, the on and off. There could be more than two if the scans were done multiple times due to a failure or some other issue. And one could do only the on, then stop, then resubmit, then do the on again + the off.

I didn't change the cronjob so I can't say why the logs would change. Can you be specific as to the changes?

The UTC seconds after midnight is at the start time of the scan, not of the log.
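
Given a scan start time read from a log in local time, a basename could be built roughly as follows. This is a sketch under several assumptions: the function name `vegas_basename` is hypothetical, the seconds field is assumed to be unpadded with fractional seconds dropped, and the start time in the usage lines was chosen to reproduce the cal example above rather than taken from a real log.

```python
from datetime import date, datetime, timedelta, timezone

MJD_EPOCH = date(1858, 11, 17)  # MJD 0 began at 1858-11-17 00:00 UTC


def vegas_basename(start: datetime, source: str, scan: int, cal: bool = False) -> str:
    """Build a VEGAS basename (no file number) from a timezone-aware start time."""
    if start.tzinfo is None:
        raise ValueError("start time must be timezone-aware")
    utc = start.astimezone(timezone.utc)
    mjd = (utc.date() - MJD_EPOCH).days
    # Whole seconds after UTC midnight; any fractional part is dropped (assumption).
    seconds = utc.hour * 3600 + utc.minute * 60 + utc.second
    suffix = "_cal" if cal else ""
    return f"vegas_{mjd}_{seconds}_{source}_{scan:04d}{suffix}"


# Illustrative start time: 02:27:30 at UTC-4 on 2020-05-18.
local_start = datetime(2020, 5, 18, 2, 27, 30, tzinfo=timezone(timedelta(hours=-4)))
print(vegas_basename(local_start, "J1713+0747", 1, cal=True))
# prints "vegas_58987_23250_J1713+0747_0001_cal"
```

Because the seconds field has to match the file on disk exactly, an error of even one second in the chosen start-time line of the log would produce a basename that matches nothing.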

dlakaplan commented 4 years ago

The times in the log are in local, so I'm just making sure that this is intentionally UTC. I might have thought that it tried to keep all times in a single timezone, but that's fine if not (pytz works).

I still wonder about exactly which line in the log gives the start of the scan. There are several with a few ~s variations, so which is it? Can somebody post some file names along with a pointer to the relevant log?

dlakaplan commented 4 years ago

In the logs for a flux cal, I can't find the frequency. I can only find a rough mention of the receiver. Is this because it calibrates the whole Rx band? If you can point me to how this information is captured/stored, that would be helpful. This is looking at AGBT18B_226.fcal-VEGAS_820.2020-05-04_03:31:00+00:00.OPERATOR.log.txt

ryanslynch commented 4 years ago

The IF and backend setup should be identical to that of a polarization calibration scan on the pulsar (the "pulsar cal scan").  The rest frequency and bandwidth would be as specified in the Astrid configuration section.

I don't see the restfreq and bandwidth printed in the logs.  It seems like in some logs the requested configuration is shown so I'm not sure why this wouldn't be shown in the example you pointed me to.  It should be restfreq=1500 and bandwidth=800 MHz for L-Band and restfreq=820 MHz and bandwidth=200 MHz for PF800.
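
When the log omits the configuration, one possible fallback is a small lookup of the expected values quoted above. This is only a sketch: the `BandSetup` and `expected_setup` names are invented here, the keys are simply the receiver labels used in the comment, and the L-Band rest frequency is assumed to be in MHz like the other values.

```python
from typing import NamedTuple, Optional


class BandSetup(NamedTuple):
    restfreq_mhz: float
    bandwidth_mhz: float


# Expected values as quoted in the comment above (units assumed to be MHz).
EXPECTED_SETUP = {
    "L-Band": BandSetup(restfreq_mhz=1500.0, bandwidth_mhz=800.0),
    "PF800": BandSetup(restfreq_mhz=820.0, bandwidth_mhz=200.0),
}


def expected_setup(receiver: str) -> Optional[BandSetup]:
    """Return the expected setup for a receiver label, or None if unknown."""
    return EXPECTED_SETUP.get(receiver)


print(expected_setup("PF800"))
```

Values recovered this way are what the setup should have been, not what was actually recorded, so they are best flagged as inferred wherever they are stored.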

dlakaplan commented 4 years ago

In the latest logs the Astrid setup isn’t printed at all - that’s part of what Joe meant when he said that the format has changed.

ryanslynch commented 4 years ago

I will investigate.

ryanslynch commented 4 years ago

So I'm not sure why the Astrid setup sometimes shows up and sometimes doesn't, but I found an option where I can save both the Astrid log and the script that was executed. The script would have the same filename as the log except for the last bit: .log.txt would be .script.txt
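
The log-to-script naming rule described here is simple enough to sketch directly. The helper name `script_name_for_log` is hypothetical, and the example name is the fluxcal log mentioned earlier in the thread.

```python
def script_name_for_log(log_name: str) -> str:
    """Map a log filename to its companion script filename.

    Only the trailing ".log.txt" is swapped for ".script.txt".
    """
    suffix = ".log.txt"
    if not log_name.endswith(suffix):
        raise ValueError(f"not a log filename: {log_name!r}")
    return log_name[: -len(suffix)] + ".script.txt"


log = "AGBT18B_226.fcal-VEGAS_820.2020-05-04_03:31:00+00:00.OPERATOR.log.txt"
print(script_name_for_log(log))
```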

Would you like me to regenerate things going back to, say Jan 01 2020?  Should we back up what is currently in /lustre/pulsar/projects/NANOGrav/logs?

Ryan
