EoRImaging / eppsilon

eppsilon - error propagated power spectrum with interleaved observed noise
BSD 2-Clause "Simplified" License

healpix_rot #70

Closed. nicholebarry closed this issue 6 years ago.

nicholebarry commented 6 years ago

There are some issues with healpix_rot. First, its procedure is named healpix_rot_calc internally. Second, there is a stop on line 23 if some conditions are not met (should there be a warning message instead?).

bhazelton commented 6 years ago

Actually there are both a healpix_rot and a healpix_rot_calc in healpix_rot.pro. In IDL, any helper functions called by the main function (the one that matches the filename) have to come first in the file.

The stop on line 23 is interesting. @wenyang-li just ran into a situation where this happened, and it sounds like you did too. I agree it should be fixed, but this is the first time in years that it has been hit, and to fix it I need to understand what is going on. Wenyang put the data from a run where he hit it on enterprise for me, but I'm not reaching that stop statement, so I'm not sure what's going on yet.

nicholebarry commented 6 years ago

Ah yes, I see that too.

I'll investigate this issue further (since I'm hitting that stop). I should note that I am using the zenith observation of the golden set, but with a newly made uvfits. So maybe something about modern uvfits generation is causing problems, e.g. a uvfits parameter (phasing?) is not getting set correctly.

nicholebarry commented 6 years ago

Here is a log from a single dft:

```
folder_name = /fred/oz048/MWA//CODE/FHD/fhd_nb_nomajick/
obs_range = zenith
cube_type = weights
pol = xx
evenodd = odd
getvar_savefile: file /fred/oz048/MWA/CODE/FHD/fhd_nb_majick/Healpix/Combined_obs_zenith_even_cubeXX.sav not found
```

I had moved the directory from fhd_nb_majick to fhd_nb_nomajick, but somehow eppsilon still knew the old file path and used it to look for a cube. When I renamed the directory back to its original name, everything worked.

Let's try to change this so that eppsilon uses the current file path, given how often directories get moved around as part of cluster management.

@wenyang-li is this potentially your problem as well?

wenyang-li commented 6 years ago

I did not have this problem, but there is also a path issue. This is my error message:

```
% Compiled module: INIT_HEALPIX.
% Compiled module: DEFINED.
% LOADCT: Loading table Rainbow + white
% Compiled module: SLURM_PS_JOB.
% Compiled module: PS_WRAPPER.
folder_name = /users/wl42/data/wl42/FHD_out/fhd_Deep_analysis_Phase2/
obs_range = 1161524992
datafile = /users/wl42/data/wl42/FHD_out/fhd_Deep_analysis_Phase2/ps/1161524992_cubeXX__even_odd_joint_info.idlsave
file setup time: 0.020714998
n_vis difference between even & odd cubes: 0
n_vis % difference between even & odd cubes: 0
n_obs: 1 1
getvar_savefile: file fhd_Deep_analysis_Phase2/Healpix/1161524992_even_cubeXX.sav not found
% Stop encountered: HEALPIX_ROT_CALC 23 /users/wl42/IDL/eppsilon/ps_utils/healpix_rot.pro
```

Now I realize why I got this problem. The job failed for some reason the first time, and the file 1161524992_cubeXX__even_odd_joint_info.idlsave was already sitting in the ps directory. The next time I ran this job, the code found 1161524992_cubeXX__even_odd_joint_info.idlsave first, instead of looking for the even/odd cubes under the Healpix directory. After that, healpix_rot.pro treated the even_odd_joint_info.idlsave file as the data, which caused this issue. I deleted 1161524992_cubeXX__even_odd_joint_info.idlsave this morning and submitted the job again, and the issue went away.

bhazelton commented 6 years ago

Ah ha! This is very helpful. Eppsilon does construct an info structure that knows about the names and locations of the data files and also contains a lot of metadata pulled from various files. Getting all that information takes time, so it doesn't redo it if the info file already exists unless you set /refresh_info. So if you change the file names or locations you should definitely set that keyword.
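To make the caching behavior concrete, here is a minimal Python sketch (not eppsilon's actual IDL code) of an info file that is reused unless a refresh is requested. The names `get_info`, `build_info`, and the JSON storage format are hypothetical stand-ins; only the `refresh_info` idea comes from the comment above. It shows how a moved directory leaves stale paths in the cached info.

```python
import json
import os


def build_info(data_files):
    """Hypothetical stand-in for the slow metadata-gathering step."""
    return {"data_files": list(data_files), "n_obs": len(data_files)}


def get_info(info_path, data_files, refresh_info=False):
    """Return the cached info structure, rebuilding it only when asked.

    Sketch of the described behavior: an existing info file is reused
    as-is unless refresh_info is set, so any paths stored in it can go
    stale after directories are moved or renamed.
    """
    if os.path.exists(info_path) and not refresh_info:
        with open(info_path) as f:
            return json.load(f)  # cached info, possibly with old paths
    info = build_info(data_files)
    with open(info_path, "w") as f:
        json.dump(info, f)
    return info
```

Calling `get_info` a second time with the new `fhd_nb_nomajick` path still returns the old `fhd_nb_majick` path until `refresh_info=True` is passed, which matches the symptom in the log earlier in this thread.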

I will try to reproduce the problem where it gets to healpix_rot inappropriately -- it should have errored with a useful error message long before then.

bhazelton commented 6 years ago

To clarify @wenyang-li, the code sets data_file to the info file as a shortcut, but then it tests whether it is an info file (in which case it gets the data file names out of the info structure) or a data file. So it did not try to run healpix_rot on the info file, but it might have tried to do something equally silly. I will try to reproduce the issue.
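The info-file shortcut can be sketched in Python roughly as follows. This is an illustration under assumptions, not eppsilon's logic: it assumes info files can be recognized by the `_info.idlsave` suffix seen in the log, and that the info structure stores data file names under a hypothetical `datafile` key. `resolve_data_files` and `load_info` are hypothetical names.

```python
from typing import Callable, Dict, List

# Assumption: info files are recognizable by this suffix (taken from the log above).
INFO_SUFFIX = "_info.idlsave"


def resolve_data_files(data_file: str,
                       load_info: Callable[[str], Dict]) -> List[str]:
    """Return the real data file names for data_file.

    If data_file is an info file, pull the data file names out of the
    info structure; otherwise treat data_file itself as the data.
    """
    if data_file.endswith(INFO_SUFFIX):
        info = load_info(data_file)
        return list(info["datafile"])  # hypothetical key name
    return [data_file]
```

Passing the loader in as a function keeps the sketch independent of any particular save-file reader.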

bhazelton commented 6 years ago

Ok, I think I understand what was going on. I've added some checking to fail fast with a useful error message if it can't find the data (rather than going into healpix_rot with no data and failing there) in PR #71. Please take a look and see what you think!
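The fail-fast idea can be illustrated with a short Python sketch: check that every expected data file exists before any expensive step, and raise an error that names the missing files. This is not the code from PR #71. `check_data_files` and `MissingDataError` are hypothetical names, and the hint about refreshing the info file simply echoes the `/refresh_info` advice above.

```python
import os
from typing import Iterable, List


class MissingDataError(FileNotFoundError):
    """Raised when expected data files are absent (hypothetical name)."""


def check_data_files(paths: Iterable[str]) -> List[str]:
    """Fail fast with a readable message if any data file is missing."""
    paths = list(paths)
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        raise MissingDataError(
            "Data file(s) not found: " + ", ".join(missing)
            + ". If files were moved or renamed, rerun with the info "
            "file refreshed."
        )
    return paths
```

Running this check before the rotation step means a bad path surfaces as a clear error at startup instead of as a bare stop deep inside healpix_rot_calc.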

bhazelton commented 6 years ago

fixed in PR #71