deeplycloudy / lmatools

Python code for working with VHF Lightning Mapping Array data
BSD 2-Clause "Simplified" License
22 stars 23 forks source link

Added automated mask length determination in file #28

Closed vbalderdash closed 4 years ago

vbalderdash commented 4 years ago

Added a quick parser to automatically pull the mask length from the header of the LMA '.dat.gz' file for the flash sorting.

deeplycloudy commented 4 years ago

@vbalderdash thanks for fixing this long-standing oversight, which is a sneaky way things can go wrong for all of us. Do you think this also fixes #3?

Could you move this into LMADataset, instead? It already parses the data format line, and would solve the problem at the data reader instead of the flash-sort level. https://github.com/deeplycloudy/lmatools/blob/master/lmatools/io/LMAarrayFile.py#L229

deeplycloudy commented 4 years ago

And this might not fix #3, which may have to do with storing enough characters in the station mask in the HDF5 file. But this solves the problem of automated determination, which is the first step.

vbalderdash commented 4 years ago

Looks like the root of #3 is line 57 of LMA_h5_write, and it should simple to carry that value through to the right function call. I'll try to move the fix to LMAarrayFile and get that carried through, too.

deeplycloudy commented 4 years ago

For carrying values through, feel free to add a mask_length attribute to LMAdataFile.

vbalderdash commented 4 years ago

mask_length is now in the LMAdataFile and pulling the value during the unzipped file read. I moved the h5 column class to within the function with the mask length variable so that can be initialized with the variable string size, so it seems to be writing out what is expected! There was a hanging 'import gzip' in the gen_autorun file that needs to be removed

deeplycloudy commented 4 years ago

This looks good in terms of the internals, and is what I wish I had written initially!

Have you looked at whether the removal of the mask_length argument to LMADataFile causes upstream scripts (like flash_sort_and_grid) to break? lmatools.flashsort.autosort.autorun_sklearn.cluster definitely passes mask_length to LMADataFile, for instance.

vbalderdash commented 4 years ago

The mask_length parameter in lmatools.flashsort.autosort.autorun_sklearn.cluster should be irrelevant since it calls LMADataFile.read which now defines the mask_length, but it probably should be removed for cleanliness. The flash sorting is reasonable as far as I can tell (tested on s6 and s7 formatted files), although I have not verified anything downstream of gridding. I was selfishly focusing on the flashes and and flash extents.

deeplycloudy commented 4 years ago

OK, so you've run things end to end and there's no breakage. I was worried about breaking other users' code if they were to update lmatools.

If you want to remove the other mask_length stuff for cleanliness / saving us from future confusion that would be great, or I can merge this now and open an issue to remind us to come back to do the cleanup.

vbalderdash commented 4 years ago

Probably still worth testing on another machine first!

deeplycloudy commented 4 years ago

I pushed some changes to your branch that I think fix the issue you reported with an empty station mask value that caused flash sorting to crash.

The changes above also fix #29, I think.