RadioAstronomySoftwareGroup / pyuvsim

A ultra-high precision package for simulating radio interferometers in python on compute clusters.
https://pyuvsim.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
43 stars 7 forks source link

Consider replacing utils.strip_extension with os.path.splitext #381

Closed bhazelton closed 6 months ago

bhazelton commented 2 years ago

From conversation on PR #379:

@steven-murray steven-murray this function seems brittle -- thought of using the pathlib.Path module for this kind of stuff?

@bhazelton bhazelton Looks like this is only called in this module in the check_file_exists_and_increment function. My first reaction is I agree and I'd favor using this instead: https://docs.python.org/3/library/os.path.html#os.path.splitext

But then I started worrying about what HERA does where it adds lots of "extensions" to filenames and I wonder if that was the concern.

@aelanman can you comment?

aelanman commented 2 years ago

The strip_extension function, if I recall correctly, is written that way to support MIRIAD files, since those appear as directories to the OS and often don't have extensions. If os.path.splitext can handle that, then we should certainly use that instead.

I'm not aware of any standard function equivalent to check_file_exists_and_increment, but if there is one that can be replaced too.

bhazelton commented 2 years ago

So os.path.splitext can handle no extension. The thing I'm worried about is that it looks like this function only splits out extensions if they are passed in or in a known list. I think some PAPER miriad files have lots of periods in their name which are not really viewed as "extensions" by people who use the files. But I think they would be split out by os.path.splitext. Of course we can add handling for that, but we should be clear about what behavior we want. I personally am not sure exactly what is desired here.

aelanman commented 2 years ago

Having just tested it, I believe that if you give splitext a string with a bunch of periods in it, it will take the text after the last period as the extension:

In [2]: os.path.splitext("/path/to/file.uv.xyz.hera")
Out[2]: ('/path/to/file.uv.xyz', '.hera')
aelanman commented 2 years ago

Since this function is only used to come up with the output file name, such that existing files are not overwritten, those odd cases shouldn't be much concern. We're not trying to unambiguously read any visibility data file, just come up with an appropriate filename for writing data out.