USRA-STI / gdt-fermi

Gamma-ray Data Tools - Fermi mission components
Apache License 2.0
3 stars 7 forks source link

Notebook examples should use non-FTP finders and select file names using version handling #47

Closed joshuarwood closed 5 days ago

joshuarwood commented 2 months ago

It looks like the jupyter notebook examples are still using TriggerFtp instead of the newer TriggerFinder class with HTTPS support because they were written a little before the update with newer finders. Should switch this to the newer class.

I also think we should also switch hard coded file names like this:

tte = GbmTte.open(data_path.joinpath('090131090/glg_tte_n9_bn090131090_v01.fit'))

over to statements that natively handle the different file name versions (vXX) like this:

file = tte_finder._file_filter(tte_finder.files, 'tte', 'fit', dets='n0')

or like this:

file = glob.glob('090131090/glg_tte_n9_bn090131090_v??.fit')[-1]

The reason for this change is I had a user reach out to ask about writing their own function to automatically determine the latest available file version. They had started from this example but realized there were errors with the file versioning when they updated the burst number to a different burst. This seems like it will be a fairly frequent occurrence given how natural it is to copy the example and update to a burst of your own choosing.

Any preferences on using _file_filter() versus glob()?

I think I prefer glob because _file_filter is technically an internal method to the finder classes. I can create a pull request with the changes once we decide on _file_filter() or glob().

AdamGoldstein-USRA commented 2 months ago

Sounds good.

Related to the latest file version: The finder looks in the "current" directory on HEASARC, which always contains the current latest version of the file. Older file versions are moved to a "previous" directory.

joshuarwood commented 2 months ago

yeah, correct. The finders will always pull the latest versions. The issue this user saw was the fact that once they had the latest version downloaded to their PC the modified example notebook would occasionally fail if they only changed the burst number since some bursts had v00 or v02 instead of the v01 from the example.

AdamGoldstein-USRA commented 2 months ago

Ah, I see. I don't think we want to get rid of explicitly passing a filepath to a .open() method. There may certainly be situations where there are multiple versions of a file in a directory for a good reason, and the user wants to analyze one of them, but not necessarily the latest version.

joshuarwood commented 2 months ago

so in the notebook examples it will always be downloading the latest version because it uses the finder classes, but I do see your point when generally using glob.glob()[-1]. The alternative is my other suggestion for:

file = tte_finder._file_filter(tte_finder.files, 'tte', 'fit', dets='n0')

since that will always get the file name that was downloaded for the example.

joshuarwood commented 2 months ago

hmm. Another solution might be to return the downloaded file list from functions like get_tte()

joshuarwood commented 2 months ago

I double checked the gdt-core implementation for heasarc and it already returns the list of downloaded paths. This includes a join() with the destination directory. All we need is to update the finder functions with return self.get() instead of ending at self.get()

AdamGoldstein-USRA commented 6 days ago

@joshuarwood can you confirm that PR #48 solves these issues?

joshuarwood commented 5 days ago

yes, I can confirm that PR #48 solves these issues. The GBM data finders now return the list of downloaded files when running the get methods. These can be directly passed to file open methods. Users no longer need to do a glob / manually specify the file name when opening files.