Closed joshuarwood closed 5 days ago
Sounds good.
Related to the latest file version: The finder looks in the "current" directory on HEASARC, which always contains the current latest version of the file. Older file versions are moved to a "previous" directory.
yeah, correct. The finders will always pull the latest versions. The issue this user saw was the fact that once they had the latest version downloaded to their PC the modified example notebook would occasionally fail if they only changed the burst number since some bursts had v00 or v02 instead of the v01 from the example.
Ah, I see. I don't think we want to get rid of explicitly passing a filepath to a .open()
method. There may certainly be situations where there are multiple versions of a file in a directory for a good reason, and the user wants to analyze one of them, but not necessarily the latest version.
so in the notebook examples it will always be downloading the latest version because it uses the finder classes, but I do see your point when generally using glob.glob()[-1]. The alternative is my other suggestion for:
file = tte_finder._file_filter(tte_finder.files, 'tte', 'fit', dets='n0')
since that will always get the file name that was downloaded for the example.
hmm. Another solution might be to return the downloaded file list from functions like get_tte()
I double checked the gdt-core implementation for heasarc and it already returns the list of downloaded paths. This includes a join() with the destination directory. All we need is to update the finder functions with return self.get()
instead of ending at self.get()
@joshuarwood can you confirm that PR #48 solves these issues?
yes, I can confirm that PR #48 solves these issues. The GBM data finders now return the list of downloaded files when running the get methods. These can be directly passed to file open methods. Users no longer need to do a glob / manually specify the file name when opening files.
It looks like the jupyter notebook examples are still using TriggerFtp instead of the newer TriggerFinder class with HTTPS support because they were written a little before the update with newer finders. Should switch this to the newer class.
I also think we should also switch hard coded file names like this:
tte = GbmTte.open(data_path.joinpath('090131090/glg_tte_n9_bn090131090_v01.fit'))
over to statements that natively handle the different file name versions (vXX) like this:
file = tte_finder._file_filter(tte_finder.files, 'tte', 'fit', dets='n0')
or like this:
file = glob.glob('090131090/glg_tte_n9_bn090131090_v??.fit')[-1]
The reason for this change is I had a user reach out to ask about writing their own function to automatically determine the latest available file version. They had started from this example but realized there were errors with the file versioning when they updated the burst number to a different burst. This seems like it will be a fairly frequent occurrence given how natural it is to copy the example and update to a burst of your own choosing.
Any preferences on using _file_filter() versus glob()?
I think I prefer glob because _file_filter is technically an internal method to the finder classes. I can create a pull request with the changes once we decide on _file_filter() or glob().