MillionConcepts / pdr

[P]lanetary [D]ata [R]eader - A single function to read all Planetary Data System (PDS) data into Python

`pdr.Data` name attributes, or at least 'filename', can't always be strings #17

Closed m-stclair closed 3 years ago

m-stclair commented 3 years ago

products can have multiple data files, and in some cases one of those data files also contains the attached label. these attributes probably need to be more flexibly typed and assigned; at present, especially because they're sometimes used to help dispatch, they make Data break or just ignore a bunch of stuff.

michaelaye commented 3 years ago

i'm not understanding the issue title well. What I do for PDS products is that every one has its own class and that class "knows" if the label is attached or not. I don't understand how this relates to what a filename needs to be? In any case, I think one needs to distinguish between several different but similar ( ;) ) entities:

m-stclair commented 3 years ago

agree with all that. the issue is not about individual filenames as such, but pdr.Data's awareness of local file paths (and perhaps remote URIs) for cases in which a single product consists of multiple data objects spread across more than one file. Probably it needs an attribute like 'files' -- a mapping from PDS3 label pointers or PDS4 file_areas to local file paths.
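
A minimal sketch of what such a 'files' attribute might look like. This is not pdr's actual API: `ProductFiles`, its constructor arguments, and its method names are all hypothetical, and the label is assumed to have been parsed already into a plain mapping of object names to the file names the label gives for them.

```python
from pathlib import Path


class ProductFiles:
    """Hypothetical container for a product's file paths (not part of pdr)."""

    def __init__(self, label_path, pointers):
        # pointers: object name -> file name as written in the label
        # (assumed to be parsed from the label elsewhere)
        self.label_path = Path(label_path)
        root = self.label_path.parent
        # the 'files' mapping: object name -> local path
        self.files = {name: root / fname for name, fname in pointers.items()}

    def attached_label_objects(self):
        """Names of objects stored in the same file as the label."""
        return [n for n, p in self.files.items() if p == self.label_path]

    def missing(self):
        """Names of objects whose files are not present locally."""
        return [n for n, p in self.files.items() if not p.exists()]
```

With this shape, a product whose image file carries the attached label and whose table lives in a second file is just two entries in `files`, and nothing has to assume a single string-valued filename.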

michaelaye commented 3 years ago

ah yes, yes, I have been working on a Product/PathManager concept for years! ;) This goes a bit further than needs to be done for the raw PDS products, but maybe we can adopt similar principles. I currently am doing something like this: Because the scientist always adds processed versions of a product, each of these processed versions gets an attribute name and a path id registered somewhere, and henceforth the PathManager knows that the "calibrated" version of said PRODUCT_ID has the filename signature "XXX_calib.IMG" or something like that. I am right now working on a configurable way to add that to an instrument-based calibration file, so that all the user would need to do is provide a tuple of attribute name and path extension, and then the PathManager knows what to do.
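
A rough sketch of the tuple-based registration idea described above. It is an illustration only, not the actual PathManager code: the class layout, the `register` method, and the `versions` argument are assumptions, and the suffix "_calib.IMG" is taken from the example signature in the comment.

```python
from pathlib import Path


class PathManager:
    """Hypothetical sketch; names and behavior are assumptions."""

    def __init__(self, product_id, root, versions=()):
        self.product_id = product_id
        self.root = Path(root)
        self._suffixes = {}
        # versions: iterable of (attribute name, filename suffix) tuples
        for name, suffix in versions:
            self.register(name, suffix)

    def register(self, name, suffix):
        """Record that attribute `name` maps to PRODUCT_ID + suffix."""
        self._suffixes[name] = suffix

    def __getattr__(self, name):
        # only reached when normal attribute lookup fails
        suffixes = self.__dict__.get("_suffixes", {})
        if name in suffixes:
            return self.root / f"{self.product_id}{suffixes[name]}"
        raise AttributeError(name)
```

For example, `PathManager("XXX", "/data", [("calibrated", "_calib.IMG")]).calibrated` resolves to the path `/data/XXX_calib.IMG`.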

On Mon, Aug 23, 2021 at 12:41 PM m-stclair @.***> wrote:

agree with all that. the issue is not about individual filenames as such, but pdr.Data's awareness of local file paths (and perhaps remote URIs) for cases in which a single product consists of multiple data objects spread across more than one file. Probably it needs an attribute like 'files' -- a mapping from label pointers to local file paths.


m-stclair commented 3 years ago

yes, that's the exact same class of problem...the question is whether or not it is feasible to infer filename "signatures" in some relatively consistent way from labels -- even in a way that only works 80% of the time and can be covered with defined special cases the other 20% of the time. let me know if you have any clever ideas?
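
One possible shape for the "generic rule plus defined special cases" idea, as a sketch only. It assumes the label has already been parsed into a dict, that PDS3 pointer keys keep their leading "^", and that special cases are keyed on DATA_SET_ID; the function names and the `SPECIAL_CASES` registry are hypothetical. The generic rule relies on the PDS3 convention that a pointer value is either a file name, a (file name, offset) pair, or a bare offset into the file that holds the label.

```python
# hypothetical registry: DATA_SET_ID -> callable(label, label_name) -> dict
SPECIAL_CASES = {}


def special_case(dataset_id):
    """Decorator that registers an override rule for one data set."""
    def decorator(func):
        SPECIAL_CASES[dataset_id] = func
        return func
    return decorator


def generic_rule(label, label_name):
    """Map each PDS3 pointer to a file name using only the label."""
    files = {}
    for key, value in label.items():
        if not key.startswith("^"):
            continue
        if isinstance(value, (tuple, list)):
            value = value[0]  # ("FILE.IMG", offset) -> "FILE.IMG"
        # a bare number is an offset into the file that holds the label
        files[key[1:]] = value if isinstance(value, str) else label_name
    return files


def infer_files(label, label_name):
    """Use a registered special case if one exists, else the generic rule."""
    rule = SPECIAL_CASES.get(label.get("DATA_SET_ID"), generic_rule)
    return rule(label, label_name)
```

The point of the registry is that the 20% of awkward data sets each get a small, explicit override, while everything else falls through to the one generic rule.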

m-stclair commented 3 years ago

this is now a subset of #21, closing.