bluesky / event-model

data model for event-based data collection and analysis
https://blueskyproject.io/event-model
BSD 3-Clause "New" or "Revised" License
13 stars 30 forks source link

Root slashes in resource data prevents complete file handling #181

Open dylanmcreynolds opened 4 years ago

dylanmcreynolds commented 4 years ago

This is a little bit of pilot error, but it was easy to do and took a long time to debug.

I'm writing an ingestor to import rsoxs scattering data into databroker. When I compose the resource, I accidentally placed a forward slash in front of my resource_path below:

        ccd_resource = run_bundle.compose_resource(
            spec='FITS',
            root=images_root,
            resource_path="/foo/bar.fits",
            resource_kwargs={})

This prevents the file from being found, but os.path.lib here ignores the first parameter if it sees the forward slash in the second parameter: https://github.com/bluesky/event-model/blob/cc4c340fafacfc6411a002b7d3f3b2dbae0922bd/event_model/__init__.py#L813

Should this be documented? Should there be check for a forward slash in resource_path and remove it? I dunno. But even if we don't do anything this issue might help someone in the future.

danielballan commented 4 years ago

I tripped over this as well at some point this year but failed to open an issue. Thanks for flagging this. I think this should be prevented in document validation, and fixed retroactively on old data via a migration.

dylanmcreynolds commented 4 years ago

I think this should be prevented in document validation, and fixed retroactively on old data via a migration.

Ah, yes, that's a better strategy.

dylanmcreynolds commented 4 years ago

Thinking more on this, I see your to you notion that get_handler not change the intent of what was put into the resource document, especially since success or failure depends on configuration in the future. On the other hand, here's a case that we know will never work.... a root was configured and found in a root_map, and a forward slash means it will never work.

How about a strongly-worded log message? That alone could have saved me a lot of time.

danielballan commented 4 years ago

In that case should we just raise a clear exception? The application can catch it and log it if it wants to, but at the library layer my impulse is to fail informatively rather than log and then fail cryptically.

dylanmcreynolds commented 4 years ago

I would totally agree with that but I'm a little nervous. Will that cause innocent queries on the run that don't need to access the image to fail just because someone screwed up the file mapping?

danielballan commented 4 years ago

Fair point. I guess the order of operations then should be:

  1. Add log message as you suggest.
  2. Make document-creation functions pickier to stop generated problematic Resources.
  3. Fix historical documents.
  4. Change that log message to an error message.

Sound right?

danielballan commented 4 years ago

Actually, I might have a slightly better idea. When we attempt the create the handler

https://github.com/bluesky/event-model/blob/91d0b1c1a23d1d6fe473fa3782fd7522a1785af9/event_model/__init__.py#L808-L831

if we find ourselves about to error out and die anyway, can we at that point check whether the original resource_path was absolute, and raise a specific error message?

That's a bit complex, but I don't like the situation where you get a non-specific exception raises and the useful information is logged earlier, somewhere else, so it maybe worth going to these lengths to avoid that.

dylanmcreynolds commented 4 years ago

I just deleted a whole reply because now I guess I understand error_to_raise better. So, in my situation, I raised an error_to_raise, and you're proposing to add text to that? Sure, seems right.

danielballan commented 4 years ago

Yeah, to be precise I might even add a custom exception type rather than adding text so it can be caught/handled specifically if need be---an AbsoluteResourcePath exception class or something.