Open ianhinder opened 4 years ago
Heads up @mrow84 @bobturneruk - the "data pipeline api" label was applied to this issue.
I would have expected this code, which was merged quite a while ago, to make it so you didn't need a metadata.yaml file. Are you using the most recent code?
I have that code in the version I'm using - I last merged from master on 17-Jul-2020 (c51ce6229c1551aa7c3ee47e19ae513f233181a3).
When I write a data product, I get the warning, but it succeeds. However, this is not enough, as if anything tries to read the data product back, it cannot find it as I get
KeyError Traceback (most recent call last)
~/Documents/Projects/SCRC/src/BEEPmbp/data_pipeline_api/file_api.py in open_for_read(self, **call_metadata)
187 try:
--> 188 filename = Path(read_metadata[MetadataKey.filename])
189 except KeyError:
KeyError: 'filename'
which is not surprising since there is nothing stored on disk that would allow the file to be found.
Is this the same issue? It seems different.
On this (possibly new) issue: is this a read from a subsequent run? You cannot read data products that you have written in the same run.
Apologies, I was confusing this with #673. I confirm that the present issue is now fixed. It now says "could not find metadata.yaml" which could be confused with an error message and failure. Not sure what to do about this. If you configure the logging setup as per standard_api_usage.py, then the string "warning" is also printed, which would indicate to users that this is a warning. Maybe the warning could be more explicit that this is ok, e.g. "Could not find metadata.yaml; proceeding without", or "assuming empty", or something.
It is a pathological case from the perspective of intended usage, in that we always expect there to be a populated metadata.yaml
, written by the download script. Users are free to instantiate the file API without one, and just use the config.yaml
, but they are definitely doing something "non-standard", which I think warrants a warning.
The python logging library standardises warning to mean
An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected.
which I think broadly fits what is going on, as opposed to say, info level, that being
Confirmation that things are working as expected.
I will, however, add a bit more text to the warning along the lines you suggested in an upcoming PR that refactors the file API a bit.
I agree that warning is correct - my comment was just that it's not clear (to everyone) that it is only a warning, at least from the jupyter notebook interface I was using, without having configured logging to emphasise this. If you enter a command, and get a message back saying something isn't found, it's natural to assume it is an error. Regarding it being a pathological case: isn't this also what would happen if the user pointed to a directory which was not a data cache, e.g. by mistake? Anyway, thanks for amending the message; that should be enough.
Yes, quite, I only really see this happening when something has happened outside of expected usage, and so the user should ensure that they know what they are doing.
If you are starting with a new data repository, e.g. for testing, constructing a StandardAPI object fails because it cannot find metadata.yaml. If you create an empty metadata.yaml, it then fails with
TypeError: 'NoneType' object is not iterable
. If you add some dummy metadata, it works.