ScottishCovidResponse / SCRCIssueTracking

Central issue tracking repository for all repos in the consortium
6 stars 0 forks source link

StandardAPI constructor fails on empty metadata.yaml #662

Open ianhinder opened 4 years ago

ianhinder commented 4 years ago

If you are starting with a new data repository, e.g. for testing, constructing a StandardAPI object fails because it cannot find metadata.yaml. If you create an empty metadata.yaml, it then fails with TypeError: 'NoneType' object is not iterable. If you add some dummy metadata, it works.

github-actions[bot] commented 4 years ago

Heads up @mrow84 @bobturneruk - the "data pipeline api" label was applied to this issue.

mrow84 commented 4 years ago

I would have expected this code, which was merged quite a while ago, to make it so you didn't need a metadata.yaml file. Are you using the most recent code?

ianhinder commented 4 years ago

I have that code in the version I'm using - I last merged from master on 17-Jul-2020 (c51ce6229c1551aa7c3ee47e19ae513f233181a3).

When I write a data product, I get the warning, but it succeeds. However, this is not enough, as if anything tries to read the data product back, it cannot find it as I get

KeyError                                  Traceback (most recent call last)
~/Documents/Projects/SCRC/src/BEEPmbp/data_pipeline_api/file_api.py in open_for_read(self, **call_metadata)
    187         try:
--> 188             filename = Path(read_metadata[MetadataKey.filename])
    189         except KeyError:

KeyError: 'filename'

which is not surprising since there is nothing stored on disk that would allow the file to be found.

mrow84 commented 4 years ago

Is this the same issue? It seems different.

On this (possibly new) issue: is this a read from a subsequent run? You cannot read data products that you have written in the same run.

ianhinder commented 4 years ago

Apologies, I was confusing this with #673. I confirm that the present issue is now fixed. It now says "could not find metadata.yaml" which could be confused with an error message and failure. Not sure what to do about this. If you configure the logging setup as per standard_api_usage.py, then the string "warning" is also printed, which would indicate to users that this is a warning. Maybe the warning could be more explicit that this is ok, e.g. "Could not find metadata.yaml; proceeding without", or "assuming empty", or something.

mrow84 commented 4 years ago

It is a pathological case from the perspective of intended usage, in that we always expect there to be a populated metadata.yaml, written by the download script. Users are free to instantiate the file API without one, and just use the config.yaml, but they are definitely doing something "non-standard", which I think warrants a warning.

The python logging library standardises warning to mean

An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected.

which I think broadly fits what is going on, as opposed to say, info level, that being

Confirmation that things are working as expected.

I will, however, add a bit more text to the warning along the lines you suggested in an upcoming PR that refactors the file API a bit.

ianhinder commented 4 years ago

I agree that warning is correct - my comment was just that it's not clear (to everyone) that it is only a warning, at least from the jupyter notebook interface I was using, without having configured logging to emphasise this. If you enter a command, and get a message back saying something isn't found, it's natural to assume it is an error. Regarding it being a pathological case: isn't this also what would happen if the user pointed to a directory which was not a data cache, e.g. by mistake? Anyway, thanks for amending the message; that should be enough.

mrow84 commented 4 years ago

Yes, quite, I only really see this happening when something has happened outside of expected usage, and so the user should ensure that they know what they are doing.