OCHA-DAP / ocha-anticipy

Python package to support the development of anticipatory action frameworks
https://github.com/OCHA-DAP/ocha-anticipy
GNU General Public License v3.0
8 stars 1 forks source link

External config file as input #66

Closed turnerm closed 2 years ago

turnerm commented 2 years ago

Users will likely want to use their own config files for unsupported countries, this should be made possible to do.

caldwellst commented 2 years ago

Was looking into this, I think we should clarify exactly what we mean there. Have been thinking about this and not sure what exactly this should look like.

What is intended to go into the config file?

What all is supposed to go into the config file? Just any information that is country specific for various data sources that we need to record? At the moment, all it is is just the codab details and the iso3 code for generating the pathing. What else would be in there?

What is the benefit of CodAB in the config file?

When I was looking at the IRI implementation, it made me wonder what the benefit of having CodAB in the config file is. The only use of the config file is in setting the file pathing. I thought maybe the IRI implementation would, for instance, automatically create the BoundingBox from the config shape if it wasn't provided.

I see the reason it doesn't do that, but given those interlinkages are not available, should we strip down config to just be about file pathing? If we don't, I fear that what is going to be needed in config files just could bloat. But if we strip it down, we require that each specific module that might have a use for parameters passed through a config file instead makes them explicit within the module. Then users just have to understand the file structure.

So for CodAB, when instantiating, we can still use the pre-set configurations we have, but just selecting from those. And if it isn't available for an iso3 passed in, force the user to explicitly pass dataset, layer name, and max layers.

Does that make sense?


I see why we might want to use the config, I'm just thinking if I just want to use it for downloading something, it will be easier to explicitly pass in what I want than using a config file. If I do need to use a config file, I would only want to pass in just what I need for that indicator. So how do we handle missing values? Is that a problem?

The reason I got thinking about this is because working on #63, I think it would be much cleaner if we minimized what user input needed to be there. My thinking was we should minimize to just using the ISO3 as in the config we have now, and convert that to ISO2 and FEWSNET regions behind the scenes, otherwise it would be something we add to the config file.

turnerm commented 2 years ago

Thanks so much for looking into this Seth. Indeed thinking it through a bit more is part of the issue I guess, aha.

You might be interested in checking out this PR. Basically I had originally structured things so that the config and data sources were completely separate, then tied together with a Pipeline class, that essentially served as a composition root. So if I'm understanding what you've written here correctly, I believe that this previous structure aligns more closely with what you're imagining.

After working with it for awhile and discussions with Alessio I decided to remove it, for the reasons mentioned in the issue above. It was truly a pain to work with, among other things. Happy to discuss with you a bit more. But let me answer your individual points below:

What all is supposed to go into the config file? Just any information that is country specific for various data sources that we need to record? At the moment, all it is is just the codab details and the iso3 code for generating the pathing. What else would be in there?

Some datasources don't require any country-specific configuraiton, but for example with GloFAS you need to provide the reporting point coordinates.

What is the benefit of CodAB in the config file?

I see the country configuration files as containing country-specific nuisance parameters so that the user doesn't need to go hunting for them.

When I was looking at the IRI implementation, it made me wonder what the benefit of having CodAB in the config file is. The only use of the config file is in setting the file pathing. I thought maybe the IRI implementation would, for instance, automatically create the BoundingBox from the config shape if it wasn't provided.

It's true that passing the country config to something like IRI is a bit superfluous since it doesn't use much other than the iso3

I see the reason it doesn't do that, but given those interlinkages are not available, should we strip down config to just be about file pathing? If we don't, I fear that what is going to be needed in config files just could bloat. But if we strip it down, we require that each specific module that might have a use for parameters passed through a config file instead makes them explicit within the module. Then users just have to understand the file structure.

There is a separate pathing config, pathconfig.py. The country ones aren't directly intended to be about pathing,

I see why we might want to use the config, I'm just thinking if I just want to use it for downloading something, it will be easier to explicitly pass in what I want than using a config file. If I do need to use a config file, I would only want to pass in just what I need for that indicator. So how do we handle missing values? Is that a problem?

For downloading COD AB this would be fairly simple, however, the GloFAS example is a bit more complicated because it would require quite a complex data structure to be passed.

The reason I got thinking about this is because working on https://github.com/OCHA-DAP/pa-aa-toolbox/issues/63, I think it would be much cleaner if we minimized what user input needed to be there. My thinking was we should minimize to just using the ISO3 as in the config we have now, and convert that to ISO2 and FEWSNET regions behind the scenes, otherwise it would be something we add to the config file.

I think what you're describing here is similar to the old Pipeline structure. While I'd be hesitant to back to that, It's definitely true that we should make it easier to go from COD AB to region of interest to pass to the data sources, perhaps with some convenience functions.

Lastly, let me add what I had in mind for the country config file: I just thought that since we only support a limited set of countries, we should make it easy for the user to be able to use their own, so that it's not a blocker for them. (Or to override any outdated information)

caldwellst commented 2 years ago

Hi Monica, that's very clear, thanks so much for the detailed explanation. It's all clear to me I think. So thinking through a bit more, what we would then need is:

  1. System for validating config files. I guess we would need a general validator and then apply that for each specific data source, depending on the config variables required? I'm assuming then just like what you've already done, where codab is at the top level, so we just validate each of those being present.
  2. Clear documentation of what is needed in a config file. I guess one landing page that just has the full config file example for one of the countries we offer?
turnerm commented 2 years ago

System for validating config files. I guess we would need a general validator and then apply that for each specific data source, depending on the config variables required? I'm assuming then just like what you've already done, where codab is at the top level, so we just validate each of those being present.

Yes, so I think pydantic should complain if anything is fishy about a config file. I haven't yet tested if the error messages are very user friendly though.

Clear documentation of what is needed in a config file. I guess one landing page that just has the full config file example for one of the countries we offer?

Yeah true. Maybe it can go in the docstring of country_config.py? If not then I can make an issue to create a separate page

turnerm commented 2 years ago

Closed by #97