alephdata / memorious

Lightweight web scraping toolkit for documents and structured data.
https://docs.alephdata.org/developers/memorious
MIT License

Proposal: Add possibility to use ENV vars in yaml config #178

Closed simonwoerpel closed 2 years ago

simonwoerpel commented 3 years ago

Hey,

I find it very useful to use ENV vars in the YAML config, so that crawlers can be parameterized from the CLI, like this:

config

...
params:
  startdate: !ENV ${STARTDATE}
...

and then execute:

STARTDATE=2021-05-27 memorious run my_crawler

This requires modifying the YAML parsing of the config file. I found a solution here: https://medium.com/swlh/python-yaml-configuration-with-environment-variables-parsing-77930f4273ac, which I added to util.py.
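For readers who want to see roughly what such a parser change involves, here is a minimal sketch using PyYAML. It is not the code from the linked article or from this proposal; the names `EnvLoader`, `construct_env` and `load_config` are hypothetical, and leaving unset variables untouched is an assumption made only for illustration.

```python
import os
import re

import yaml  # PyYAML, assumed to be installed

# Matches ${NAME} placeholders inside a tagged scalar.
ENV_PATTERN = re.compile(r"\$\{([^}]+)\}")


class EnvLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass, so the tag stays local to this loader."""


def construct_env(loader, node):
    """Replace every ${NAME} in a scalar tagged !ENV with os.environ[NAME]."""
    value = loader.construct_scalar(node)

    def replace(match):
        # Assumption: an unset variable leaves the placeholder as it is.
        return os.environ.get(match.group(1), match.group(0))

    return ENV_PATTERN.sub(replace, value)


# Register the constructor for the explicit !ENV tag on this loader only.
EnvLoader.add_constructor("!ENV", construct_env)


def load_config(text):
    """Parse YAML text, expanding scalars tagged with !ENV."""
    return yaml.load(text, Loader=EnvLoader)


if __name__ == "__main__":
    os.environ.setdefault("STARTDATE", "2021-05-27")
    print(load_config("params:\n  startdate: !ENV ${STARTDATE}\n"))
```

Because the tag is explicit, the expanded value comes back as a plain string; any date parsing would still have to happen in the crawler itself.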

What do you think about this idea?

sunu commented 3 years ago

Hey @simonwoerpel! I am a bit unsure whether we want this. Obviously it is a useful bit of functionality. However, I think we can also parse the env vars in Python directly. Something like:

start_date = context.params.get('start_date', os.environ.get('STARTDATE'))

It will still work with the same execution syntax as you mentioned.
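To make the precedence in that one-liner explicit, here is a small self-contained sketch. The helper `resolve_start_date` is hypothetical, and a `SimpleNamespace` stands in for the real memorious context; the only thing assumed about the context is that `context.params` behaves like a dict.

```python
import os
from types import SimpleNamespace


def resolve_start_date(context, env_name="STARTDATE"):
    """Return the YAML param if it is set, otherwise the environment variable.

    A value under params in the YAML config wins; the environment variable
    is only the fallback, and the result is None when neither is present.
    """
    return context.params.get("start_date", os.environ.get(env_name))


if __name__ == "__main__":
    # Stand-in for a crawler whose YAML config has no start_date param.
    context = SimpleNamespace(params={})
    print(resolve_start_date(context))
```

Running it as `STARTDATE=2021-05-27 python sketch.py` (the file name is arbitrary) mirrors the `memorious run` invocation above: the value only comes from the environment when the config does not define the param.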

For YAML-only scrapers, this can be useful. But we have never needed something like this before, and I am a bit reluctant to add complexity while there is no clear need. If you have any particular scenarios where this would be necessary, I would like to know.

Let me know what you think.

simonwoerpel commented 3 years ago

Hey @sunu, thanks for your reply!

You are totally right, this is only useful for "yaml-only" scrapers. Over the next few weeks I will think about whether there is really a need for it (I am working on a bigger project based on memorious ;)). If I don't come up with a real scenario where this is needed, we can close this.

Will let you know!

sunu commented 3 years ago

Sounds good, Simon! I would love to learn more about what you're working on too :)

simonwoerpel commented 2 years ago

I can confirm that, for now, I have no use case for this scenario.

simonwoerpel commented 2 years ago

So I am closing this PR.