Closed saylorsd closed 8 years ago
You have to add paramiko to the requirements. I would also add it to the setup.py
requirements.
I think you'll also have to modify the settings.example.json
to get the tests to pass on travis.
I'm going to write some additional unittests as well.
About the settings: I'd like to have the different connections live on the same level as the CKAN servers in the settings e.g:
{
"staging": {
"api_key": "your ckan api key",
"root_url": "example: https://stage.wprdc.org/",
"organizations": {
"Name": "ckan-generated hash"
},
"statusdb": ":memory:"
},
"county-ftp": {
"stuff": "things"
}
}
But pipeline.config
is from settings[server]
and I don't wan't to rework the config setup without asking your first. Would it make sense to refactor our .config
to something like .ckan_config
, and then have a general .config
for connection names, status databse names, and possible other things that may end up in there?
I feel like a pipeline's runtime should only be able to access one level of configuration, but perhaps that's not right. Maybe the right thing to do is to have the whole configuration object floating around and then different pieces of the pipeline can access different pieces of configuration.
At the very least, doing that sort of thing opens that file as a side effect whenever something from that package is imported, so it probably shouldn't happen.
Would it make sense to have a separate file for global settings that isn't written to by pipelines? Currently, the settings.json file we have provides CKAN-instance-specific information. Things like SFTP settings, RESTful API settings (in the future), etc wouldn't be specific to a pipeline or downstream CKAN instance, so I feel like there would be a lot of copying and pasting these chunks of info into each CKAN set-up.
I don't think the pipelines should be writing into the configuration. In my opinion for this case configuration should absolutely be read only, and if we need information about state it should be kept in place in some other place.
In either case, I think a larger configuration workaround might be in order. I think that:
Pipeline.run()
. That way you can safely import things without side effects.Ok. So if we were to have CKAN settings and global settings (in separate files or however), we could just read both as part of Pipeline.set_config_from_file
when it's instantiated without issue, right?
Yes, though again I think that all configuration that might be needed should be in one file, and if it isn't present, an InvalidConfigException
should be raised. So maybe the file looks like this:
{
"staging": {
"general": {
"statusdb": ":memory:"
},
"connector": {
"username": "username",
"password": "password"
},
"loader": {
"ckan_api_key": "my-api-key",
"ckan_root_url": "my-root-url",
"ckan_organizations": {
"Name": "some-hash"
}
}
}
}
If we do it that way, then each piece of the pipeline could get isolated configuration.
I think I follow, but what exactly do you mean by "...each piece of the pipeline could get isolated configuration."
You could also nest things n-levels deep and pass in the correct keypath on run.
So the Pipeline
would get instantiated with the whole block, and then you would pass a configuration path to the Loader.
For example:
Pipeline('name', 'Print Name', '/path/to/config.json', 'settingsString') \
.connect(FileConnector, config='sftp.county_sftp')
...
.load(CKANDatastoreLoader, config='ckan')
Then there would be some parse_config
method on pipeline that would take a dot-separated string and look up that piece of the overall configuration dictionary and pass it to the appropriate object. That way the loader won't have access to the connector config, etc.
I opened a new issue for this (#43) and will push up a sample implementation of what I'm talking about and you can let me know what you think
@bsmithgall Should I wait to merge this? Do you plan on implementing the config change in this branch?
New branch, but we might want to merge that branch into this branch so we can get the fatal_od running agian
Sounds good!
@bsmithgall while testing on the staging server I remembered we needed to change CKANLoader.__init__()
to check for a current instance of the resource name on CKAN and then use its resource id for the loading.
Please take a look, and then I'll merge.
+1 LGTM
Fixes #17
@bsmithgall Can you take a look particularly at what I did in
connector.checksum_contents()
?