Implement `load` method for Datapusher (also, rename Datapusher)

bsmithgall commented 8 years ago

What's the status on this? Want me to start working on it?

saylorsd commented 8 years ago

I have some stuff committed, if you want to take a look. I plan on doing more on it once I get back to the office.

bsmithgall commented 8 years ago

Which branch?

bsmithgall commented 8 years ago

Nvm found it.

bsmithgall commented 8 years ago

A quick comment -- you can replace all of the calls to super() with calls to self instead (for example, super(DataStorer, self).create_resource(self.package_id, self.resource_name) can become self.create_resource(self.package_id, self.resource_name)). super is a python keyword that lets you access the super class's implementation of a method if you are overriding it in a subclass -- so you would only need to call it if you had logic in the APILoader method.

I would also name that like CKANLoader because that's what it is doing. I'm also not totally sure what the difference is between the DataStorer and FileStorer objects.

bsmithgall commented 8 years ago

Like, when would you use one vs. the other?

Also, the python docs have a good explanation of super

saylorsd commented 8 years ago

I think I still have a decent chunk to add to the filestore, but it will end up being used to upload files in binary while datastore essentiallly makes insert queries. I'll post more when on it when I get back from class

bsmithgall commented 8 years ago

Ah ok -- that makes sense. Are there existing pipelines that use this in the existing scripts?

Is it ok if I do some work on the other one?

saylorsd commented 8 years ago

I think there is one ATM, but there definitely soon will be. And yeah, go ahead, though I will post some concerns about what/how to implement these loaders when I get back. Mainly differentiating between creating and updating.

bsmithgall commented 8 years ago

Somehow the history on the iss15 branch got hopefully screwed up, so I'm going to take the modifications to the loaders.py file and create a new branch. Are there other major changes that I should know about before doing that?

saylorsd commented 8 years ago

I don't think so. I think all relevant changes should just be in that file.

bsmithgall commented 8 years ago

:+1:

bsmithgall commented 8 years ago

Take a look at 56f1137 and let me know what you think. I still need to handle the rest of the Binary loader implementation but I figured that you could do that because I'm not sure what to do about it.

Let me know what you think.

saylorsd commented 8 years ago

Now that I have a keyboard and don't have to pay attention to a lecture, here's a quick rundown on the difference I envisioned:

DataStorer()

for tabular data -- which should be the case for most pipelines.
You can get more info here. It pretty much backed by a postgres database and the calls made through the API are just an interface to that.

FileStorer(),

will be used to load files that won't work with the DataStorer (pdf, txt, doc, etc). These get stored in a directory on the server as files.

saylorsd commented 8 years ago

I like it; it's much more concise. I might change some of the naming conventions around to keep in line with some CKAN terms. I'll put something together for the binary loader and let you know when I push it.

bsmithgall commented 8 years ago

:+1: great. I enumerated a few TODOs, but it might be better to split the BinaryLoader into a separate ticket so that we can get to testing with a real pipeline.

saylorsd commented 8 years ago

Sounds good. I got a little work done on it, but in doing so, noticed that there might be a lot of things we'll need to consider throughout the whole pipeline related to it. So for this reason as well, I think a separate ticket would be helpful.

Do you want to make a PR for your current commit, close this ticket, and then open one for the BinaryLoader?

bsmithgall commented 8 years ago

:+1: :+1: good discussion

WPRDC / wprdc-etl

Implement `load` method for Datapusher (also, rename Datapusher) #15

DataStorer()

FileStorer(),