WPRDC / wprdc-etl

MIT License
8 stars 3 forks source link

Build out FTP capabilities #17

Closed bsmithgall closed 8 years ago

saylorsd commented 8 years ago

:question: Hey @bsmithgall, What's the plan for our current use cases where we'll be extracting CSVs via SFTP? Will we end up creating an extractor that inherits both classes? To me it seems like the SFTP extractor would be higher up the chain than the CSV one as it, like the current file extractor, simply copies the opens the file stream, but I could be totally wrong. Here's the commit where I started the class if you'd like to take a look. 45c0fbc22387d05884e518c80c1de7b856ccfe37

bsmithgall commented 8 years ago

Hmm interesting question, and I think you are probably right. I think there are probably two different functionalities going on here:

  1. Functionality needed to access a particular file
  2. Functionality needed to read a particular file

These can either be both handled in the Extractor parent class contract, or they can be implemented separately and then individual classes can be cobbled together through multiple inheritance/composition.

I think that you could do something like this:

class Connector(object):
    # a new base class that to deal with connections
   pass

class FTPConnector(Connector):
    # code that does connecting here

class CsvFtpExtractor(CsvExtractor, FTPConnector):
    pass

# then you use the CsvFtpExtractor class in the pipeline

The other way would be include some sort of method that requires building a connection. I'm not totally sure what the interface would look like for that, but it would likely involve rewriting the extract method on the FileExtractor.

saylorsd commented 8 years ago

I do like the idea of having a separating these into Connectors and Extractors.

What I was thinking - and I believe it's the same as what you mention in your last paragraph - is having all Extractors pull data from a Connector (even if it's as simple as opening a file on the same disk). As far as I can imagine right now, the only part of the contract between the two would be Connectors would have to all have a connect() method that returns whatever would be each Extractors target. In that case, would it be important that connect() only return one type for all sub classes of Connector?

What do you think?

bsmithgall commented 8 years ago

That sounds like a smart approach.