Closed bsmithgall closed 8 years ago
Hmm interesting question, and I think you are probably right. I think there are probably two different functionalities going on here:
These can either be both handled in the Extractor
parent class contract, or they can be implemented separately and then individual classes can be cobbled together through multiple inheritance/composition.
I think that you could do something like this:
class Connector(object):
# a new base class that to deal with connections
pass
class FTPConnector(Connector):
# code that does connecting here
class CsvFtpExtractor(CsvExtractor, FTPConnector):
pass
# then you use the CsvFtpExtractor class in the pipeline
The other way would be include some sort of method that requires building a connection. I'm not totally sure what the interface would look like for that, but it would likely involve rewriting the extract
method on the FileExtractor
.
I do like the idea of having a separating these into Connectors and Extractors.
What I was thinking - and I believe it's the same as what you mention in your last paragraph - is having all Extractors pull data from a Connector (even if it's as simple as opening a file on the same disk). As far as I can imagine right now, the only part of the contract between the two would be Connectors would have to all have a connect()
method that returns whatever would be each Extractors target
. In that case, would it be important that connect()
only return one type for all sub classes of Connector?
What do you think?
That sounds like a smart approach.
:question: Hey @bsmithgall, What's the plan for our current use cases where we'll be extracting CSVs via SFTP? Will we end up creating an extractor that inherits both classes? To me it seems like the SFTP extractor would be higher up the chain than the CSV one as it, like the current file extractor, simply copies the opens the file stream, but I could be totally wrong. Here's the commit where I started the class if you'd like to take a look. 45c0fbc22387d05884e518c80c1de7b856ccfe37