TresAmigosSD / SMV

Spark Modularized View
Apache License 2.0

support path for getSchemaByDataFileAsSmvSchema #1586

Closed (cloudysail closed 5 years ago)

cloudysail commented 5 years ago

There are two places that use self.inputDir(), shown below. Should we add a path parameter so that these methods don't always rely on the default self.inputDir()?

    def getSchemaByDataFileAsSmvSchema(self, data_file_name):
        """Get the schema of a data file from its path and return a Scala SmvSchema instance.
           The result will be None if the corresponding schema file does not exist or is invalid.
        """
        data_file_path = os.path.join(self.inputDir(), data_file_name)
        return self.j_smvPyClient.readSchemaFromDataPathAsSmvSchema(data_file_path)

    def getFileNamesByType(self, ftype):
        """Return a list of file names that end with the suffix ftype
        """
        all_files = self._jvm.SmvPythonHelper.getDirList(self.inputDir())
        return [str(f) for f in all_files if f.endswith(ftype)]

ninjapapa commented 5 years ago

I do think both of those API methods should be deprecated. The first one can perhaps be deprecated later, but I don't think we are using the second one any more.
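
For illustration, one common way to flag a method as deprecated in Python is to emit a DeprecationWarning from it. This is only a sketch, not SMV's actual approach: SmvAppSketch and its in-memory file list are hypothetical stand-ins so the example runs without Spark or the JVM helper.

```python
import warnings


class SmvAppSketch(object):
    """Hypothetical stand-in; the real method lists self.inputDir() via the JVM."""

    def __init__(self, file_names):
        # Assumption: a plain list replaces SmvPythonHelper.getDirList(...)
        self._file_names = file_names

    def getFileNamesByType(self, ftype):
        """Return a list of file names that end with the suffix ftype."""
        # Warn callers that the method is slated for removal
        warnings.warn(
            "getFileNamesByType is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        return [str(f) for f in self._file_names if f.endswith(ftype)]


# Record warnings so the deprecation notice can be inspected
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    csv_files = SmvAppSketch(["a.csv", "b.schema"]).getFileNamesByType(".csv")
```

The method keeps working for existing callers; the warning only signals that it may go away.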

cloudysail commented 5 years ago

I talked with @ninjapapa. For the short term, we will improve getSchemaByDataFileAsSmvSchema by supporting a 'path' parameter.
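
A minimal sketch of what that change might look like, assuming an optional path argument that falls back to self.inputDir() when omitted. The default value of None, the FakeSmvPyClient class, and the SmvAppSketch class are assumptions made so the example runs on its own; the actual change in SMV may differ.

```python
import os


class FakeSmvPyClient(object):
    """Hypothetical stand-in for the JVM-side client; it only echoes the path."""

    def readSchemaFromDataPathAsSmvSchema(self, data_file_path):
        return "schema-for:" + data_file_path


class SmvAppSketch(object):
    """Hypothetical, trimmed-down stand-in for the class that owns the method."""

    def __init__(self, input_dir):
        self._input_dir = input_dir
        self.j_smvPyClient = FakeSmvPyClient()

    def inputDir(self):
        return self._input_dir

    def getSchemaByDataFileAsSmvSchema(self, data_file_name, path=None):
        """Get the schema of a data file located under path.
           Falls back to self.inputDir() when path is None (the existing behavior).
        """
        base_dir = self.inputDir() if path is None else path
        data_file_path = os.path.join(base_dir, data_file_name)
        return self.j_smvPyClient.readSchemaFromDataPathAsSmvSchema(data_file_path)


app = SmvAppSketch("/data/input")
# Existing callers are unaffected: the default directory is still used
default_result = app.getSchemaByDataFileAsSmvSchema("a.csv")
# New callers can point at another directory explicitly
custom_result = app.getSchemaByDataFileAsSmvSchema("a.csv", path="/data/other")
```

Making the parameter optional keeps the change backward compatible with code that only passes data_file_name.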