NREL / nsrdb

NSRDB data processing pipeline. Includes satellite data assimilation, cloud property prediction and gap-filling, radiative transport modeling, and data collection.
https://nrel.github.io/nsrdb/
BSD 3-Clause "New" or "Revised" License
7 stars 0 forks source link

convert lambda handler to class #30

Closed MRossol closed 3 years ago

grantbuster commented 3 years ago

Looks great, my only comment from a light review would be that run_nsrdb() should not be a method since it only calls a method with no additional logic and it's only referenced in one place.

MRossol commented 3 years ago

So my logic with the run_nsrdb() method was that since we need the data_model object for the dump_to_h5() method we have to run the NSRDB somewhere. The options that I can see are: 1) in its own method, i.e. run_nsrdb(self) 2) in a normal run(self) method that would look like this:

 try:
        data_model = NSRDB.run_full(self.day, self.grid, self.freq,
                                    var_meta=self.var_meta,
                                    factory_kwargs=self.factory_kwargs,
                                    fill_all=self.get('fill_all', False),
                                    low_mem=self.get('low_mem', False),
                                    max_workers=self.get('max_workers', 1),
                                    log_level=None)
        self.dump_to_h5(data_model)
 except Exception:
        self.logger.exception('Failed to run NSRDB!')
        raise

3) Or in the run class method which we would have to change to look like this:

        nsrdb = cls(event)

        nsrdb.logger.debug(f'event: {event}')
        nsrdb.logger.debug(f'context: {context}')
        var_meta = nsrdb['var_meta']
        nsrdb.logger.debug(f'NSRDB inputs:'
                           f'\nday = {nsrdb.day}'
                           f'\ngrid = {nsrdb.grid}'
                           f'\nfreq = {nsrdb.freq}'
                           f'\nvar_meta = {var_meta}'
                           f'\nfactory_kwargs = {nsrdb.factory_kwargs}')

        try:
            data_model = NSRDB.run_full(nsrdb.day, nsrdb.grid, nsrdb.freq,
                                    var_meta=nsrdb.var_meta,
                                    factory_kwargs=nsrdb.factory_kwargs,
                                    fill_all=nsrdb.get('fill_all', False),
                                    low_mem=nsrdb.get('low_mem', False),
                                    max_workers=nsrdb.get('max_workers', 1),
                                    log_level=None)
            nsrdb.dump_to_h5(data_model)
        except Exception:
            nsrdb.logger.exception('Failed to run NSRDB!')
            raise

        success = {'statusCode': 200,
                   'body': json.dumps('NSRDB ran successfully and '
                                      f'create/updated {nsrdb.fpath_out}')}

        return success

I find 2 and 3 to be less clean than 1... Am I missing a more elegant 4th option?

MRossol commented 3 years ago

FYI, lambda can't find a class so had to add back in the handler function and just have it run Handler.run, stupid i know...

grantbuster commented 3 years ago

I like 3 from your list above. There's no point in writing a method that only calls a single other method without any additional logic. Your proposed code for 3 looks great to me.

MRossol commented 3 years ago

For this usec ase, but what if you ever want to run NSRDB without dumping it to disc? You'd have to move that bit of code or re write it outside the class method...

What about a dynamic data_model property?

if self._data_model is None:
    self._data_model = NSRDB.run_full(self.day, self.grid, self.freq,
                                    var_meta=self.var_meta,
                                    factory_kwargs=self.factory_kwargs,
                                    fill_all=self.get('fill_all', False),
                                    low_mem=self.get('low_mem', False),
                                    max_workers=self.get('max_workers', 1),
                                    log_level=None)

return self._data_model
grantbuster commented 3 years ago

Then you would just call the NSRDB.run_full() method again. ONE method call is not enough repeated code to package into a new function. What if the next time were calling NSRDB.run_full() we want to pass in new kwargs? Would you add new kwargs to the wrapper function? No! That's what the NSRDB.run_full() is for! Same problem with a property attribute.

MRossol commented 3 years ago

I disagree, the whole point of the class is to extract/modify/load, the needed args/kwargs to execute NSRDB.run_full. Seems silly to store all of those attributes inside the class and then pass them into NSRDB.run_full outside the class.

I like the property, it lets you run NSRDB.run_full with the args/kwargs input into the class, but if you want to run with different args you can run NSRDB.run_full and input the args/kwargs explicitly either from class attributes or elsewhere

grantbuster commented 3 years ago

okay, i understand your motivation and i think the property is okay.

MRossol commented 3 years ago

Cool, waiting on Jay to confirm it works, then I'll merge into main