NVIDIA / fsi-samples

A collection of open-source, GPU-accelerated Python tools and examples for quantitative analyst tasks, leveraging the RAPIDS AI project, Numba, cuDF, and Dask.
[REVIEW]update to latest version of RAPIDS 0.13 #81

Closed yidong72 closed 4 years ago

yidong72 commented 4 years ago

Improved the CSV file loading time from 89 s to 3 s. Also adapted the code to a few API behavior changes.

GPUtester commented 4 years ago

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

yidong72 commented 4 years ago

Checked the unit tests and the flake8 checks. All the notebooks work fine except the customized nodes one, where Dask gives an error.

yidong72 commented 4 years ago

The remaining two notebooks are fixed. Ready for review.

yidong72 commented 4 years ago

Added a unit test to make sure the nodes compute consistent results.

avolkov1 commented 4 years ago

All the updates look good.

I found one bug that was not due to any of these changes, but I'd like to get it fixed. The bug is in how inputs are set up for nodes without ports (this is my fault; I introduced the bug when adding the ports API). The order of inputs can be incorrect for non-port nodes that have multiple inputs. I discovered the bug while re-running the mortgage example.
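
To make the ordering problem concrete, here is a minimal sketch; it is not the gQuant code. It assumes that each input edge is a dict with a `to_port` key and that `inputs_data` is filled in whatever order the upstream results arrive. The names `edges`, `expected_order`, `arrival_order`, and the port labels are hypothetical.

```python
# Hypothetical stand-ins: the edge list gives the order the node expects.
edges = [{"to_port": "left"}, {"to_port": "right"}]

# Upstream results keyed by port, inserted in arrival order
# (here "right" happened to arrive first).
inputs_data = {}
inputs_data["right"] = "right_df"
inputs_data["left"] = "left_df"

# Order the node expects, taken from the declared edges.
expected_order = [edge["to_port"] for edge in edges]

# Order produced by iterating the dict: insertion order, not edge order.
arrival_order = list(inputs_data)
by_values = list(inputs_data.values())

print(expected_order)  # ['left', 'right']
print(arrival_order)   # ['right', 'left']
print(by_values)       # ['right_df', 'left_df']
```

With a single input the two orders always agree, which is consistent with the bug only showing up for non-port nodes that have multiple inputs.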

file: "<>/gquant/dataframe_flow/_node_flow.py" method: __call__ Lines 696-697:

            inputs = [self.__make_copy(data_input)
                      for data_input in inputs_data.values()]

The above code is wrong but my initial solution was incorrect too.

There's a bug in my fix. I'm working on figuring it out. Sorry.

I'm testing again. The fix is that the inputs should be set up only when self.load is not set. Change to the code below:

    def __call__(self, inputs_data):
        if self.load:
            if isinstance(self.load, bool):
                output_df = self.load_cache()
            else:
                output_df = self.load
        else:
            if self._using_ports():
                # nodes with ports take dictionary as inputs
                inputs = {iport: self.__make_copy(data_input)
                          for iport, data_input in inputs_data.items()}
            else:
                # nodes without ports take list as inputs
                inputs = [self.__make_copy(inputs_data[ient['to_port']])
                          for ient in self.inputs]

            # ... the rest of the code
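
As a rough illustration of why the input setup sits under the `else` branch, here is a toy sketch. `ToyNode` is a hypothetical stand-in, not the gQuant class, and it simplifies `load` to either `False` or a ready-made result. It also assumes that `inputs_data` may be empty when `load` is set, which the thread does not state explicitly.

```python
class ToyNode:
    """Hypothetical stand-in for a node without ports; not the gQuant class."""

    def __init__(self, inputs, load=False):
        self.inputs = inputs  # list of edges, each with a 'to_port' key
        self.load = load      # False, or a precomputed result to return

    def __call__(self, inputs_data):
        if self.load:
            # A preloaded result short-circuits; inputs_data is never indexed.
            return self.load
        # Build the positional inputs in the declared edge order.
        inputs = [inputs_data[ient["to_port"]] for ient in self.inputs]
        return "+".join(inputs)


edges = [{"to_port": "left"}, {"to_port": "right"}]

# Arrival order differs from edge order, yet the result is stable.
node = ToyNode(edges)
result = node({"right": "R", "left": "L"})
print(result)  # L+R

# With load set, an empty inputs_data does not raise a KeyError.
cached = ToyNode(edges, load="cached")
cached_result = cached({})
print(cached_result)  # cached
```

If the port lookup ran before the `load` check, the second call would fail on the missing `left` key under this assumption.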

yidong72 commented 4 years ago

Made the change.