NVIDIA / fsi-samples

A collection of open-source, GPU-accelerated Python tools and examples for quantitative analyst tasks, leveraging the RAPIDS AI project, Numba, cuDF, and Dask.

[BUG] gQuant/plugins/gquant_plugin/notebooks/cuIndicator/indicator_demo.ipynb does not work #155

Open complyue opened 2 years ago

complyue commented 2 years ago

Describe the bug

The cuIndicator demo notebook has several issues that prevent its results from being reproduced.

Steps/Code to reproduce bug

First, an identified issue and possible fix:

https://github.com/NVIDIA/fsi-samples/issues/154#issuecomment-1016278338

Then

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [16], in <module>
     17     return df.query('datetime<@end_date and datetime>=@beg_date')
     19 indicator_lists = ['Accumulation Distribution', 'ADMI', 'Average True Range', 'Bollinger Bands',
     20                    'Chaikin Oscillator', 'Commodity Channel Index', 'Coppock Curve', 'Donchian Channel',
     21                    'Ease of Movement', 'EWA', 'Force Index', 'Keltner Channel', 'KST Oscillator', 'MA', 'MACD',
     22                    'Mass Index', 'Momentum', 'Money Flow Index', 'On Balance Volume', 'Parabolic SAR',
     23                    'Rate of Change', 'RSI', 'Stochastic Oscillator D', 'Stochastic Oscillator K', 'TRIX',
     24                    'True Strength Index', 'Ultimate Oscillator', 'Vortex Indicator',]
---> 26 task_stocks_list = [task_stock_symbol]
     27 task_stocks_graph = TaskGraph(task_stocks_list)
     28 list_stocks = task_stocks_graph.run(outputs=['stock_symbol.stock_name'])[0].to_pandas().set_index('asset_name').to_dict()['asset']

NameError: name 'task_stock_symbol' is not defined

(I tried assigning a value to that variable, but further strange errors occurred, so someone familiar with the notebook should probably take a look.)

Expected behavior

The notebook should be reproducible.

Environment overview (please complete the following information)

Environment details

N/A

Additional context

#154

avolkov1 commented 2 years ago

A lucky guess took me a step forward w.r.t. indicator_demo.ipynb, if I change:

task_load_csv_data = {
    TaskSpecSchema.task_id: "load_csv_data",
    TaskSpecSchema.node_type: "CsvStockLoader",
    TaskSpecSchema.conf: {"file": "../data/stock_price_hist.csv.gz"},
    TaskSpecSchema.inputs: {}
}

To:

task_load_csv_data = {
    TaskSpecSchema.task_id: "load_csv_data",
    TaskSpecSchema.node_type: "CsvStockLoader",
    TaskSpecSchema.conf: {"file": "../data/stock_price_hist.csv.gz"},
    TaskSpecSchema.inputs: {},
    TaskSpecSchema.module: 'greenflow_gquant_plugin.dataloader',
}

Then it'll fail with Exception: Cannot find the Node Class:SortNode instead of Exception: Cannot find the Node Class:CsvStockLoader.

So is this the way codeful graph nodes are supposed to be written? Should I report a bug against indicator_demo.ipynb and fix it somehow?

In general, each task spec should set the module explicitly, like this:

task_load_csv_data = {
    TaskSpecSchema.task_id: "load_csv_data",
    TaskSpecSchema.node_type: "CsvStockLoader",
    TaskSpecSchema.conf: {"file": "../data/stock_price_hist.csv.gz"},
    TaskSpecSchema.inputs: {},
    TaskSpecSchema.module: 'greenflow_gquant_plugin.dataloader'
}

task_sort = {
    TaskSpecSchema.task_id: "sort",
    TaskSpecSchema.node_type: "SortNode",
    TaskSpecSchema.conf: {"keys": ['asset', 'datetime']},
    TaskSpecSchema.inputs: {"in": "load_csv_data.cudf_out"},
    TaskSpecSchema.module: 'greenflow_gquant_plugin.transform'
}

task_stock_symbol = {
    TaskSpecSchema.task_id: "stock_symbol",
    TaskSpecSchema.node_type: "StockNameLoader",
    TaskSpecSchema.conf: {"file": "../data/security_master.csv.gz"},
    TaskSpecSchema.inputs: {},
    TaskSpecSchema.module: 'greenflow_gquant_plugin.dataloader'
}

However, greenflow is supposed to find the node automatically, without the module being specified explicitly, as long as some plugin provides that node. This automatic search appears to be broken right now. I would have to debug and fix greenflow.
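To make the idea of an automatic lookup concrete, here is a minimal sketch of searching a list of modules for a class by name. It is not greenflow's actual implementation: `find_node_class` and the candidate list are hypothetical, and the demo uses standard-library modules as stand-ins for plugin modules so it runs without greenflow installed.

```python
import importlib
import inspect


def find_node_class(node_type, candidate_modules):
    """Return (module_name, class) for the first module that defines node_type.

    Hypothetical helper for illustration; greenflow's real lookup may differ.
    """
    for module_name in candidate_modules:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Skip candidates (e.g. plugins) that are not installed.
            continue
        candidate = getattr(module, node_type, None)
        if inspect.isclass(candidate):
            return module_name, candidate
    raise LookupError(f"Cannot find the Node Class:{node_type}")


# Stand-in demo: standard-library modules play the role of plugin modules.
candidates = ["math", "no_such_plugin_module_xyz", "collections"]
module_name, cls = find_node_class("OrderedDict", candidates)
print(module_name, cls.__name__)
```

With real plugins, the candidate list would presumably be something like the 'greenflow_gquant_plugin.dataloader' and 'greenflow_gquant_plugin.transform' modules named above.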

I had trouble running this indicator demo notebook regardless. bqplot was giving me trouble with plotting, and there is a path lookup in the "cuIndicator/viz" package:

load_modules(os.getenv('MODULEPATH')+'/rapids_modules/')
from rapids_modules.cuindicator import . . .

Replace every occurrence of "rapids_modules" with "greenflow_gquant_plugin", which I think is the correct way to do it, and remove load_modules(os.getenv('MODULEPATH')+'/rapids_modules/').
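If there are many occurrences, the replacement could be scripted against the notebook JSON. The following is a rough sketch under assumptions: `retarget_notebook` is a hypothetical helper, it only touches code cells, and the in-memory notebook in the demo is a made-up minimal example rather than the real indicator_demo.ipynb.

```python
import json


def retarget_notebook(nb, old="rapids_modules", new="greenflow_gquant_plugin"):
    """Return a copy of a notebook dict with package references retargeted.

    Hypothetical helper: drops source lines that call load_modules(...) and
    replaces the old package name with the new one in code cells.
    """
    nb = json.loads(json.dumps(nb))  # deep copy via a JSON round trip
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        if isinstance(source, str):
            # Notebook sources may be a single string or a list of lines.
            source = source.splitlines(keepends=True)
        cell["source"] = [
            line.replace(old, new)
            for line in source
            if not line.lstrip().startswith("load_modules(")
        ]
    return nb


# Made-up minimal notebook used only to exercise the helper.
demo_nb = {
    "cells": [
        {
            "cell_type": "code",
            "source": [
                "load_modules(os.getenv('MODULEPATH')+'/rapids_modules/')\n",
                "import rapids_modules.cuindicator as ci\n",
            ],
        }
    ]
}
fixed = retarget_notebook(demo_nb)
print(fixed["cells"][0]["source"])
```

For a real file, the dict would come from `json.load` on the .ipynb and be written back with `json.dump`; keeping a backup first seems prudent.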

Alternatively, to get it working without modifying any code, I made a symbolic link in "gQuant/plugins/gquant_plugin/modules":

rapids_modules -> <absolute_path_to>/gQuant/plugins/gquant_plugin/greenflow_gquant_plugin/

Then I started Jupyter Lab from the directory "gQuant/plugins/gquant_plugin":

# MODULEPATH corresponds to: "gQuant/plugins/gquant_plugin/modules"
export MODULEPATH=${PWD}/modules
jupyter lab --ip=0.0.0.0 # etc...
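The symbolic-link workaround can also be scripted. This is only a sketch: `link_rapids_modules` is a hypothetical helper, and the demo runs against a throwaway temporary directory that merely mimics the layout described above, not a real checkout.

```python
import os
import tempfile


def link_rapids_modules(plugin_root):
    """Create modules/rapids_modules -> <plugin_root>/greenflow_gquant_plugin.

    Hypothetical helper; returns the directory to export as MODULEPATH.
    """
    target = os.path.abspath(os.path.join(plugin_root, "greenflow_gquant_plugin"))
    modules_dir = os.path.abspath(os.path.join(plugin_root, "modules"))
    os.makedirs(modules_dir, exist_ok=True)
    link = os.path.join(modules_dir, "rapids_modules")
    if not os.path.islink(link):
        os.symlink(target, link, target_is_directory=True)
    return modules_dir


# Demo against a temporary stand-in for "gQuant/plugins/gquant_plugin".
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "greenflow_gquant_plugin"))
    module_path = link_rapids_modules(root)
    link_ok = os.path.isdir(os.path.join(module_path, "rapids_modules"))
    print(link_ok)
```

The returned path plays the same role as the `export MODULEPATH=${PWD}/modules` line above.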

Even with all the fixes, I couldn't get bqplot to work. It's not plotting correctly in that notebook.