TPCxAI Inference on NetsDB :: Failing for 2.5M+ Rows

venkate5hgunda commented 2 years ago

A few things are required before running the program to replicate the error:

Download the SF1 dataset into the ../experiments/dataset/ folder: aws s3 cp s3://decision-forest-benchmark-paper/datasets/financial_transactions_train_SF1.csv .

Download the model into the ../experiments/models/ folder: aws s3 cp s3://decision-forest-benchmark-paper/models/tpcxai_fraud_randomforest_10_8_netsdb . --recursive

Run the data-processing script: python data_processing.py -d tpcxai_fraud -sf 1 (This generates the actual test dataset with numerical features. Note that the training split percentage is 0, so the train file will be empty.)

In the project's root directory, run scons and scons libDFTest, then start the cluster with python scripts/startPseudoCluster.py 8 32000 (I started with 32 GB; this can go up depending on the available memory. I pushed it to 60 GB, as the Linux server had 64 GB.)

NetsDB commands for replicating the failure:

PASSES for 2.5M Rows:

bin/testDecisionForestWithCrossProduct Y 2000000 8 250000 32 model-inference/decisionTree/experiments/dataset/tpcxai_fraud_test.csv model-inference/decisionTree/experiments/models/tpcxai_fraud_randomforest_10_8_netsdb RandomForest
bin/testDecisionForestWithCrossProduct N 2000000 8 250000 32 model-inference/decisionTree/experiments/dataset/tpcxai_fraud_test.csv model-inference/decisionTree/experiments/models/tpcxai_fraud_randomforest_10_8_netsdb RandomForest

FAILS for full SF1 Rows:

bin/testDecisionForestWithCrossProduct Y 7353840 8 919230 32 model-inference/decisionTree/experiments/dataset/tpcxai_fraud_test.csv model-inference/decisionTree/experiments/models/tpcxai_fraud_randomforest_10_8_netsdb RandomForest
bin/testDecisionForestWithCrossProduct N 7353840 8 919230 32 model-inference/decisionTree/experiments/dataset/tpcxai_fraud_test.csv model-inference/decisionTree/experiments/models/tpcxai_fraud_randomforest_10_8_netsdb RandomForest
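In both pairs of commands above, the numeric argument that follows the 8 equals the preceding row count divided by 8 (250000 = 2000000 / 8 and 919230 = 7353840 / 8). The thread does not say what each positional argument means, so the following is only a sketch of a helper that rebuilds the same command line for a different row count. The function name, the parameter names, and the reading of the numbers as "rows", "divisor", and "rows per part" are assumptions made for illustration.

```python
# Hypothetical helper: rebuilds the command lines quoted in this issue.
# The meaning of each positional argument is NOT documented in the thread;
# the names below are guesses based only on the arithmetic in the commands.

BINARY = "bin/testDecisionForestWithCrossProduct"
DATASET = "model-inference/decisionTree/experiments/dataset/tpcxai_fraud_test.csv"
MODEL = ("model-inference/decisionTree/experiments/models/"
         "tpcxai_fraud_randomforest_10_8_netsdb")


def build_command(first_flag: str, rows: int, divisor: int = 8,
                  fifth_arg: int = 32, model_type: str = "RandomForest") -> str:
    """Return a command string in the same shape as the ones in the issue."""
    if first_flag not in ("Y", "N"):
        raise ValueError("first_flag must be 'Y' or 'N'")
    if rows % divisor != 0:
        # The quoted commands always use an exact quotient, so refuse otherwise.
        raise ValueError("rows must be divisible by divisor")
    rows_per_part = rows // divisor
    parts = [BINARY, first_flag, str(rows), str(divisor), str(rows_per_part),
             str(fifth_arg), DATASET, MODEL, model_type]
    return " ".join(parts)


if __name__ == "__main__":
    # Reproduces the two failing invocations for the full SF1 row count.
    for flag in ("Y", "N"):
        print(build_command(flag, 7353840))
```

Calling build_command with intermediate row counts (any multiple of 8 between the two values above) may help narrow down the size at which the failure starts.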

The message I get on the server is something along the lines of "Socket Connection Refused".
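A "connection refused" error usually means that nothing is accepting connections at the target address, so one quick check after the failure is whether the cluster process is still listening. The sketch below uses only the Python standard library. The helper name is made up for illustration, and the host and port are placeholders: the thread does not state which port the pseudo-cluster listens on.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the host cannot be reached in time.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    import sys
    # Usage: python check_port.py <host> <port>
    # Both values are supplied by the caller; no default port is assumed.
    host, port = sys.argv[1], int(sys.argv[2])
    print("reachable" if can_connect(host, port) else "not reachable")
```

If the check succeeds before the run and fails after it, that would suggest the server process exited during the run rather than never having started.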

jiazou-bigdata commented 2 years ago

@venkate5hgunda

The problem is in the loading process and wrong command process. Please check out this pull request.

venkate5hgunda commented 2 years ago

Thank you. This issue is now resolved.

asu-cactus / netsdb

TPCxAI Inference on NetsDB :: Failing for 2.5M+ Rows #69