asu-cactus / netsdb

A system that seamlessly integrates Big Data processing and machine learning model serving in a distributed relational database.
Apache License 2.0

TPCxAI Inference on NetsDB :: Failing for 2.5M+ Rows #69

Closed: venkate5hgunda closed this issue 2 years ago

venkate5hgunda commented 2 years ago

A few things are required before running the program to replicate the error:

Download the SF1 dataset into the ../experiments/dataset/ folder: aws s3 cp s3://decision-forest-benchmark-paper/datasets/financial_transactions_train_SF1.csv .

Download the model into the ../experiments/models/ folder: aws s3 cp s3://decision-forest-benchmark-paper/models/tpcxai_fraud_randomforest_10_8_netsdb . --recursive

Run the data-processing script: python data_processing.py -d tpcxai_fraud -sf 1 (This generates the actual test dataset with numerical features. FYI, the training split percentage is 0, so the train file will be empty.)

In the project's root directory, run scons and scons libDFTest, then start the cluster with python scripts/startPseudoCluster.py 8 32000 (I started with 32 GB; this can go higher depending on available memory. I pushed it to 60 GB, as the Linux server had 64 GB.)
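The setup steps above can be collected into one script so every reproduction attempt runs the same sequence. This is a minimal sketch, not part of NetsDB: the helper names build_steps and run_steps and all working-directory arguments are hypothetical, since the issue only names ../experiments/dataset/ and ../experiments/models/ and does not say where data_processing.py lives. The memory argument is assumed to be in MB, based on the issue pairing "32000" with "32 GB".

```python
import shlex
import subprocess


def build_steps(dataset_dir, models_dir, scripts_dir, root_dir, memory_mb=32000):
    """Return (working_dir, argv) pairs for the reproduction steps in this issue.

    Hypothetical helper: the directory arguments are assumptions about the
    local layout; the commands themselves are copied from the issue.
    """
    return [
        (dataset_dir, ["aws", "s3", "cp",
                       "s3://decision-forest-benchmark-paper/datasets/financial_transactions_train_SF1.csv",
                       "."]),
        (models_dir, ["aws", "s3", "cp",
                      "s3://decision-forest-benchmark-paper/models/tpcxai_fraud_randomforest_10_8_netsdb",
                      ".", "--recursive"]),
        (scripts_dir, ["python", "data_processing.py", "-d", "tpcxai_fraud", "-sf", "1"]),
        (root_dir, ["scons"]),
        (root_dir, ["scons", "libDFTest"]),
        # memory_mb is assumed to be MB (the issue maps 32000 to 32 GB)
        (root_dir, ["python", "scripts/startPseudoCluster.py", "8", str(memory_mb)]),
    ]


def run_steps(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cwd, argv in steps:
        print(f"[{cwd}] {shlex.join(argv)}")
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failing step
            subprocess.run(argv, cwd=cwd, check=True)
```

For example, run_steps(build_steps("../experiments/dataset", "../experiments/models", ".", ".", memory_mb=60000)) prints the full sequence without running anything; pass dry_run=False once the directories match your checkout.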

NetsDB commands for replicating the failure:

Passes for 2.5M rows:

Fails for the full SF1 rows:

The message I get on the server is something along the lines of "Socket Connection Refused".
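To narrow down where between 2.5M rows and the full SF1 set the failure begins, one option is to load truncated copies of the processed test file. This is a minimal sketch, assuming the test CSV has a single header row; the function name write_head and any file names are hypothetical, and the issue does not say how the passing 2.5M-row run limited its input.

```python
import csv
from itertools import islice


def write_head(src_path, dst_path, n_rows):
    """Copy the header plus the first n_rows data rows of src_path to dst_path.

    Hypothetical helper. Returns the number of data rows actually written,
    which is smaller than n_rows when the source file is shorter.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:  # empty source file: nothing to copy
            return 0
        writer.writerow(header)
        written = 0
        # islice stops after n_rows rows without reading the rest of the file
        for row in islice(reader, n_rows):
            writer.writerow(row)
            written += 1
        return written
```

Generating files at a few sizes above 2.5M rows (for example by halving the gap each time) and loading each one shows the smallest size that fails, which helps separate a data-size limit from a problem in the loading command itself.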

jiazou-bigdata commented 2 years ago

@venkate5hgunda

The problem is in the loading process and in the wrong command being used. Please check out this pull request.

venkate5hgunda commented 2 years ago

Thank you. This issue is now resolved.