alibaba / EasyRec

A framework for large scale recommendation algorithms.
Apache License 2.0

DSSM negative sampler does not support ODPS tables? #292

Closed. xiahouzuoxin closed this 1 year ago

xiahouzuoxin commented 2 years ago

E0930 16:31:31.917438 153 edge_loader.cc:98] Try to read next edge file failed, Not found:File system not implemented
E0930 16:31:31.917464 6 graph_store.cc:207] Load graph edges failed, Not found:File system not implemented
[2022-09-30 16:31:31.917490] Load graph edges failed.
[2022-09-30 16:31:31.917492] Not found:File system not implemented
[2022-09-30 16:31:31.917496] Server load data failed and exit now.
[2022-09-30 16:31:31.917499] Not found:File system not implemented
F0930 16:31:31.917500 6 server_impl.cc:163] Server load data failed: Not found:File system not implemented
Check failure stack trace:
@ 0x7fcf18872250 google::LogMessage::Fail()
@ 0x7fcf18872198 google::LogMessage::SendToLog()
@ 0x7fcf18871abb google::LogMessage::Flush()
@ 0x7fcf18875306 google::LogMessageFatal::~LogMessageFatal()
@ 0x7fcf187f9d6a graphlearn::DefaultServerImpl::Init()
@ 0x7fcf18eeed08 _ZZN8pybind1112cpp_function10initializeIZNS0_C4IvN10graphlearn6ServerEJRKSt6vectorINS3_2io10EdgeSourceESaIS7_EERKS5_INS6_10NodeSourceESaISC_EEEJNS_4nameENS_9is_methodENS_7siblingEEEEMT0_FT_DpT1_EDpRKT2_EUlPS4_SB_SG_E_vJSU_SB_SG_EJSH_SI_SJ_EEEvOSL_PFSK_SN_EST_ENKUlRNS_6detail13function_callEE1clES11
@ 0x7fcf18ee21b2 pybind11::cpp_function::dispatcher()
@ 0x7fd0767fdeed PyEval_EvalFrameEx
@ 0x7fd0767ff1ce PyEval_EvalCodeEx
@ 0x7fd0767fe1f6 PyEval_EvalFrameEx
@ 0x7fd0767ff1ce PyEval_EvalCodeEx
@ 0x7fd0767fe1f6 PyEval_EvalFrameEx
@ 0x7fd0767ff1ce PyEval_EvalCodeEx
@ 0x7fd0767fe1f6 PyEval_EvalFrameEx
@ 0x7fd0767ff1ce PyEval_EvalCodeEx
@ 0x7fd0767fe1f6 PyEval_EvalFrameEx
@ 0x7fd0767ff1ce PyEval_EvalCodeEx
@ 0x7fd07677a8e8 function_call
@ 0x7fd07674adc3 PyObject_Call
@ 0x7fd07675d54f instancemethod_call
@ 0x7fd07674adc3 PyObject_Call
@ 0x7fd0767b7910 slot_tp_init
@ 0x7fd0767ae328 type_call
@ 0x7fd07674adc3 PyObject_Call
@ 0x7fd0767fbf07 PyEval_EvalFrameEx
@ 0x7fd0767ff1ce PyEval_EvalCodeEx
@ 0x7fd0767fe1f6 PyEval_EvalFrameEx
@ 0x7fd0767ff1ce PyEval_EvalCodeEx
@ 0x7fd0767fe1f6 PyEval_EvalFrameEx
@ 0x7fd0767ff1ce PyEval_EvalCodeEx
@ 0x7fd0767fe1f6 PyEval_EvalFrameEx
@ 0x7fd0767ff1ce PyEval_EvalCodeEx
@ 0x7fd0767fe1f6 PyEval_EvalFrameEx
@ 0x7fd0767ff1ce PyEval_EvalCodeEx
bash: line 1: 6 Aborted (core dumped) python run.py --cmd train --config oss://algo-recsys/algo/xiahouzuoxin/recall/dssm_trsample/dssm_trsample.config --train_tables odps://du_algo_1/tables/ec_list_homepage_recall_trainsamples_by_rankscore_v3/ds=20220928 --eval_tables odps://du_algo_1/tables/deal_rec_recall_dssm_feadump_sample_test_1d/ds=20220928 --boundary_table odps://du_algo_1/tables/deal_rec_recall_dssm_featdump_sample_train_sample_pos_train_v1_binning/ds=20220928 2>&1
7 Done | tee /logs/hostname.log

chengmengli06 commented 2 years ago

Initializing the graph from an ODPS table is natively supported on the MaxCompute platform, so what is your startup command?

xiahouzuoxin commented 2 years ago

> Initializing the graph from an ODPS table is natively supported on the MaxCompute platform, so what is your startup command?

The error occurs when I use the EasyRec repo and follow the documentation.

It seems these lines initialize the graph: https://github.com/alibaba/EasyRec/blob/009c01b9fde3324c530a409ce5d96af39987c367/easy_rec/python/core/sampler.py#L516

xiahouzuoxin commented 2 years ago

Start command using PAI:

pai -name easy_rec_ext -project algo_public
-Dcmd=train
-Dtrain_tables='odps://du_algo_1/tables/ec_list_homepage_recall_dssm_trsample_train_samples/ds=${bizdate}'
-Deval_tables='odps://du_algo_1/tables/ec_list_homepage_recall_dssm_trsample_test_samples/ds=${bizdate}'
-Dboundary_table='odps://du_algo_1/tables/deal_rec_recall_dssm_featdump_sample_train_sample_pos_train_v1_binning/ds=${bizdate}'
-Dcluster='{\"ps\":{\"count\":2,\"cpu\":900,\"memory\":10000},\"worker\":{\"count\":5,\"cpu\":900,\"memory\":10000}}'
-Darn='acs:ram::1816563541899700:role/aliyunodpspaidefaultrole'
-Dbuckets='oss://algo-recsys.oss-cn-hangzhou-internal.aliyuncs.com/'
-Dconfig='oss://algo-recsys/algo/xiahouzuoxin/recall/dssm_trsample/${model_config}.config'
-Dmodel_dir="oss://algo-recsys/algo/xiahouzuoxin/recall/dssm_trsample/${model_config}/${bizdate}"
-DossHost=oss-cn-hangzhou-internal.aliyuncs.com
-- -Dedit_config_json='{\
-- "train_config.num_steps":30000,\
-- "eval_config.num_examples":409600,\
-- "train_config.fine_tune_checkpoint": "oss://algo-recsys/algo/xiahouzuoxin/recall/dssm_trsample/${model_config}/${bizdate_1}",\
-- "data_config.hard_negative_sampler.user_input_path": "odps://du_algo_1/tables/ec_list_homepage_recall_dssm_trsample_users/ds=${bizdate}",\
-- "data_config.hard_negative_sampler.item_input_path": "odps://du_algo_1/tables/ec_list_homepage_recall_hotweighted_items/ds=${bizdate}",\
-- "data_config.hard_negative_sampler.hard_neg_edge_input_path": "odps://du_algo_1/tables/ec_list_homepage_recall_dssm_trsample_hardneg_edge/ds=${bizdate}"\
-- }'
-Deval_method='separate'
-Dres_project=du_algo_1_dev
-Dversion=zuoxin_dev

And the pipeline config looks like:

hard_negative_sampler {
  user_input_path: 'odps://du_algo_1/tables/ec_list_homepage_recall_dssm_trsample_users/ds=20220928'
  item_input_path: 'odps://du_algo_1/tables/ec_list_homepage_recall_hotweighted_items/ds=20220928'
  hard_neg_edge_input_path: 'odps://du_algo_1/tables/ec_list_homepage_recall_dssm_trsample_hardneg_edge/ds=20220928'
  num_sample: 1000
  num_hard_sample: 2
  num_eval_sample: 1000
  attr_fields: 'cspu_id'
  attr_fields: 'level1_category_id'
  attr_fields: 'level2_category_id'
  attr_fields: 'brand_id'
  attr_fields: 'category_id'

chengmengli06 commented 1 year ago

Please add the tables specified in hard_negative_sampler to -Dtables so that the platform authorizes TensorFlow to read from them.
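
For anyone hitting the same error, here is a minimal sketch of how the -Dtables value could be assembled from the sampler config so that no table is forgotten. The helper names (collect_odps_tables, build_tables_flag) are hypothetical and not part of EasyRec, and joining several tables with commas is an assumption about the -Dtables format that should be checked against the PAI documentation.

```python
import re

# Hypothetical helpers (not part of EasyRec): gather every odps:// table
# referenced in a sampler config so it can be passed to the PAI command.
ODPS_PATH = re.compile(r"""['"](odps://[^'"]+)['"]""")


def collect_odps_tables(config_text):
    """Return the quoted odps:// paths in config_text, in order, without duplicates."""
    seen = []
    for path in ODPS_PATH.findall(config_text):
        if path not in seen:
            seen.append(path)
    return seen


def build_tables_flag(paths):
    """Format paths as a -Dtables argument (comma-separated is an assumption)."""
    return "-Dtables='{}'".format(",".join(paths))


# Sampler paths copied from the pipeline config posted in this thread.
sampler_config = """
hard_negative_sampler {
  user_input_path: 'odps://du_algo_1/tables/ec_list_homepage_recall_dssm_trsample_users/ds=20220928'
  item_input_path: 'odps://du_algo_1/tables/ec_list_homepage_recall_hotweighted_items/ds=20220928'
  hard_neg_edge_input_path: 'odps://du_algo_1/tables/ec_list_homepage_recall_dssm_trsample_hardneg_edge/ds=20220928'
}
"""

tables = collect_odps_tables(sampler_config)
tables_flag = build_tables_flag(tables)
print(tables_flag)
```

The printed flag would then be added to the pai command alongside the other -D parameters.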