alibaba / EasyRec

A framework for large scale recommendation algorithms.
Apache License 2.0

split DSSM model got error #356

Open ucas010 opened 1 year ago

ucas010 commented 1 year ago

Hi, dear experts. Following the issue, I tried the split script but got the error below:

python  ../easy_rec/python/tools/split_model_pai.py --model_dir ../conf/dssm/_ckpt/export/final/  --user_model_dir ../conf/dssm/userModel/ --item_model_dir ../conf/dssm/itemModel/
Traceback (most recent call last):
  File "../easy_rec/python/tools/split_model_pai.py", line 276, in <module>
    tf.app.run()
  File "python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "../easy_rec/python/tools/split_model_pai.py", line 263, in main
    part_dir=FLAGS.user_model_dir)
  File "../easy_rec/python/tools/split_model_pai.py", line 203, in export
    saver = tf_saver.Saver()
  File "/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 865, in _build
    raise ValueError("No variables to save")
ValueError: No variables to save

Could you please help me? Thanks.

yunkchen commented 1 year ago

Can you confirm that the model was exported correctly under the /conf/dssm/_ckpt/export/final/ directory?

ucas010 commented 1 year ago

python -m easy_rec.python.train_eval already includes exporting the model, right? So there is no need to run python -m easy_rec.python.export afterwards? I checked the contents of this folder (drwxr-xr-x 4 root root 59 Mar 27 18:54 1679914493):

-rw-r--r-- 1 root root 204K Mar 27 18:54 saved_model.pb
drwxr-xr-x 2 root root   66 Mar 27 18:54 variables
drwxr-xr-x 2 root root   29 Mar 27 18:54 assets

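
One way to answer the "was the model exported correctly" question is to look for the files a TensorFlow SavedModel normally contains. The following is a minimal sketch and not part of EasyRec: the helper name find_saved_model_dirs is hypothetical, and it assumes the standard SavedModel layout (saved_model.pb next to a variables folder holding variables.index). It only inspects the file system; it does not load the model.

```python
import os


def find_saved_model_dirs(root):
    """Return (path, has_variables_index) for each directory under root
    that contains a saved_model.pb file.

    has_variables_index is True when variables/variables.index exists,
    which is where a SavedModel normally keeps its checkpointed variables.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "saved_model.pb" in filenames:
            index_file = os.path.join(dirpath, "variables", "variables.index")
            found.append((dirpath, os.path.isfile(index_file)))
    return sorted(found)


if __name__ == "__main__":
    # Path copied from the command in the report; adjust to your setup.
    export_root = "../conf/dssm/_ckpt/export/final/"
    for path, has_index in find_saved_model_dirs(export_root):
        print(path, "variables.index present:", has_index)
```

If the only hit is a timestamped subfolder such as 1679914493, it may be worth passing that subfolder as --model_dir instead of its parent. This is a guess based on the listing above, not something confirmed in this thread.
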
ucas010 commented 1 year ago

Is AUC = 0.72 on the ml-1m dataset normal?

wwxxzz commented 1 year ago

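Is AUC = 0.72 on the ml-1m dataset normal?

Yes, that is normal. You can try to improve the metric by tuning hyperparameters, or use real business data instead.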

poson commented 1 year ago

python -m easy_rec.python.train_eval

python -m easy_rec.python.train_eval only runs training and evaluation; it does not export the model.