facebookresearch / ReAgent

A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)
https://reagent.ai
BSD 3-Clause "New" or "Revised" License

Tutorial does not work #509

Open galoisking opened 3 years ago

galoisking commented 3 years ago

I followed https://reagent.ai/rasp_tutorial.html#installing-reagent and ran:

./reagent/workflow/cli.py run reagent.workflow.training.identify_and_train_network "$CONFIG"

/home/circleci/project/ReAgent/reagent/preprocessing/preprocessor.py:120: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  input.shape == input_presence_byte.shape
/home/circleci/project/ReAgent/reagent/preprocessing/preprocessor.py:589: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  elif max_value.item() > MAX_FEATURE_VALUE:
/home/circleci/project/ReAgent/reagent/preprocessing/preprocessor.py:594: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  elif min_value.item() < MIN_FEATURE_VALUE:
I0721 100356.023 preprocessor.py:37] CUDA availability: False
I0721 100356.023 preprocessor.py:45] NOT Using GPU: GPU not requested or not available.
/home/circleci/project/ReAgent/reagent/prediction/predictor_wrapper.py:193: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert q_values.shape[1] == 2, f"{q_values.shape}"
I0721 100356.088 training.py:269] Saved default_model to DiscreteDQN_default_model_1626861836.torchscript
I0721 100356.090 training.py:269] Saved binary_difference_scorer to DiscreteDQN_binary_difference_scorer_1626861836.torchscript

(base) circleci@e79b99c2c4f9:~/project/ReAgent$ mkdir -p /tmp/0
(base) circleci@e79b99c2c4f9:~/project/ReAgent$ cp model_.torchscript /tmp/0/0

(base) circleci@e79b99c2c4f9:~/project/ReAgent$ python serving/examples/ecommerce/customer_simulator.py contextual_bandit.json 0 200 400 600 800
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/circleci/miniconda3/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/circleci/miniconda3/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "serving/examples/ecommerce/customer_simulator.py", line 49, in serve_customer
    result = post(
  File "serving/examples/ecommerce/customer_simulator.py", line 24, in post
    response = urllib.request.urlopen(req, jsondataasbytes)
  File "/home/circleci/miniconda3/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/circleci/miniconda3/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/home/circleci/miniconda3/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/home/circleci/miniconda3/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/home/circleci/miniconda3/lib/python3.8/urllib/request.py", line 1379, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/home/circleci/miniconda3/lib/python3.8/urllib/request.py", line 1354, in do_open
    r = h.getresponse()
  File "/home/circleci/miniconda3/lib/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/home/circleci/miniconda3/lib/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/home/circleci/miniconda3/lib/python3.8/http/client.py", line 276, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "serving/examples/ecommerce/customer_simulator.py", line 83, in <module>
    results: List[Tuple[str, float]] = p.map(serve_customer, list(range(EPOCHS)))
  File "/home/circleci/miniconda3/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/circleci/miniconda3/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
http.client.RemoteDisconnected: Remote end closed connection without response
[1]+ Aborted (core dumped) nohup ./serving/build/RaspCli --logtostderr > cli.log

(base) circleci@e79b99c2c4f9:~/project/ReAgent$ cat cli.log
I0721 10:05:05.707381 9778 DiskConfigProvider.cpp:9] READING CONFIGS FROM serving/examples/ecommerce/plans
I0721 10:05:05.707865 9778 DiskConfigProvider.cpp:48] GOT CONFIG contextual_bandit.json AT serving/examples/ecommerce/plans/contextual_bandit.json
I0721 10:05:05.707962 9778 DiskConfigProvider.cpp:52] Registered decision config: contextual_bandit.json
I0721 10:05:05.708199 9778 DiskConfigProvider.cpp:48] GOT CONFIG heuristic.json AT serving/examples/ecommerce/plans/heuristic.json
I0721 10:05:05.708250 9778 DiskConfigProvider.cpp:52] Registered decision config: heuristic.json
I0721 10:05:05.708446 9778 DiskConfigProvider.cpp:48] GOT CONFIG multi_armed_bandit.json AT serving/examples/ecommerce/plans/multi_armed_bandit.json
I0721 10:05:05.708492 9778 DiskConfigProvider.cpp:52] Registered decision config: multi_armed_bandit.json
I0721 10:05:05.708657 9787 Server.cpp:58] STARTING SERVER
[F PytorchActionValueScorer.cpp:74] TORCH ERROR: forward() Expected a value of type '__torch__.reagent.core.types.ServingFeatureData' for argument 'state' but instead found type 'Tuple[Tensor, Tensor]'.
Position: 1
Declaration: forward(__torch__.reagent.prediction.predictor_wrapper.DiscreteDqnPredictorWrapper self, __torch__.reagent.core.types.ServingFeatureData state) -> ((str[], Tensor))
Exception raised from checkArg at ../aten/src/ATen/core/function_schema_inl.h:162 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x6b (0x7ff39a7067eb in /home/circleci/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const, char const, unsigned int, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0xce (0x7ff39a70246e in /home/circleci/libtorch/lib/libc10.so)
frame #2: + 0x10194a2 (0x7ff385e5d4a2 in /home/circleci/libtorch/lib/libtorch_cpu.so)
frame #3: + 0x101d731 (0x7ff385e61731 in /home/circleci/libtorch/lib/libtorch_cpu.so)
frame #4: torch::jit::GraphFunction::operator()(std::vector<c10::IValue, std::allocator >, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, std::allocator >, c10::IValue, std::hash<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, c10::IValue> > > const&) + 0x2d (0x7ff388703e3d in /home/circleci/libtorch/lib/libtorch_cpu.so)
frame #5: torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator >, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, std::allocator >, c10::IValue, std::hash<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, c10::IValue> > > const&) + 0x161 (0x7ff388713eb1 in /home/circleci/libtorch/lib/libtorch_cpu.so)
frame #6: torch::jit::Module::forward(std::vector<c10::IValue, std::allocator >) + 0x10c (0x7ff399f4540a in ./serving/build/RaspCli)
frame #7: reagent::PytorchActionValueScorer::predict[abi:cxx11](reagent::DecisionRequest const&, int, int) + 0x927 (0x7ff399f413ff in ./serving/build/RaspCli)
frame #8: reagent::ActionValueScoring::runInternal[abi:cxx11](int, int, reagent::DecisionRequest const&) + 0x5c (0x7ff39a28af52 in ./serving/build/RaspCli)
frame #9: reagent::ActionValueScoring::run(reagent::DecisionRequest const&, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::variant<std::cxx11::basic_string<char, std::char_traits, std::allocator >, long, double, std::vector<std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::cxx11::basic_string<char, std::char_traits, std::allocator > > >, std::vector<long, std::allocator >, std::vector<double, std::allocator >, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::hash<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, std::allocator >, long, std::hash<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, long> > >, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, std::allocator >, double, std::hash<std::cxx11::basic_string<char, std::char_traits, std::allocator > >,
std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, double> > >, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, std::allocator >, double, std::hash<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, double> > >, std::hash<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, std::allocator >, double, std::hash<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, double> > > > > >, std::vector<reagent::ActionDetails, std::allocator > >, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, std::variant<std::cxx11::basic_string<char, std::char_traits, std::allocator >, long, double, std::vector<std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::cxx11::basic_string<char, std::char_traits, std::allocator > > >, std::vector<long, std::allocator >, std::vector<double, std::allocator >, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, 
std::allocator >, std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::hash<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, std::allocator >, long, std::hash<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, long> > >, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, std::allocator >, double, std::hash<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, double> > >, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, std::allocator >, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, std::allocator >, double, std::hash<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, double> > >, std::hash<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, std::unordered_map<std::cxx11::basic_string<char, std::char_traits, std::allocator >, double, std::hash<std::cxx11::basic_string<char, 
std::char_traits, std::allocator > >, std::equal_to<std::cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<char, std::char_traits, std::allocator > const, double> > > > > >, std::vector<reagent::ActionDetails, std::allocator > > > > > const&) + 0x133 (0x7ff39a28ae39 in ./serving/build/RaspCli)
frame #10: + 0xd84fe6 (0x7ff39a24efe6 in ./serving/build/RaspCli)
frame #11: + 0xd871a0 (0x7ff39a2511a0 in ./serving/build/RaspCli)
frame #12: std::function<void ()>::operator()() const + 0x32 (0x7ff39a25bce2 in ./serving/build/RaspCli)
frame #13: void std::__invoke_impl<void, std::function<void ()>&>(std::invoke_other, std::function<void ()>&) + 0x20 (0x7ff39a258da8 in ./serving/build/RaspCli)
frame #14: std::invoke_result<std::function<void ()>&>::type std::invoke<std::function<void ()>&>(std::function<void ()>&) + 0x26 (0x7ff39a256723 in ./serving/build/RaspCli)
frame #15: std::invoke_result<std::function<void ()>&>::type std::invoke<std::function<void ()>&>(std::function<void ()>&) + 0x20 (0x7ff39a254c2d in ./serving/build/RaspCli)
frame #16: tf::Executor::_invoke_static_work(unsigned int, tf::Node) + 0xf3 (0x7ff39a27ef37 in ./serving/build/RaspCli)
frame #17: tf::Executor::_invoke(unsigned int, tf::Node) + 0x11b (0x7ff39a27e8ef in ./serving/build/RaspCli)
frame #18: tf::Executor::_exploit_task(unsigned int, std::optional<tf::Node*>&) + 0x12e (0x7ff39a27e036 in ./serving/build/RaspCli)
frame #19: tf::Executor::_spawn(unsigned int)::{lambda()#1}::operator()() const + 0x78 (0x7ff39a27dbba in ./serving/build/RaspCli)
frame #20: void std::invoke_impl<void, tf::Executor::_spawn(unsigned int)::{lambda()#1}>(std::invoke_other, tf::Executor::_spawn(unsigned int)::{lambda()#1}&&) + 0x20 (0x7ff39a283f02 in ./serving/build/RaspCli)
frame #21: std::invoke_result<tf::Executor::_spawn(unsigned int)::{lambda()#1}>::type std::__invoke<tf::Executor::_spawn(unsigned int)::{lambda()#1}>(std::invoke_result&&, (tf::Executor::_spawn(unsigned int)::{lambda()#1}&&)...) + 0x26 (0x7ff39a283233 in ./serving/build/RaspCli)
frame #22: decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<tf::Executor::_spawn(unsigned int)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) + 0x28 (0x7ff39a285528 in ./serving/build/RaspCli)
frame #23: std::thread::_Invoker<std::tuple<tf::Executor::_spawn(unsigned int)::{lambda()#1}> >::operator()() + 0x1d (0x7ff39a28548f in ./serving/build/RaspCli)
frame #24: std::thread::_State_impl<std::thread::_Invoker<std::tuple<tf::Executor::_spawn(unsigned int)::{lambda()#1}> > >::_M_run() + 0x1c (0x7ff39a28542e in ./serving/build/RaspCli)
frame #25: + 0xc819d (0x7ff39941b19d in /home/circleci/miniconda/lib/libstdc++.so.6)
frame #26: + 0x76db (0x7ff384c2c6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #27: clone + 0x3f (0x7ff3843b288f in /lib/x86_64-linux-gnu/libc.so.6)
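
Note for anyone reproducing this: the Python traceback only says that the server dropped the connection, while the actual failure is the fatal TORCH ERROR that aborted RaspCli in cli.log. Below is a minimal sketch of a client call that makes this explicit instead of dumping a multiprocessing traceback. `post_or_explain` is a hypothetical helper, not part of customer_simulator.py, and the URL is whatever endpoint the simulator already posts to.

```python
import http.client
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Optional


def post_or_explain(
    url: str, payload: Dict[str, Any], timeout: float = 5.0
) -> Optional[Dict[str, Any]]:
    """POST JSON to `url` and return the decoded reply.

    Returns None when the server goes away or cannot be reached.
    Hypothetical helper for illustration only.
    """
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except http.client.RemoteDisconnected:
        # The server took the request, then closed the socket without replying.
        # In this thread that happened because the server process aborted.
        print("Server closed the connection without a response; check cli.log")
        return None
    except urllib.error.URLError as err:
        # The server is not reachable at all (for example, it already exited).
        print(f"Could not reach the server: {err.reason}")
        return None
```

With a wrapper like this, a `None` result points straight at the server log rather than at the client code.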

r-angi commented 3 years ago

I'm getting the same error running through the tutorial. When I get to the customer_simulator.py step and it goes to post to RASP and score the prediction, it prints this error in the logs:

[F PytorchActionValueScorer.cpp:75] TORCH ERROR: forward() Expected a value of type '__torch__.reagent.core.types.ServingFeatureData' for argument 'state' but instead found type 'Tuple[Tensor, Tensor]'.
Position: 1
Declaration: forward(__torch__.reagent.prediction.predictor_wrapper.DiscreteDqnPredictorWrapper self, __torch__.reagent.core.types.ServingFeatureData state) -> ((str[], Tensor))
Exception raised from checkArg at ../aten/src/ATen/core/function_schema_inl.h:162 (most recent call first)

I've traced the error down to the model.forward(inputs) call here: https://github.com/facebookresearch/ReAgent/blob/master/serving/reagent/serving/core/PytorchActionValueScorer.cpp#L50
Maybe the request for the state features in the example needs to be changed somehow?
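
To make the mismatch concrete, here is a plain-Python sketch of what the error message describes: `forward()` expects one structured argument, but receives a bare (values, presence) pair. Everything in it is an assumption for illustration: `FeatureData` is a stand-in for `ServingFeatureData`, its field names are guesses, and nested lists stand in for tensors so that it runs without PyTorch. It does not show where the real fix belongs (the exported model or the C++ scorer).

```python
from typing import Dict, List, NamedTuple, Tuple

# Nested lists stand in for tensors so the sketch runs without PyTorch.
Values = List[List[float]]
Presence = List[List[int]]


class FeatureData(NamedTuple):
    """Hypothetical stand-in for ServingFeatureData; field names are assumptions."""

    float_features_with_presence: Tuple[Values, Presence]
    id_list_features: Dict[int, Tuple[List[int], List[int]]]
    id_score_list_features: Dict[int, Tuple[List[int], List[int], List[float]]]


def forward(state: FeatureData) -> Values:
    """Mimic a strict argument check like the one reported in the log."""
    if not isinstance(state, FeatureData):
        raise TypeError(
            f"expected FeatureData for argument 'state', got {type(state).__name__}"
        )
    values, _presence = state.float_features_with_presence
    return values


def wrap(values: Values, presence: Presence) -> FeatureData:
    """Wrap a bare (values, presence) pair in the structured type."""
    return FeatureData((values, presence), {}, {})
```

Calling `forward((values, presence))` raises a `TypeError`, while `forward(wrap(values, presence))` passes the check. The analogous change on the serving side would be for the caller to build the structured input before calling `model.forward`, assuming the exported wrapper keeps its current signature.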