agentos-project / agentos

The Python Component System (PCS) is an API and CLI for building, running, and sharing Python code. AgentOS is a set of libraries built on top of PCS that make it easy to build, run, and share agents that use Reinforcement Learning.
https://agentos.org
Apache License 2.0

Publishing run fails when pushed to web server due to `null value in column "instantiate"` #363

Closed nickjalbert closed 2 years ago

nickjalbert commented 2 years ago

Not sure whether this is a known issue, but here's what I'm seeing:

# Run a clean web server in another tab, and then
git checkout master
cd example_agents/sb3_agent
agentos run sb3_agent
USE_LOCAL_SERVER=True agentos publish-run <agent run ID from the previous execution>

The server blows up with the following:

[17/Apr/2022 13:37:24] "GET /api/v1/runs/778c2b8a96fe4bd6a7aac8a90c6a9c32/ HTTP/1.1" 404 23
run_command: None
{"metrics": {"mean_reward": 9.3, "episode_count": 10.0, "median_reward": 9.0, "training_step_count": 0.0, "step_count": 93.0, "min_reward": 8.0, "training_episode_count": 0.0, "max_reward": 10.0}, "params": {}, "tags": {"run_type": "evaluate", "pcs.is_agent_run": "True", "mlflow.runName": "AgentOS evaluate with Agent 'sb3_agent' and Env 'environment'", "environment_identifier": "environment", "agent_identifier": "sb3_agent", "mlflow.source.git.commit": "4030b81ef1db4b86614e3d8e861eb8d95845edce", "pcs.is_run": "True", "mlflow.source.name": "/home/nickj/agentos/env/bin/agentos", "mlflow.user": "nickj", "mlflow.parentRunId": "0e04b5827a8c45e6a6aac178ef6ddfb2", "mlflow.source.type": "LOCAL", "sb3_agent_run": "True"}}
tags: {'run_type': 'evaluate', 'pcs.is_agent_run': 'True', 'mlflow.runName': "AgentOS evaluate with Agent 'sb3_agent' and Env 'environment'", 'environment_identifier': 'environment', 'agent_identifier': 'sb3_agent', 'mlflow.source.git.commit': '4030b81ef1db4b86614e3d8e861eb8d95845edce', 'pcs.is_run': 'True', 'mlflow.source.name': '/home/nickj/agentos/env/bin/agentos', 'mlflow.user': 'nickj', 'mlflow.parentRunId': '0e04b5827a8c45e6a6aac178ef6ddfb2', 'mlflow.source.type': 'LOCAL', 'sb3_agent_run': 'True'}
agent_id: sb3_agent
env_id: environment
Internal Server Error: /api/v1/runs/
Traceback (most recent call last):
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/query.py", line 581, in get_or_create
    return self.get(**kwargs), False
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/query.py", line 435, in get
    raise self.model.DoesNotExist(
registry.models.Component.DoesNotExist: Component matching query does not exist.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.NotNullViolation: null value in column "instantiate" violates not-null constraint
DETAIL:  Failing row contains (2022-04-17 13:37:24.079191+00, 2022-04-17 13:37:24.079214+00, sb3_agent, , , , , null, null).

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/rest_framework/viewsets.py", line 125, in view
    return self.dispatch(request, *args, **kwargs)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/nickj/agentos/web/registry/views.py", line 93, in create
    run = Run.create_from_request_data(request.data)
  File "/home/nickj/agentos/web/registry/models.py", line 351, in create_from_request_data
    agent_comp, agent_comp_created = Component.objects.get_or_create(
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/query.py", line 588, in get_or_create
    return self.create(**params), True
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/query.py", line 453, in create
    obj.save(force_insert=True, using=self.db)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/base.py", line 739, in save
    self.save_base(using=using, force_insert=force_insert,
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/base.py", line 776, in save_base
    updated = self._save_table(
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/base.py", line 881, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/base.py", line 919, in _do_insert
    return manager._insert(
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/query.py", line 1270, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
    cursor.execute(sql, params)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/backends/utils.py", line 98, in execute
    return super().execute(sql, params)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.IntegrityError: null value in column "instantiate" violates not-null constraint
DETAIL:  Failing row contains (2022-04-17 13:37:24.079191+00, 2022-04-17 13:37:24.079214+00, sb3_agent, , , , , null, null).

The proximal cause: when the Component doesn't exist, the get_or_create call here doesn't pass enough args to create a valid Component, so the insert leaves "instantiate" null. Maybe I need to pass a recurse flag or something?
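To illustrate the failure mode in the traceback above, here is a minimal sketch that reproduces the same get-then-insert pattern with the standard-library sqlite3 module instead of Django and Postgres. The table layout and the get_or_create helper are hypothetical stand-ins (only the "instantiate" column name comes from the error message), and this is not the AgentOS code. Django's real get_or_create accepts a defaults dict for values used only on insert; the helper below mimics that behavior.

```python
import sqlite3


def get_or_create(conn, identifier, defaults=None):
    """Look up a component by identifier; insert it if missing.

    `defaults` holds extra column values that are used only on insert.
    Returns a (row, created) tuple. Hypothetical helper for illustration.
    """
    row = conn.execute(
        "SELECT identifier, instantiate FROM component WHERE identifier = ?",
        (identifier,),
    ).fetchone()
    if row is not None:
        return row, False
    # With no defaults, "instantiate" stays None and the insert is rejected.
    values = {"identifier": identifier, "instantiate": None}
    values.update(defaults or {})
    conn.execute(
        "INSERT INTO component (identifier, instantiate) "
        "VALUES (:identifier, :instantiate)",
        values,
    )
    return (values["identifier"], values["instantiate"]), True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE component ("
    "identifier TEXT PRIMARY KEY, "
    "instantiate BOOLEAN NOT NULL)"
)

# Lookup-only arguments: the insert has no value for the NOT NULL column.
try:
    get_or_create(conn, "sb3_agent")
    failed = False
except sqlite3.IntegrityError as exc:
    failed = True
    print("insert rejected:", exc)

# Supplying defaults gives the insert the value it needs.
row, created = get_or_create(conn, "sb3_agent", defaults={"instantiate": True})
print(row, created)
```

Whether the server should supply defaults or should expect the client to have published the Component first is a design question this sketch does not answer.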

nickjalbert commented 2 years ago

@andyk let me know if this is an expected issue or not!

nickjalbert commented 2 years ago

@andyk unfortunately, still seeing this issue.

On a clean checkout of master of AOS, I start the webserver (confirmed I can load localhost:8000) and run the following (Ubuntu on WSL2):

$ rm -rf mlruns && rm -rf output.txt && rm -rf documentation/demos/mlruns
$ git rev-parse HEAD
593c3a20fa0499ba55049942ea0a08067d89bb82
$ bash documentation/demos/demo_ilya_papag_from_cli.sh

and once it runs, on the webserver tab I see the following error:

Not Found: /api/v1/runs/656fdba774124ee4a76bf05b89efdb54/
[19/Apr/2022 10:46:33] "GET /api/v1/runs/656fdba774124ee4a76bf05b89efdb54/ HTTP/1.1" 404 23
run_command: None
{"metrics": {"mean_reward": -20.166666666666668, "episode_count": 24.0, "median_reward": -20.0, "training_step_count": 21474.0, "step_count": 21474.0, "min_reward": -21.0, "training_episode_count": 24.0, "max_reward": -16.0}, "params": {}, "tags": {"run_type": "learn", "pcs.is_agent_run": "True", "papag_agent_run": "True", "mlflow.runName": "AgentOS learn with Agent 'agent==593c3a20fa0499ba55049942ea0a08067d89bb82' and Env 'agent==593c3a20fa0499ba55049942ea0a08067d89bb82'", "environment_identifier": "agent==593c3a20fa0499ba55049942ea0a08067d89bb82", "agent_identifier": "agent==593c3a20fa0499ba55049942ea0a08067d89bb82", "mlflow.source.git.commit": "964193814522f8df7288379fc1b0741985da5ba8", "pcs.is_run": "True", "mlflow.source.name": "/home/nickj/agentos/lean-env/bin/agentos", "mlflow.user": "nickj", "mlflow.parentRunId": "2d69f36c740a439d852fe506c7babbe0", "mlflow.source.type": "LOCAL"}}
tags: {'run_type': 'learn', 'pcs.is_agent_run': 'True', 'papag_agent_run': 'True', 'mlflow.runName': "AgentOS learn with Agent 'agent==593c3a20fa0499ba55049942ea0a08067d89bb82' and Env 'agent==593c3a20fa0499ba55049942ea0a08067d89bb82'", 'environment_identifier': 'agent==593c3a20fa0499ba55049942ea0a08067d89bb82', 'agent_identifier': 'agent==593c3a20fa0499ba55049942ea0a08067d89bb82', 'mlflow.source.git.commit': '964193814522f8df7288379fc1b0741985da5ba8', 'pcs.is_run': 'True', 'mlflow.source.name': '/home/nickj/agentos/lean-env/bin/agentos', 'mlflow.user': 'nickj', 'mlflow.parentRunId': '2d69f36c740a439d852fe506c7babbe0', 'mlflow.source.type': 'LOCAL'}
agent_id: agent==593c3a20fa0499ba55049942ea0a08067d89bb82
env_id: agent==593c3a20fa0499ba55049942ea0a08067d89bb82
Internal Server Error: /api/v1/runs/
Traceback (most recent call last):
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/query.py", line 581, in get_or_create
    return self.get(**kwargs), False
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/query.py", line 435, in get
    raise self.model.DoesNotExist(
registry.models.Component.DoesNotExist: Component matching query does not exist.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.NotNullViolation: null value in column "instantiate" violates not-null constraint
DETAIL:  Failing row contains (2022-04-19 10:46:33.173744+00, 2022-04-19 10:46:33.173765+00, agent==593c3a20fa0499ba55049942ea0a08067d89bb82, , , , , null, null).

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/rest_framework/viewsets.py", line 125, in view
    return self.dispatch(request, *args, **kwargs)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/nickj/clean/agentos/web/registry/views.py", line 93, in create
    run = Run.create_from_request_data(request.data)
  File "/home/nickj/clean/agentos/web/registry/models.py", line 351, in create_from_request_data
    agent_comp, agent_comp_created = Component.objects.get_or_create(
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/query.py", line 588, in get_or_create
    return self.create(**params), True
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/query.py", line 453, in create
    obj.save(force_insert=True, using=self.db)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/base.py", line 739, in save
    self.save_base(using=using, force_insert=force_insert,
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/base.py", line 776, in save_base
    updated = self._save_table(
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/base.py", line 881, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/base.py", line 919, in _do_insert
    return manager._insert(
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/query.py", line 1270, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
    cursor.execute(sql, params)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/backends/utils.py", line 98, in execute
    return super().execute(sql, params)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/nickj/agentos/lean-env/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.IntegrityError: null value in column "instantiate" violates not-null constraint
DETAIL:  Failing row contains (2022-04-19 10:46:33.173744+00, 2022-04-19 10:46:33.173765+00, agent==593c3a20fa0499ba55049942ea0a08067d89bb82, , , , , null, null).

[19/Apr/2022 10:46:33] "POST /api/v1/runs/ HTTP/1.1" 500 205797

From the /tmp/papag-components.yaml:

$ cat /tmp/papag-components.yaml
WARNING: version was passed into get_local_path() on a LocalRepo, which means it is being ignored. If this is actually a versioned repo, use GithubRepo or another versioned Repo type.
WARNING: version was passed into get_local_path() on a LocalRepo, which means it is being ignored. If this is actually a versioned repo, use GithubRepo or another versioned Repo type.
WARNING: version was passed into get_local_path() on a LocalRepo, which means it is being ignored. If this is actually a versioned repo, use GithubRepo or another versioned Repo type.
components:
  PAPAGRun==593c3a20fa0499ba55049942ea0a08067d89bb82:
    class_name: PAPAGRun
    dependencies: {}
    file_path: example_agents/papag/papag_run.py
    instantiate: false
    repo: papag_agent_dir
  agent==593c3a20fa0499ba55049942ea0a08067d89bb82:
    class_name: PAPAGAgent
    dependencies:
      PAPAGRun: PAPAGRun==593c3a20fa0499ba55049942ea0a08067d89bb82
    file_path: example_agents/papag/agent.py
    instantiate: true
    repo: papag_agent_dir
    requirements_path: example_agents/papag/requirements.txt
registries: []
repos:
  papag_agent_dir:
    type: github
    url: https://github.com/agentos-project/agentos.git
run_commands: {}
runs: {}

Interestingly, the WARNING: lines are valid YAML and don't seem to cause problems with the registry file. Finally,

$ cat output.txt

 ... # Lots of debugging output

Updates 620, num timesteps 24840, FPS 516
 Last 10 training episodes: mean/median reward -19.8/-20.0, min/max reward -21.0/-16.0

Results for AgentRun 656fdba774124ee4a76bf05b89efdb54

Training results over 24 episodes:
        Overall agent was trained on 21474 transitions over 24 episodes
        Max reward over 24 episodes: -16.0
        Mean reward over 24 episodes: -20.166666666666668
        Median reward over 24 episodes: -20.0
        Min reward over 24 episodes: -21.0

Run 2d69f36c740a439d852fe506c7babbe0 recorded. Execute the following for details:

  agentos status 2d69f36c740a439d852fe506c7babbe0

It looked like you were able to get it to work on your machine. Any special sauce that I might be missing?

nickjalbert commented 2 years ago

Maybe I messed up the branch merging or something? Or maybe documentation/demos/demo_ilya_papag_from_cli.sh isn't expected to run right out of the box yet? Let me know if I'm doing something obviously dumb! :P

andyk commented 2 years ago

🤔 I'll look more today and try on my Windows box too.

andyk commented 2 years ago

Just another observation as I try to reproduce per your instructions: on my M1, I'm using conda and ran conda install scipy. I doubt that's related, but documenting it just in case.

nickjalbert commented 2 years ago

We debugged this together. The problem was my environment. A fresh install of the requirements (specifically pip install -e .) made it work for me!

I bet I installed the web requirements independently at some point (pip install -r web/requirements.txt) and got a pinned version of PCS/AOS instead of my local checkout. There's probably a better way to install AOS for web...

https://github.com/agentos-project/agentos/blob/e5b76de828db18b64bf54aa4bd1574f09e722024/web/requirements.txt#L1-L3
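One quick way to check which copy of a package an environment is actually using is to ask the packaging metadata. This is a minimal sketch: describe_install is a hypothetical helper (not part of PCS/AOS), and the distribution name "agentos" is an assumption, so substitute whatever name pip list reports.

```python
from importlib import metadata
from typing import Optional


def describe_install(dist_name: str) -> Optional[dict]:
    """Return the version and metadata location of an installed distribution.

    Returns None when no distribution with that name is installed.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return {
        "version": dist.version,
        # Directory that holds the distribution's metadata; depending on how
        # the package was installed, this may be site-packages or a checkout.
        "location": str(dist.locate_file("")),
    }


if __name__ == "__main__":
    # "agentos" is an assumed distribution name, used for illustration only.
    print(describe_install("agentos"))
```

If the reported version or location is not what the current checkout should produce, the environment is probably using a stale install.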

andyk commented 2 years ago

@nickjalbert I think we decided on the phone that this issue might have been fixed in 97cb6d446827fdd56ad7f138213f1ce9efe17151 -- merged as part of #350.

Can you confirm?

nickjalbert commented 2 years ago

I'll check again tomorrow that all weirdness has resolved with the env fix, but for now I'll close because I'm pretty confident it'll be all good.

nickjalbert commented 2 years ago

Working 💯 💯 locally!