There is an issue with the latest version of clearml-server where the port configuration on the apiserver is being overridden, causing the server to use the wrong port. As a result, the server fails to log in because it is sending requests to the incorrect port. Despite changing the port to 10008, the debug console shows that requests are still being sent to the wrong port. This problem does not occur when the version is explicitly set to 1.9.
How it looks in the debug console:
(screenshot omitted)
Hi @qraleq, I'm not sure regarding 1.9, but unless the webserver is explicitly configured to communicate with the apiserver on the correct port, I don't think changing this line (- "10008:8008" in the apiserver service) in the docker compose for 1.9 would work either...
Are you sure this is the only change you've made when using 1.9?
Hi @jkhenning, this was the only change I made.
version: "3.6"
services:
apiserver:
command:
- apiserver
container_name: clearml-apiserver
image: allegroai/clearml:1.9
restart: unless-stopped
volumes:
- /opt/clearml/logs:/var/log/clearml
- /opt/clearml/config:/opt/clearml/config
- /opt/clearml/data/fileserver:/mnt/fileserver
depends_on:
- redis
- mongo
- elasticsearch
- fileserver
environment:
CLEARML_ELASTIC_SERVICE_HOST: elasticsearch
CLEARML_ELASTIC_SERVICE_PORT: 9200
CLEARML_ELASTIC_SERVICE_PASSWORD: ${ELASTIC_PASSWORD}
CLEARML_MONGODB_SERVICE_HOST: mongo
CLEARML_MONGODB_SERVICE_PORT: 27017
CLEARML_REDIS_SERVICE_HOST: redis
CLEARML_REDIS_SERVICE_PORT: 6379
CLEARML_SERVER_DEPLOYMENT_TYPE: ${CLEARML_SERVER_DEPLOYMENT_TYPE:-linux}
CLEARML__apiserver__pre_populate__enabled: "true"
CLEARML__apiserver__pre_populate__zip_files: "/opt/clearml/db-pre-populate"
CLEARML__apiserver__pre_populate__artifacts_path: "/mnt/fileserver"
CLEARML__services__async_urls_delete__enabled: "true"
ports:
- "10008:8008"
networks:
- backend
- frontend
elasticsearch:
networks:
- backend
container_name: clearml-elastic
environment:
ES_JAVA_OPTS: -Xms2g -Xmx2g -Dlog4j2.formatMsgNoLookups=true
ELASTIC_PASSWORD: ${ELASTIC_PASSWORD}
bootstrap.memory_lock: "true"
cluster.name: clearml
cluster.routing.allocation.node_initial_primaries_recoveries: "500"
cluster.routing.allocation.disk.watermark.low: 500mb
cluster.routing.allocation.disk.watermark.high: 500mb
cluster.routing.allocation.disk.watermark.flood_stage: 500mb
discovery.zen.minimum_master_nodes: "1"
discovery.type: "single-node"
http.compression_level: "7"
node.ingest: "true"
node.name: clearml
reindex.remote.whitelist: '*.*'
xpack.monitoring.enabled: "false"
xpack.security.enabled: "false"
ulimits:
memlock:
soft: -1
hard: -1
nofile:
soft: 65536
hard: 65536
image: docker.elastic.co/elasticsearch/elasticsearch:7.17.7
restart: unless-stopped
volumes:
- /opt/clearml/data/elastic_7:/usr/share/elasticsearch/data
- /usr/share/elasticsearch/logs
fileserver:
networks:
- backend
- frontend
command:
- fileserver
container_name: clearml-fileserver
image: allegroai/clearml:1.9
environment:
CLEARML__fileserver__delete__allow_batch: "true"
restart: unless-stopped
volumes:
- /opt/clearml/logs:/var/log/clearml
- /opt/clearml/data/fileserver:/mnt/fileserver
- /opt/clearml/config:/opt/clearml/config
ports:
- "10081:8081"
mongo:
networks:
- backend
container_name: clearml-mongo
image: mongo:4.4.9
restart: unless-stopped
command: --setParameter internalQueryMaxBlockingSortMemoryUsageBytes=196100200
volumes:
- /opt/clearml/data/mongo_4/db:/data/db
- /opt/clearml/data/mongo_4/configdb:/data/configdb
redis:
networks:
- backend
container_name: clearml-redis
image: redis:5.0
restart: unless-stopped
volumes:
- /opt/clearml/data/redis:/data
webserver:
command:
- webserver
container_name: clearml-webserver
# environment:
# CLEARML_SERVER_SUB_PATH : clearml-web # Allow Clearml to be served with a URL path prefix.
image: allegroai/clearml:1.9
restart: unless-stopped
depends_on:
- apiserver
ports:
- "10080:80"
networks:
- backend
- frontend
async_delete:
depends_on:
- apiserver
- redis
- mongo
- elasticsearch
- fileserver
container_name: async_delete
image: allegroai/clearml:1.9
networks:
- backend
restart: unless-stopped
environment:
CLEARML_ELASTIC_SERVICE_HOST: elasticsearch
CLEARML_ELASTIC_SERVICE_PORT: 9200
CLEARML_ELASTIC_SERVICE_PASSWORD: ${ELASTIC_PASSWORD}
CLEARML_MONGODB_SERVICE_HOST: mongo
CLEARML_MONGODB_SERVICE_PORT: 27017
CLEARML_REDIS_SERVICE_HOST: redis
CLEARML_REDIS_SERVICE_PORT: 6379
PYTHONPATH: /opt/clearml/apiserver
CLEARML__services__async_urls_delete__fileserver__url_prefixes: "[${CLEARML_FILES_HOST:-}]"
entrypoint:
- python3
- -m
- jobs.async_urls_delete
- --fileserver-host
- http://fileserver:8081
volumes:
- /opt/clearml/logs:/var/log/clearml
agent-services:
networks:
- backend
container_name: clearml-agent-services
image: allegroai/clearml-agent-services:latest
deploy:
restart_policy:
condition: on-failure
privileged: true
environment:
CLEARML_HOST_IP: XXXXXXXXXXX
CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-}
CLEARML_API_HOST: http://apiserver:8008
CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-}
CLEARML_API_ACCESS_KEY: XXXXXXXXXXXXXXXX
CLEARML_API_SECRET_KEY: XXXXXXXXXXXXXXXX
CLEARML_AGENT_GIT_USER: ${CLEARML_AGENT_GIT_USER}
CLEARML_AGENT_GIT_PASS: ${CLEARML_AGENT_GIT_PASS}
CLEARML_AGENT_UPDATE_VERSION: ${CLEARML_AGENT_UPDATE_VERSION:->=0.17.0}
CLEARML_AGENT_DEFAULT_BASE_DOCKER: "ubuntu:18.04"
AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-}
AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-}
AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION:-}
AZURE_STORAGE_ACCOUNT: ${AZURE_STORAGE_ACCOUNT:-}
AZURE_STORAGE_KEY: ${AZURE_STORAGE_KEY:-}
GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS:-}
CLEARML_WORKER_ID: "clearml-services"
CLEARML_AGENT_DOCKER_HOST_MOUNT: "/opt/clearml/agent:/root/.clearml"
SHUTDOWN_IF_NO_ACCESS_KEY: 1
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /opt/clearml/agent:/root/.clearml
depends_on:
- apiserver
entrypoint: >
bash -c "curl --retry 10 --retry-delay 10 --retry-connrefused 'http://apiserver:8008/debug.ping' && /usr/agent/entrypoint.sh"
networks:
backend:
driver: bridge
frontend:
driver: bridge
Oh, but you changed it in more than one service, right?
Yes, that's true!
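For reference, these are the non-default host-port mappings across the services in the compose file above (in the stock compose these are typically 8008:8008, 8081:8081, and 8080:80):
apiserver:  "10008:8008"
fileserver: "10081:8081"
webserver:  "10080:80"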
Also, are you sure you don't have any specific configuration for the webserver (in your /opt/clearml/config folder, perhaps)?
This is the config/apiserver.conf where I just removed the user info. That's the only change I made to this file.
watch: false # Watch for changes (dev only)
debug: false # Debug mode
pretty_json: false # prettify json response
return_stack: true # return stack trace on error
return_stack_to_caller: true # top-level control on whether to return stack trace in an API response
# if 'return_stack' is true and error contains a status code, return stack trace only for these status codes
# valid values are:
# - an integer number, specifying a status code
# - a tuple of (code, subcode or list of subcodes)
return_stack_on_code: [
[500, 0] # raise on internal server error with no subcode
]
listen {
ip : "0.0.0.0"
port: 8008
}
version {
required: false
default: 1.0
# if set then calls to endpoints with the version
# greater than the current max version will be rejected
check_max_version: false
}
pre_populate {
enabled: false
zip_files: ["/path/to/export.zip"]
fail_on_error: false
# artifacts_path: "/mnt/fileserver"
}
# time in seconds to take an exclusive lock to init es and mongodb
# not including the pre_populate
db_init_timout: 120
mongo {
# controls whether FieldDoesNotExist exception will be raised for any extra attribute existing in stored data
# but not declared in a data model
strict: false
aggregate {
allow_disk_use: true
}
}
elastic {
probing {
# settings for initial probing of the elastic connection
max_retries: 4
timeout: 30
}
upgrade_monitoring {
v16_migration_verification: true
}
}
auth {
# verify user tokens
verify_user_tokens: false
# max token expiration timeout in seconds (1 year)
max_expiration_sec: 31536000
# default token expiration timeout in seconds (30 days)
default_expiration_sec: 2592000
# cookie containing auth token, for requests arriving from a web-browser
session_auth_cookie_name: "clearml_token_basic"
# cookie configuration for authorization cookies generated by auth.login
cookies {
httponly: false # allow only http to access the cookies (no JS etc)
secure: true # using HTTPS
domain: null # Limit to localhost is not supported
max_age: 99999999999
}
# provide a cookie domain override per company
# cookies_domain_override {
# <company-id>: <domain>
# }
fixed_users {
enabled: true
pass_hashed: true
users: [
]
}
}
cors {
origins: "*"
# Not supported when origins is "*"
supports_credentials: true
}
default_company: "d1bd92a3b039400cbafc60a7a5b1e52b"
workers {
# Auto-register unknown workers on status reports and other calls
auto_register: true
# Assume unknown workers have unregistered (i.e. do not raise unregistered error)
auto_unregister: true
# Timeout in seconds on task status update. If exceeded
# then the task can be stopped without communicating with the worker
task_update_timeout: 600
}
check_for_updates {
enabled: true
# Check for updates every 24 hours
check_interval_sec: 86400
url: "https://updates.clear.ml/updates"
component_name: "clearml-server"
# GET request timeout
request_timeout_sec: 3.0
}
statistics {
# Note: statistics are sent ONLY if the user has actively opted-in
supported: true
url: "https://updates.clear.ml/stats"
report_interval_hours: 24
agent_relevant_threshold_days: 30
max_retries: 5
max_backoff_sec: 5
}
}
OK, that doesn't seem related. Let me check with the guys 🙂
Do you have a setup with 1.9 that's still working? If so, can you check what URL the webapp uses to make these calls?
Yes, I have a v1.9.0 server that is running without any issues; the requests are sent to https://ADDRESS:9080/api/v2.23/login.supported_modes, where 9080 is the nginx-proxied port.
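As a side note, the same endpoint can be probed from a shell to see which port actually answers; a minimal check, assuming the addresses from this thread (substitute your real server address for ADDRESS):
# login.supported_modes is the unauthenticated call the webapp makes on load
curl -k https://ADDRESS:9080/api/v2.23/login.supported_modes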
Why 9080 when the setting is 10080?
This is the nginx config:
server {
listen 9080;
server_name XXXXX;
ssl_certificate /etc/letsencrypt/live/XXXXX/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/XXXXX/privkey.pem;
ssl on;
ssl_session_cache builtin:1000 shared:SSL:10m;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!eNULL:!EXPORT:!CAMELLIA:!DES:!MD5:!PSK:!RC4;
ssl_session_tickets on;
ssl_session_timeout 8h;
access_log /var/log/nginx/clearml_app.access.log;
error_log /var/log/nginx/clearml_app.error.log;
location / {
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Fix the "It appears that your reverse proxy setup is broken" error.
proxy_pass http://localhost:10080;
proxy_read_timeout 90;
proxy_ssl_server_name on;
proxy_redirect https://localhost:10080 https://ml.forsight.ai:10080;
}
}
Is this something you changed as well? Or is it your own nginx running on top of the ClearML server?
This is my own nginx running on top of the ClearML server. It's been running fine with version 1.9.0.
@qraleq we've identified the issue and will release a patch release today or tomorrow 🙂
That's great 👍🏽 Do you mind sharing what the issue is?
Hi @qraleq. This was a change that was inserted by mistake, causing the webserver to use the apiserver's default port instead of going through the reverse proxy by accessing the /api URL. We fixed the issue and will release a new version shortly.
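In other words, the webapp is expected to reach the API through the same origin under the /api path, which the webserver's internal proxy forwards to the apiserver. A rough sketch of that routing (illustrative only, not the actual ClearML webserver config):
location /api {
    # same-origin API calls are forwarded to the apiserver container;
    # the faulty change bypassed this and hit the apiserver port directly
    proxy_pass http://apiserver:8008;
}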
Perfect, thank you for the explanation and fast fix! Best regards!
@qraleq - version 1.10.1 was just released and should fix this. Please pull the new image (https://github.com/allegroai/clearml-server#upgrading-) and let us know if this fixes your issue.
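For reference, upgrading a /opt/clearml compose deployment typically looks roughly like this (a sketch; adjust paths and image tags to your setup, and back up your data first):
docker-compose -f /opt/clearml/docker-compose.yml down
# update the image tags in the compose file if they are pinned, then:
docker-compose -f /opt/clearml/docker-compose.yml pull
docker-compose -f /opt/clearml/docker-compose.yml up -d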
@qraleq closing this. Please reopen if required.