canonical / tempo-coordinator-k8s-operator

This charmed operator automates the operational procedures of running Grafana Tempo, an open-source tracing backend, in microservices mode.
Apache License 2.0

_external_url is bork #25

Open PietroPasotti opened 3 weeks ago

PietroPasotti commented 3 weeks ago

Bug Description

The Tempo coordinator errors out when related to Prometheus if S3 isn't integrated, because of an issue in how the external URL is calculated.

To Reproduce

    juju deploy cos-lite --channel edge
    juju deploy tempo-coordinator-k8s --channel edge tempo
    juju relate tempo prometheus

Environment

microk8s on top of openstack on top of multipass

(good luck)

Relevant log output

unit-tempo-0: 12:48:47 WARNING unit.tempo/0.juju-log metrics-endpoint:38: 'source_url' should start with a scheme, such as 'http://'. Assuming 'http://' since none is present.
unit-tempo-0: 12:48:47 ERROR unit.tempo/0.juju-log metrics-endpoint:38: Uncaught exception while in charm code:                                       
Traceback (most recent call last):                                                                                                                    
  File "/var/lib/juju/agents/unit-tempo-0/charm/./src/charm.py", line 390, in <module>                                                                
    main(TempoCoordinatorCharm)                                                                                                                       
  File "/var/lib/juju/agents/unit-tempo-0/charm/venv/ops/main.py", line 551, in main                                                                  
    manager.run()                                                                                                                                     
  File "/var/lib/juju/agents/unit-tempo-0/charm/venv/ops/main.py", line 530, in run                                                                   
    self._emit()                                                                                                                                      
  File "/var/lib/juju/agents/unit-tempo-0/charm/venv/ops/main.py", line 519, in _emit                                                                 
    _emit_charm_event(self.charm, self.dispatcher.event_name)                                                                                         
  File "/var/lib/juju/agents/unit-tempo-0/charm/venv/ops/main.py", line 147, in _emit_charm_event                                                     
    event_to_emit.emit(*args, **kwargs)                                                                                                               
  File "/var/lib/juju/agents/unit-tempo-0/charm/venv/ops/framework.py", line 348, in emit                                                             
    framework._emit(event)                                                                                                                            
  File "/var/lib/juju/agents/unit-tempo-0/charm/venv/ops/framework.py", line 860, in _emit                                                            
    self._reemit(event_path)                                                                                                                          
  File "/var/lib/juju/agents/unit-tempo-0/charm/venv/ops/framework.py", line 950, in _reemit                                                          
    custom_handler(event)                                                                                                                             
  File "/var/lib/juju/agents/unit-tempo-0/charm/lib/charms/prometheus_k8s/v0/prometheus_scrape.py", line 1527, in set_scrape_job_spec                 
    self._set_unit_ip()                                                                                                                               
  File "/var/lib/juju/agents/unit-tempo-0/charm/lib/charms/prometheus_k8s/v0/prometheus_scrape.py", line 1572, in _set_unit_ip                        
    relation.data[self._charm.unit]["prometheus_scrape_unit_address"] = unit_address                                                                  
  File "/var/lib/juju/agents/unit-tempo-0/charm/venv/ops/model.py", line 1792, in __setitem__                                                         
    self._validate_write(key, value)                                                                                                                  
  File "/var/lib/juju/agents/unit-tempo-0/charm/venv/ops/model.py", line 1760, in _validate_write                                                     
    raise RelationDataTypeError(f'relation data values must be strings, not {type(value)}')                                                           
ops.model.RelationDataTypeError: relation data values must be strings, not <class 'NoneType'>                                                         
unit-tempo-0: 12:48:47 ERROR juju.worker.uniter.operation hook "metrics-endpoint-relation-joined" (via hook dispatching script: dispatch) failed: exit status 1
unit-tempo-0: 12:48:47 INFO juju.worker.uniter awaiting error resolution for "relation-joined" hook
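
One plausible reading of the log above, sketched with the standard library only: if the external URL handed to the scrape library is empty, prepending 'http://' yields a URL with no host, and the parsed hostname is None. The helper name `unit_address_from` is hypothetical and this is not the code in prometheus_scrape.py; it only illustrates how a None could end up being written to relation data.

```python
from typing import Optional
from urllib.parse import urlparse


def unit_address_from(external_url: str) -> Optional[str]:
    """Hypothetical helper: derive a host from an external URL."""
    # Mirror the warning in the log: assume http:// when no scheme is present.
    if not urlparse(external_url).scheme:
        external_url = "http://" + external_url
    # .hostname is None when the URL has no network location (e.g. "http://").
    return urlparse(external_url).hostname


if __name__ == "__main__":
    for url in ("", "http://10.1.2.3:3200", "tempo.example"):
        print(repr(url), "->", repr(unit_address_from(url)))
```

Under that assumption, a caller would need to check for None before assigning the result to a relation databag, since relation data values must be strings.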

Additional context

No response

mmkay commented 3 weeks ago

It's easier to see even without the whole setup-on-setup:

  1. deploy cos-lite first, then deploy tempo-coordinator and tempo-worker
  2. jhack imatrix fill
  3. no profit

It works if you deploy coordinator and worker first and relate them, then add cos-lite and jhack imatrix fill.

It might be an issue with the shared coordinator-worker object.

PietroPasotti commented 3 weeks ago

Yes, the underlying issue is that Tempo will not set up ingress until S3 is related and configured, and the _external_url property makes some incorrect assumptions about what TraefikRouteRequirer.is_ready means.

This should fix it:

    @property
    def _external_url(self) -> str:
        """Return the external url."""
        # traefik-route's is_ready() doesn't mean that there is data in the databags, hence the explicit checks.
        if self.ingress.is_ready() and self.ingress.scheme and self.ingress.external_host:
            ingress_url = f"{self.ingress.scheme}://{self.ingress.external_host}"