Datatamer / tamr-client

Programmatically interact with Tamr
https://tamr-client.readthedocs.io
Apache License 2.0

Unable to fetch published_clusters #65

Closed DerrickRice closed 5 years ago

DerrickRice commented 5 years ago

🐛 bug report

Unable to retrieve published_clusters from a mastering project.

😯 Current Behavior

  1. Publish clusters via the UI
  2. Fetch the project from the client, convert it to a mastering project, and get published_clusters
  3. Calling clusters.status() raises an HTTPError (404 Not Found)
  4. Calling clusters.records() returns a list whose only element is an error dict

>>> project = client.projects.by_external_id('idogs')
>>> project.name
'idogs'
>>> project = project.as_mastering()
>>> clusters = project.
project.api_path              project.client                project.external_id           project.high_impact_pairs(    project.pairs(                project.resource_id
project.as_categorization(    project.data                  project.from_data(            project.name                  project.published_clusters(   project.type
project.as_mastering(         project.description           project.from_json(            project.pair_matching_model(  project.relative_id           project.unified_dataset(
>>> clusters = project.published_clusters()
>>> clusters.status()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/drice/c/tamr/unify-client-python/tamr_unify_client/models/dataset/resource.py", line 75, in status
    status_json = self.client.get(self.api_path + "/status").successful().json()
  File "/home/drice/c/tamr/unify-client-python/tamr_unify_client/client.py", line 19, in successful
    self.raise_for_status()
  File "/home/drice/.local/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://localhost:4443/api/versioned/v1/projects/3/publishedClusters/status
>>> data = list(clusters.records())
>>> print(len(data))
1
>>> pp(data[0])
{'causedBy': None,
 'class': 'javax.ws.rs.NotFoundException',
 'message': 'HTTP 404 Not Found',
 'service': 'pubapi',
 'stackTrace': ['org.glassfish.jersey.server.ServerRuntime$2::run::323',
                'org.glassfish.jersey.internal.Errors$1::call::271',
                'org.glassfish.jersey.internal.Errors$1::call::267',
                'org.glassfish.jersey.internal.Errors::process::315',
                'org.glassfish.jersey.internal.Errors::process::297',
                'org.glassfish.jersey.internal.Errors::process::267',
                'org.glassfish.jersey.process.internal.RequestScope::runInScope::317',
                'org.glassfish.jersey.server.ServerRuntime::process::305',
                'org.glassfish.jersey.server.ApplicationHandler::handle::1154',
                'org.glassfish.jersey.servlet.WebComponent::serviceImpl::473',
                'org.glassfish.jersey.servlet.WebComponent::service::427',
                'org.glassfish.jersey.servlet.ServletContainer::service::388',
                'org.glassfish.jersey.servlet.ServletContainer::service::341',
                'org.glassfish.jersey.servlet.ServletContainer::service::228',
                'io.dropwizard.jetty.NonblockingServletHolder::handle::49',
                'org.eclipse.jetty.servlet.ServletHandler$CachedChain::doFilter::1655',
                'io.dropwizard.servlets.ThreadNameFilter::doFilter::34',
                'org.eclipse.jetty.servlet.ServletHandler$CachedChain::doFilter::1642',
                'io.dropwizard.jersey.filter.AllowedMethodsFilter::handle::45',
                'io.dropwizard.jersey.filter.AllowedMethodsFilter::doFilter::39',
                'org.eclipse.jetty.servlet.ServletHandler$CachedChain::doFilter::1642',
                'com.palantir.websecurity.filters.JerseyAwareWebSecurityFilter::doFilter::63',
                'org.eclipse.jetty.servlet.ServletHandler$CachedChain::doFilter::1642',
                'com.serviceenabled.dropwizardrequesttracker.RequestTrackerServletFilter::doFilter::49',
                'org.eclipse.jetty.servlet.ServletHandler$CachedChain::doFilter::1642',
                'com.tamr.zookeeper.dw.servicestate.ServiceStateFilter::doFilter::73',
                'org.eclipse.jetty.servlet.ServletHandler$CachedChain::doFilter::1642'],
 'status': 404}
>>>

🤔 Expected Behavior

🔦 Context

Unable to use the API to fetch Tamr Unify's results.

🌍 Your Environment

Software             Version(s)
Python               3.6
Tamr Unify server    Tamr Unify 2019.003.0 build 3a89900beb
tamr-unify-client    4.0-dev
Operating System     Ubuntu

nbateshaus commented 5 years ago

We can work around this shortcoming in the API by:

  1. Fetching the unified dataset
  2. Taking the unified dataset's name and appending the right "published clusters" suffix
  3. Returning project.client.datasets.by_name(the_name)
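
The three steps above could be sketched roughly as follows. This is a minimal illustration, not the client's actual implementation: the helper name `published_clusters_dataset` is hypothetical, the `name` attribute on the unified dataset is assumed, and the suffix is left as a required parameter because the thread does not state its value.

```python
def published_clusters_dataset(project, suffix):
    """Look up the published-clusters dataset by name.

    Hypothetical helper: it avoids the project's publishedClusters
    endpoint (which returned 404 above) and resolves the dataset
    through the datasets collection instead.
    """
    # Step 1: fetch the unified dataset of the mastering project.
    unified = project.unified_dataset()
    # Step 2: build the dataset name. The suffix is not given in this
    # thread, so the caller must supply the one their server uses.
    the_name = unified.name + suffix
    # Step 3: look the dataset up by name.
    return project.client.datasets.by_name(the_name)
```

Since the suffix is a parameter, checking the dataset names listed on the Tamr server should show which value to pass for a given project.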