Snow-Fox-Data / dss-thread

Dataiku Thread™ Data Catalog Plugin by Snow Fox Data
https://www.snowfoxdata.com/thread-plugin

Corrupted row in __Thread_Index__ #2

Closed clayms closed 2 years ago

clayms commented 2 years ago

When scanning our DSS instance, the indexing always stops early and never completes the full scan, so some projects are missing from __Thread_Index__.

The screenshot below shows the index subset of a dataframe created directly from __Thread_Index__. In the last row, the values of the "key" and "last_modified" columns appear to have been shifted left by one column, which leaves a "NaN" (null) value in the actual "last_modified" column.
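A quick way to locate rows like this is to filter on a missing "last_modified" value. The following is a minimal sketch, not code from the plugin: the dataframe is a hypothetical stand-in for one read from __Thread_Index__, and the "name" column and all sample values are invented for illustration. Only the "key" and "last_modified" column names come from this post.

```python
import pandas as pd

# Hypothetical stand-in for a dataframe read from __Thread_Index__.
# Only "key" and "last_modified" are named in the post; the "name" column
# and every value below are invented for illustration.
index_df = pd.DataFrame(
    {
        "name": ["orders", "customers", "PROJ_A|dataset|broken"],
        "key": [
            "PROJ_A|dataset|orders",
            "PROJ_A|dataset|customers",
            "2022-05-18 13:40:00",  # a timestamp that slid into "key"
        ],
        "last_modified": ["2022-05-17 09:00:00", "2022-05-17 09:05:00", None],
    }
)


def find_suspect_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose "last_modified" value is missing."""
    return df[df["last_modified"].isna()]


suspect_rows = find_suspect_rows(index_df)
print(suspect_rows)
```

A missing "last_modified" value is only a symptom here, so any row this returns still needs to be inspected by hand to confirm that its columns are shifted.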

This also appears to break Thread's search functionality: the only way to reach nearly any project is to put the Project Key directly in the URL. See the error message at the bottom of this post.
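The error message at the bottom of this post is what pandas raises when a dataframe is indexed with a boolean mask that contains missing values. The plugin's search code is not shown here, so the following is only a sketch of one way such a mask can arise: the `search` helper, the "description" column, and the sample data are hypothetical. It uses `Series.str.contains`, whose `na` parameter controls how missing values are treated.

```python
import pandas as pd

# Hypothetical index: the last row has a missing value in the searched column.
index_df = pd.DataFrame(
    {
        "key": ["PROJ_A|orders", "PROJ_A|customers", "PROJ_B|broken"],
        "description": pd.Series(
            ["daily orders", "customer master", None], dtype=object
        ),
    }
)


def search(df: pd.DataFrame, term: str, na_safe: bool) -> pd.DataFrame:
    """Return rows whose description contains `term` as a literal substring."""
    if na_safe:
        # na=False treats a missing description as "no match".
        mask = df["description"].str.contains(term, regex=False, na=False)
    else:
        # Without `na`, a missing description stays missing in the mask.
        mask = df["description"].str.contains(term, regex=False)
    return df[mask]


# The unsafe variant can fail with a ValueError about NA / NaN values.
try:
    search(index_df, "orders", na_safe=False)
    unsafe_error = None
except ValueError as exc:
    unsafe_error = str(exc)

safe_result = search(index_df, "orders", na_safe=True)
print(unsafe_error)
print(safe_result)
```

The exact wording of the exception differs between pandas versions, so the sketch catches `ValueError` without matching on the message text.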

image

[2022-05-18 13:48:19,339] [27/MainThread] [ERROR] [dataiku.webapps.backend] Exception on /search [GET]
Traceback (most recent call last):
  File "/opt/dataiku/code-env/lib/python3.7/site-packages/flask/app.py", line 2077, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/dataiku/code-env/lib/python3.7/site-packages/flask/app.py", line 1525, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/dataiku/code-env/lib/python3.7/site-packages/flask/app.py", line 1523, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/dataiku/code-env/lib/python3.7/site-packages/flask/app.py", line 1509, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "<string>", line 155, in search
  File "/opt/dataiku/code-env/lib/python3.7/site-packages/pandas/core/frame.py", line 2682, in __getitem__
    return self._getitem_array(key)
  File "/opt/dataiku/code-env/lib/python3.7/site-packages/pandas/core/frame.py", line 2709, in _getitem_array
    if com.is_bool_indexer(key):
  File "/opt/dataiku/code-env/lib/python3.7/site-packages/pandas/core/common.py", line 107, in is_bool_indexer
    raise ValueError('cannot index with vector containing '
ValueError: cannot index with vector containing NA / NaN values

rymoore commented 2 years ago

Hi @clayms - please try pulling the latest code from the main branch. After updating the plugin, you'll need to stop and re-start the web application in your Dataiku project and then perform a full re-scan (no definitions will be lost while doing the rescan). Let us know how it goes - thanks!

clayms commented 2 years ago

Thanks @rymoore, the issue has been solved.