Snow-Fox-Data / dss-thread

Dataiku Thread™ Data Catalog Plugin by Snow Fox Data
https://www.snowfoxdata.com/thread-plugin

Thread Not Returning Results in DSS 11.1 Instances #29

Closed hangtime79 closed 1 year ago

hangtime79 commented 2 years ago

First off, dynamite work! This is incredibly well thought out. I will be giving a shout out to Snow Fox and Excelion in the Dataiku Sales Engineering Global Call.

Describe the bug Thread works perfectly in version 10; however, in version 11.1, Thread returns zero results. This appears to be caused by a deprecated API call.


```
2022-11-03 04:19:56,711 INFO 127.0.0.1 - - [03/Nov/2022 04:19:56] "GET /dss-stats HTTP/1.1" 200
-THREAD datasets do not exist yet
/opt/dataiku/dss_install/dataiku-dss-11.1.0/python/dataikuapi/dss/dataset.py:132: DeprecationWarning: Dataset.get_definition is deprecated, please use get_settings
  warnings.warn("Dataset.get_definition is deprecated, please use get_settings", DeprecationWarning)
/opt/dataiku/dss_install/dataiku-dss-11.1.0/python/dataikuapi/dss/dataset.py:144: DeprecationWarning: Dataset.set_definition is deprecated, please use get_settings
  warnings.warn("Dataset.set_definition is deprecated, please use get_settings", DeprecationWarning)
2022-11-03 04:20:00,312 INFO Initializing dataset writer for dataset THREAD.--Thread-Datasets--
2022-11-03 04:20:00,312 INFO Initializing write session
2022-11-03 04:20:00,336 INFO Starting RemoteStreamWriter
2022-11-03 04:20:00,338 INFO Initializing write data stream (sZ7gv1aINY)
2022-11-03 04:20:00,339 INFO Remote Stream Writer closed
2022-11-03 04:20:00,341 INFO Remote Stream Writer: start generate
2022-11-03 04:20:00,341 INFO Waiting for data to send ...
2022-11-03 04:20:00,341 INFO Got end mark, ending send
0 rows successfully written (sZ7gv1aINY)
2022-11-03 04:20:00,552 INFO Initializing dataset writer for dataset THREAD.--Thread-Index--
2022-11-03 04:20:00,552 INFO Initializing write session
2022-11-03 04:20:00,577 INFO Starting RemoteStreamWriter
2022-11-03 04:20:00,580 INFO Initializing write data stream (B4Qq39Og86)
2022-11-03 04:20:00,581 INFO Remote Stream Writer closed
2022-11-03 04:20:00,583 INFO Remote Stream Writer: start generate
2022-11-03 04:20:00,583 INFO Waiting for data to send ...
2022-11-03 04:20:00,583 INFO Got end mark, ending send
0 rows successfully written (B4Qq39Og86)
2022-11-03 04:20:00,849 INFO Initializing dataset writer for dataset THREAD.--Thread-Column-Mapping--
2022-11-03 04:20:00,850 INFO Initializing write session
2022-11-03 04:20:00,884 INFO Starting RemoteStreamWriter
2022-11-03 04:20:00,888 INFO Initializing write data stream (nFw6UApJBf)
2022-11-03 04:20:00,891 INFO Remote Stream Writer: start generate
2022-11-03 04:20:00,891 INFO Waiting for data to send ...
2022-11-03 04:20:00,891 INFO Remote Stream Writer closed
2022-11-03 04:20:00,892 INFO Got end mark, ending send
0 rows successfully written (nFw6UApJBf)
```

To Reproduce

  1. Create a new project in a 11.1 instance
  2. Add the Visual Webapp Thread
  3. Begin Scanning the DSS Instance
  4. The log output above will appear.
  5. After a minute or two, a modal appears reporting "disabled" (see screenshot below).

Expected behavior Thread performs the scan and returns results, as it does on version 10.

Screenshots Taken at some point after starting the DSS scan (screenshot attached).

Additional context

Happy to do any testing as needed.

rymoore commented 2 years ago

Hi @hangtime79 - really glad to hear that you like Thread! Let us know if we can assist you with a demo or any questions!

I believe that the issue you're describing has been resolved in our 1.1.4 release. We have submitted this release to the Dataiku team, but it doesn't appear to be available in the plugin store quite yet.

In the meantime, you can download the zip at https://drive.google.com/file/d/1CpV4s828lcCKXyo855TTfe6FkJlhdPbF/view?usp=share_link

This can be installed as an update to the existing plugin.

Let us know if this resolves the issue!

hangtime79 commented 2 years ago

Ryan,

Sorry. Same result from testing with 11.1. I took this into a clean instance. Still working with 10.x series. What else can I do to assist?

rymoore commented 2 years ago

Hi @hangtime79 - we're having a hard time reproducing your issue. We are running the 1.1.4 release successfully in an 11.1 DSS instance... Can you confirm the value of the project variables in the project you're using to host Thread?

```json
{
  "rescan_cron": "",
  "limit_to_project_tags": [],
  "exclude_project_tags": []
}
```
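
For anyone bisecting a scan the same way @hangtime79 does below, the two tag variables behave (as I understand them; this is a hypothetical helper illustrating the filter semantics, not the plugin's source) roughly like this:

```python
def project_is_scanned(project_tags, limit_tags, exclude_tags):
    """Return True if a project with `project_tags` would be scanned, given the
    plugin's limit/exclude project-tag variables (empty list = no filter)."""
    tags = set(project_tags)
    if limit_tags and not tags & set(limit_tags):
        return False  # a limit list is set and no tag matches it
    if tags & set(exclude_tags):
        return False  # any excluded tag removes the project from the scan
    return True
```

With both lists empty (the defaults shown above), every project is scanned; populating `limit_to_project_tags` restricts the scan to a few projects at a time.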

Thanks!

rymoore commented 2 years ago

Hi @hangtime79 - looking through our logs, it seems that there may be an issue writing to the file system dataset. Can you verify that the filesystem_folders connection exists:

```python
# The 'filesystem_folders' connection must exist on the instance.
params = {'connection': 'filesystem_folders',
          'path': project_variables['projectKey'] + '/' + ds_loc}
format_params = {'separator': '\t', 'style': 'unix', 'compress': ''}

csv_dataset = proj.create_dataset(name, type='Filesystem', params=params,
                                  formatType='csv', formatParams=format_params)
```
hangtime79 commented 2 years ago

Hey Ryan,

So I went back to a few other instances and was able to run on 11.1, but when I came back to my original instance I decided to try "limit_to_project_tags". By adding projects a few at a time, I have been able to catalog most of them. One or more projects is stopping Thread from completing its analysis. I will continue to investigate, and once I find the offending project I will report back here with the cause.

rymoore commented 2 years ago

Great, thanks for your feedback @hangtime79! Looking forward to hearing what you find.

hangtime79 commented 1 year ago

So, behind on my investigations here; this can be closed out. The issue was related to a dataset in one project that Dataiku itself could not read, so it threw an error. Any API call touching that particular object failed, hence the scan failure. Call this an edge case. Once I cleared the object, Thread was fine.
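
For anyone hitting the same edge case: the general pattern that avoids it is to catch exceptions per object during a scan rather than letting one unreadable dataset abort the whole run. A minimal sketch (hypothetical; not the plugin's actual code, and `read_metadata` stands in for whatever API call fails on the broken object):

```python
import logging

def scan_datasets(dataset_names, read_metadata):
    """Catalog each dataset, skipping any whose metadata call raises,
    so one corrupt object cannot abort the whole scan."""
    results, skipped = [], []
    for name in dataset_names:
        try:
            results.append((name, read_metadata(name)))
        except Exception as exc:
            logging.warning("Skipping unreadable dataset %s: %s", name, exc)
            skipped.append(name)
    return results, skipped
```

Logging the skipped names would also have made the offending project much faster to find than bisecting with project tags.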

rymoore commented 1 year ago

Thank you @hangtime79 !