dremio-professional-services / dremio-cloner


Sync cloud instance does not load any catalog content #48

Closed: patmania closed this issue 7 months ago

patmania commented 7 months ago

Hi,

While trying to sync our first Dremio Cloud instance, I am facing an issue. I started from the read_dremio_cloud config file. The sync does read the catalog and source setup, but it won't start downloading their actual contents (homes, sources, spaces, etc.). Attached are my config file and my log file (with the two GUIDs and the username removed). Thanks for having a look.

Patrick

Attachments: content.json, sync-get-log.txt

mxmarg commented 7 months ago

Hi Patrick,

Based on the config.json you have provided, it seems like you are still filtering on the following space and source names/types:

"space.filter.names": ["MySpace1", "MySpace2", "NewSpace"]
"source.filter.names": ["Source1", "Source2", "Source3"]
"source.filter.types": ["S3", "POSTGRES", "NAS"]

Most likely, what you are observing is that Dremio Cloner reads in all objects (phase 1) and then filters all of them out (phase 2), because none of them matches the example names and types left over from the default filter.
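To illustrate the two-phase behaviour described above, here is a minimal Python sketch. It is not Dremio Cloner's actual code: the CatalogObject type, read_all_objects (a stand-in for phase 1) and apply_name_filter are hypothetical, and the sketch assumes a name filter keeps only objects whose name appears in the configured list (the type filter is ignored for brevity).

```python
from dataclasses import dataclass


@dataclass
class CatalogObject:
    # Hypothetical stand-in for a top-level object returned by the catalog API.
    name: str
    kind: str  # "SOURCE" or "SPACE"


def read_all_objects():
    # Phase 1 (hypothetical data): every top-level object is read in,
    # regardless of any filter in the config.
    return [
        CatalogObject("MyCatalog", "SOURCE"),
        CatalogObject("MyDatalake", "SOURCE"),
    ]


def apply_name_filter(objects, kind, allowed_names):
    # Phase 2 (assumed semantics): keep an object of the given kind only
    # if its name is in the filter list; other kinds pass through untouched.
    return [
        obj for obj in objects
        if obj.kind != kind or obj.name in allowed_names
    ]


# Example names left over from the shipped config.
default_source_names = ["Source1", "Source2", "Source3"]

objects = read_all_objects()
kept = apply_name_filter(objects, "SOURCE", default_source_names)
print(len(objects), len(kept))  # 2 0
```

Swapping default_source_names for ["MyCatalog", "MyDatalake"] keeps both objects, which is the kind of change that resolves the issue further down this thread.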

patmania commented 7 months ago

Hi, thanks for the reply. It looks like I got a bit further: it now logs my catalog and sources. But it does not seem to pick up my spaces, so it doesn't get any of my queries. I will look at your code to see what the process does exactly; perhaps something is missing in the config.

To clarify: in an on-prem sync, it logs this:

DEBUG:2024-04-01 09:56:48,593:_read_home: processing container:

No such _read_home line appears in the cloud sync log.
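One way to make the on-prem versus cloud comparison systematic is to count log lines per function name. This is a hypothetical helper, not part of Dremio Cloner: it assumes every line follows the LEVEL:date time,ms:function: message layout of the quoted on-prem line, and the empty cloud log below is made-up sample data.

```python
import re
from collections import Counter

# Assumed layout, taken from the quoted on-prem line:
#   LEVEL:YYYY-MM-DD HH:MM:SS,mmm:function_name: message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+):"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):"
    r"(?P<func>\w+):"
    r"(?P<msg>.*)$"
)


def count_functions(lines):
    # Count how many log lines each function emitted; lines that do not
    # match the assumed layout are skipped.
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            counts[m.group("func")] += 1
    return counts


on_prem = ["DEBUG:2024-04-01 09:56:48,593:_read_home: processing container:"]
cloud = []  # hypothetical: the cloud log contains no _read_home lines

print(count_functions(on_prem)["_read_home"], count_functions(cloud)["_read_home"])  # 1 0
```

With real logs, pass an open file object (for example the attached sync-get-log.txt) instead of a list; iterating over a file yields its lines.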

Can you confirm whether the shipped cloud config works (without the filters)?

Thanks again.

patmania commented 7 months ago

Hi, in the end I got it to work :-)

Please note: as the API reference also points out when comparing Dremio Cloud and Dremio Software, Cloud does not recognize Homes and Spaces.

Instead, the catalog is seen as a source. So, to read my catalog and my data lake 'schema', I set this:

"source.filter.names": ["MyCatalog", "MyDatalake"]
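If you patch config files with a script rather than by hand, the change above can be applied programmatically. This is a hypothetical sketch: it assumes filter keys such as "source.filter.names" appear as ordinary JSON keys somewhere in the file (possibly nested in lists or objects) and rewrites every occurrence. The "filters" wrapper in the sample is made up, so check the result against your actual config.json before running a sync.

```python
import json


def set_key(node, key, value):
    # Recursively walk dicts and lists, replacing the value of every
    # occurrence of `key`. Returns the number of replacements made.
    replaced = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                replaced += 1
            else:
                replaced += set_key(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            replaced += set_key(item, key, value)
    return replaced


# Made-up fragment; a real config contains many more settings.
config = json.loads("""
{"filters": [
  {"source.filter.names": ["Source1", "Source2", "Source3"]},
  {"space.filter.names": ["MySpace1", "MySpace2", "NewSpace"]}
]}
""")

n = set_key(config, "source.filter.names", ["MyCatalog", "MyDatalake"])
print(n)  # 1
print(json.dumps(config["filters"][0]))  # {"source.filter.names": ["MyCatalog", "MyDatalake"]}
```

Returning a replacement count makes it easy to detect a typo in the key name: a count of 0 means nothing was changed.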

With that, it worked. I am still seeing a small issue where the sync tries to write a 'vds_parent' file named after its ID, which contains characters that are illegal in file names. But I don't need these synced yet.
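A common workaround for IDs that are not valid file names is to sanitize them before building the path. The sketch below is not how Dremio Cloner names its vds_parent files. The safe_filename helper, the set of characters treated as illegal (those Windows rejects, plus ASCII control characters) and the sample ID are all assumptions.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
# "/" is also the path separator on POSIX systems.
ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(object_id: str, replacement: str = "_") -> str:
    # Replace each illegal character with `replacement`.
    cleaned = ILLEGAL.sub(replacement, object_id)
    # Windows also disallows trailing dots and spaces; fall back to
    # `replacement` if nothing is left.
    return cleaned.rstrip(". ") or replacement


sample_id = 'catalog/folder:"view"'  # hypothetical ID
print(safe_filename(sample_id))  # catalog_folder__view_
```

Note that sanitizing is lossy: two different IDs can map to the same file name, so a real fix might append a short hash of the original ID to keep names unique.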

mxmarg commented 7 months ago

Hey @patmania, glad you got the catalog export to work :) I will close this ticket.