databrickslabs / migrate

Old scripts for one-off ST-to-E2 migrations. Use "terraform exporter" linked in the readme.

AttributeError: 'Namespace' object has no attribute 'timeout' Error on importing HMS #282

Closed WAG10 closed 1 year ago

WAG10 commented 1 year ago

Hi team, we are using the command below to import the Hive metastore from the primary to the secondary workspace. When we run it, we get the error `AttributeError: 'Namespace' object has no attribute 'timeout'`. We commented out the occurrence of `timeout` and then it worked, but this does not seem like an ideal way to handle it. Requesting your inputs on this.

Command:

```
python3 /home/cloudsa/migrate_repo_fix/migrate/import_db.py --azure --profile newWS --metastore --skip-failed
```

Error:

```
Note: running import_db.py directly is not recommended. Please use migration_pipeline.py
Traceback (most recent call last):
  File "/home/cloudsa/migrate_repo_fix/migrate/import_db.py", line 290, in <module>
    main()
  File "/home/cloudsa/migrate_repo_fix/migrate/import_db.py", line 27, in main
    client_config = build_client_config(args.profile, url, token, args)
  File "/home/cloudsa/migrate_repo_fix/migrate/dbclient/parser.py", line 432, in build_client_config
    'timeout': args.timeout,
AttributeError: 'Namespace' object has no attribute 'timeout'
```
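For context, this is standard argparse behavior: an `argparse.Namespace` raises `AttributeError` for any attribute that was never defined as a flag on the parser that produced it, so `build_client_config` fails when it reads a flag that `import_db.py`'s own parser does not declare. A minimal, self-contained sketch of the mechanism (the flag names and default value are illustrative, not the repo's actual code):

```python
import argparse

# A parser that, like import_db.py's, never declares --timeout.
parser = argparse.ArgumentParser()
parser.add_argument('--profile')

args = parser.parse_args(['--profile', 'newWS'])

# Reading the missing attribute directly reproduces the error:
# args.timeout  ->  AttributeError: 'Namespace' object has no attribute 'timeout'

# A defensive alternative (hypothetical default, not the repo's fix)
# falls back to a value instead of raising:
timeout = getattr(args, 'timeout', 30)
print(timeout)  # prints 30
```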

WAG10 commented 1 year ago

Hi @gregwood-db, thanks for the fix. Unfortunately I still get the namespace error (I cloned the repo from main and manually made the changes as per the fix from your commit):

```
cloudsa@mylinuxserver:~/export_dir_4$ python3 /home/cloudsa/migrate_repo_fix2/migrate/import_db.py --azure --profile newWS --metastore --skip-failed
Note: running import_db.py directly is not recommended. Please use migration_pipeline.py
Traceback (most recent call last):
  File "/home/cloudsa/migrate_repo_fix2/migrate/import_db.py", line 290, in <module>
    main()
  File "/home/cloudsa/migrate_repo_fix2/migrate/import_db.py", line 27, in main
    client_config = build_client_config(args.profile, url, token, args)
  File "/home/cloudsa/migrate_repo_fix2/migrate/dbclient/parser.py", line 436, in build_client_config
    'skip_missing_users': args.skip_missing_users
AttributeError: 'Namespace' object has no attribute 'skip_missing_users'
```

gregwood-db commented 1 year ago

@WAG10 is there a reason to use import_db.py instead of migration_pipeline.py? We're mostly testing against the latter for new features/flags. I'll have to go through all of the flags to see which are currently missing from the import and export pipelines.

WAG10 commented 1 year ago

Hi @gregwood-db, we are using some Terraform utilities for exporting other Databricks components, as per our internal architecture. We are using this migrate repo to import/export the Hive metastore alone, hence using import_db.py. If you have some syntax or way to import/export the Hive metastore alone with migration_pipeline.py, I would be more than happy to use it :) I tried it myself a couple of times but got a big bunch of errors, hence I decided to switch to import_db.py.

gregwood-db commented 1 year ago

Did you try `migration_pipeline.py --import-pipeline --keep-tasks metastore`? That should run only the metastore task in the import pipeline.

WAG10 commented 1 year ago

@gregwood-db, many thanks for the lead. I tried the below command for the export first:

```
cloudsa@mylinuxserver:~/migrate_repo_fix2/migrate$ python3 migration_pipeline.py --export-pipeline --profile oldWS --azure --keep-tasks metastore
```

This exported the HMS metadata. Then I tried running the import using the below command:

```
cloudsa@mylinuxserver:~/migrate_repo_fix2/migrate$ python3 migration_pipeline.py --import-pipeline --profile newWS --azure --keep-tasks metastore
```

But here I got the below error:

```
Using the session id: M20230906094058
Traceback (most recent call last):
  File "migration_pipeline.py", line 378, in <module>
    main()
  File "migration_pipeline.py", line 373, in main
    pipeline = build_pipeline(args)
  File "migration_pipeline.py", line 81, in build_pipeline
    return build_import_pipeline(client_config, checkpoint_service, args)
  File "migration_pipeline.py", line 141, in build_import_pipeline
    with open(source_info_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'azure_logs/M20230906094058/source_info.txt'
```

This was because the export run generated one session id folder, while the import was looking for source_info.txt in the new session id folder created for the import. Any way to solve this issue?

gregwood-db commented 1 year ago

Yes, please specify `--session <session-id>`, where session-id is the session token you got from the export job. This will use the existing export directory when importing into the new workspace.
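For reference, the full metastore-only sequence using the commands from this thread would look like the sketch below (the profile names and session id are the ones from the examples above):

```
# Export the metastore from the old workspace; the run prints a session id
# such as M20230906094058.
python3 migration_pipeline.py --export-pipeline --profile oldWS --azure --keep-tasks metastore

# Import into the new workspace, re-using that session id so the pipeline
# reads azure_logs/M20230906094058/source_info.txt written by the export.
python3 migration_pipeline.py --import-pipeline --profile newWS --azure --keep-tasks metastore --session M20230906094058
```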

gregwood-db commented 1 year ago

Closing this. Please let us know if you're still having issues.