elastic / enterprise-search-network-drive-connector

Official Enterprise Search | Workplace Search - Network Drives Connector

Error while Fetching from the Network drive. Checkpoint not saved #30

Closed waliur closed 1 year ago

waliur commented 1 year ago

I have a local docker instance running to test workplace search with the following containers:

I get the following error when trying to perform a sync operation:

root@a4b8ddf1f9d3:/app# ees_network_drive -c network_drive_connector.yml full-sync
Indexing started at: 2023-10-05T08:39:32Z
Error while Fetching from the Network drive. Checkpoint not saved
Traceback (most recent call last):
  File "/root/.local/bin/ees_network_drive", line 8, in <module>
    sys.exit(main())
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/cli.py", line 92, in main
    run(args)
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/cli.py", line 100, in run
    commands[args.cmd](args).execute()
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/full_sync_command.py", line 92, in execute
    self.start_producer(queue, time_range)
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/full_sync_command.py", line 63, in start_producer
    raise exception
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/full_sync_command.py", line 47, in start_producer
    store = sync_network_drives.connect_and_get_all_folders()
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/sync_network_drives.py", line 63, in connect_and_get_all_folders
    path=os.path.join(*self.drive_path.parts[1:]),
TypeError: join() missing 1 required positional argument: 'a'
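
The TypeError above can be reproduced outside the connector. This is a minimal sketch, assuming (from the traceback) that the connector wraps network_drive.path in a pathlib path and joins every component after the first. The helper name folder_path_from is hypothetical, and posixpath.join stands in for os.path.join so the result does not depend on the operating system.

```python
import posixpath
from pathlib import PurePosixPath


def folder_path_from(drive_path: str) -> str:
    """Join every path component after the first one.

    Mirrors the expression shown in the traceback:
    os.path.join(*self.drive_path.parts[1:])
    The function name is hypothetical; it is not part of the connector.
    """
    parts = PurePosixPath(drive_path).parts
    # With a single component such as "share1", parts[1:] is empty, so
    # join() is called with no arguments and raises TypeError.
    return posixpath.join(*parts[1:])


if __name__ == "__main__":
    print(folder_path_from("share1/folder1"))
    try:
        folder_path_from("share1")
    except TypeError as error:
        print("TypeError:", error)
```

A path with only one component leaves nothing to join after the first part is dropped, which is why "share1" on its own fails.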

Troubleshooting steps I carried out

All connectivity tests for ent-search and the network drive share pass:

root@a4b8ddf1f9d3:/app# make test_connectivity
venv/bin/pytest ees_network_drive/test_connectivity.py
================================================================================================ test session starts =================================================================================================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.3.0
rootdir: /app, configfile: pytest.ini
plugins: custom-exit-code-0.3.0, cov-3.0.0
collected 3 items

ees_network_drive/test_connectivity.py ...                                                                                                                                                                     [100%]

================================================================================================= 3 passed in 0.46s ==================================================================================================
root@a4b8ddf1f9d3:/app#

Furthermore, a manual connection test to the Samba Docker container is successful:

root@56d4e7a06b85:/app# smbclient -L //samba/
Password for [WORKGROUP\root]:

    Sharename       Type      Comment
    ---------       ----      -------
    share1          Disk
    IPC$            IPC       IPC Service (Docker Samba Server)
SMB1 disabled -- no workgroup available

Here is my network drive connector yml file:

#Configurations for the Network Drive Connector

# ------------------------------- Network Drive configuration settings -------------------------------
#The domain name of the Network Drive server for NTLM authentication
network_drive.domain: "WORKGROUP"
#The username used to login to Network Drive server
network_drive.username: "root"
#The password used to login to Network Drive server
network_drive.password: "bar"
#The relative path of the Network Drive.
network_drive.path: "share1"
# The name of the server hosting the Network Drive
network_drive.server_name: "Samba"
# The IP address of the server hosting the Network Drive
network_drive.server_ip: "samba"
#The name of the machine where the connector will run
client_machine.name: "network-drive-connector"
# ------------------------------- Workplace Search configuration settings -------------------------------
#Access token for Workplace search authentication
enterprise_search.api_key: "256781639e2785ac2b8c7be1005f56f0bc14cc99a2953d1b230a16041cf44a6a"
#Source identifier for the custom source created on the workplace search server
enterprise_search.source_id: "651be140a03b1a898b9598b9"
#Workplace search server address Example: http://es-host:3002 
enterprise_search.host_url: "http://ent-search:3002/"
# ------------------------------- Connector specific configuration settings -------------------------------
#Specifies the objects to be fetched and indexed in the WorkPlace search along with fields that needs to be included/excluded. The list of the objects with a pattern to be included/excluded is provided. By default all the objects are fetched
include:
   size:
   path_template: ["**/*.txt", "**/*.contact", "**/*.docx", "**/*.json", "**/*.png", "**/*.jpg", "**/*.jpeg", "**/*.py", "**/*.yml", "**/*.md", "**/*.ini", "**/*.sh", "**/*.rst", "**/*.pdf", "**/*.rtf", "**/*.ppt", "**/*.file"]
exclude:
  size: [">10000000"]
  path_template:
#The timestamp after which all the objects that are modified or created are fetched from the Network Drive. By default, all the objects present in the Network Drive till the end_time are fetched
start_time : 
#The timestamp before which all the updated objects need to be fetched i.e. the connector won't fetch any object updated/created after the end_time. By default, all the objects updated/added till the current time are fetched
end_time : 
#The level of the logs the user wants to use in the log files. The possible values include: DEBUG, INFO, WARN, ERROR. By default, the level is INFO
log_level: INFO
#The number of retries to perform in case of server error. The connector will use exponential back-off for retry mechanism
retry_count: 3
#Number of threads to be used in multithreading for the Network Drive sync.
network_drives_sync_thread_count: 5
#Number of threads to be used in multithreading for the enterprise search sync.
enterprise_search_sync_thread_count: 5
#Denotes whether document permission will be enabled or not
enable_document_permission: Yes
#The path of csv file containing mapping of Network Drive user ID to Workplace user ID
network_drive_enterprise_search.user_mapping: ""

After looking into https://github.com/elastic/enterprise-search-network-drive-connector/issues/25 and changing network_drive.path to

network_drive.path: "samba/share1"

... I get the following error:

root@a4b8ddf1f9d3:/app# ees_network_drive -c network_drive_connector.yml full-sync
Indexing started at: 2023-10-05T09:04:32Z
Unknown error while fetching files Failed to list share1 on samba: Unable to connect to shared device
==================== SMB Message 0 ====================
SMB Header:
-----------
Command: 0x03 (SMB2_COM_TREE_CONNECT)
Status: 0x00000000
Flags: 0x00
PID: 76
MID: 3
TID: 0
Data: 34 bytes
b'0900000048001a005c005c00530041004d00420041005c00730061006d0062006100'
SMB Data Packet (hex):
----------------------
b'fe534d42400000000000000003000000000000000000000003000000000000004c00000000000000546e357100000000000000000000000000000000000000000900000048001a005c005c00530041004d00420041005c00730061006d0062006100'
==================== SMB Message 1 ====================
SMB Header:
-----------
Command: 0x03 (SMB2_COM_TREE_CONNECT)
Status: 0xC00000CC
Flags: 0x01
PID: 76
MID: 3
TID: 0
Data: 9 bytes
b'090000000000000000'
SMB Data Packet (hex):
----------------------
b'fe534d4240000000cc0000c003000100010000000000000003000000000000004c00000000000000546e35710000000000000000000000000000000000000000090000000000000000'
Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/files.py", line 93, in recursive_fetch
    file_list = smb_connection.listPath(service_name, rf'{path}', search=16)
  File "/root/.local/lib/python3.8/site-packages/smb/SMBConnection.py", line 210, in listPath
    self._pollForNetBIOSPacket(timeout)
  File "/root/.local/lib/python3.8/site-packages/smb/SMBConnection.py", line 649, in _pollForNetBIOSPacket
    self.feedData(data)
  File "/root/.local/lib/python3.8/site-packages/nmb/base.py", line 54, in feedData
    self._processNMBSessionPacket(self.data_nmb)
  File "/root/.local/lib/python3.8/site-packages/nmb/base.py", line 75, in _processNMBSessionPacket
    self.onNMBSessionMessage(packet.flags, packet.data)
  File "/root/.local/lib/python3.8/site-packages/smb/base.py", line 150, in onNMBSessionMessage
    if self._updateState(self.smb_message):
  File "/root/.local/lib/python3.8/site-packages/smb/base.py", line 344, in _updateState_SMB2
    req.callback(message, **req.kwargs)
  File "/root/.local/lib/python3.8/site-packages/smb/base.py", line 736, in connectCB
    errback(OperationFailure('Failed to list %s on %s: Unable to connect to shared device' % ( path, service_name ), messages_history))
  File "/root/.local/lib/python3.8/site-packages/smb/SMBConnection.py", line 204, in eb
    raise failure
smb.smb_structs.OperationFailure: Failed to list share1 on samba: Unable to connect to shared device
==================== SMB Message 0 ====================
SMB Header:
-----------
Command: 0x03 (SMB2_COM_TREE_CONNECT)
Status: 0x00000000
Flags: 0x00
PID: 76
MID: 3
TID: 0
Data: 34 bytes
b'0900000048001a005c005c00530041004d00420041005c00730061006d0062006100'
SMB Data Packet (hex):
----------------------
b'fe534d42400000000000000003000000000000000000000003000000000000004c00000000000000546e357100000000000000000000000000000000000000000900000048001a005c005c00530041004d00420041005c00730061006d0062006100'
==================== SMB Message 1 ====================
SMB Header:
-----------
Command: 0x03 (SMB2_COM_TREE_CONNECT)
Status: 0xC00000CC
Flags: 0x01
PID: 76
MID: 3
TID: 0
Data: 9 bytes
b'090000000000000000'
SMB Data Packet (hex):
----------------------
b'fe534d4240000000cc0000c003000100010000000000000003000000000000004c00000000000000546e35710000000000000000000000000000000000000000090000000000000000'

Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents             indexed out of: 0 till now..
Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents             indexed out of: 0 till now..
Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents             indexed out of: 0 till now..
Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents             indexed out of: 0 till now..
Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents             indexed out of: 0 till now..
Successfully saved the checkpoint
Indexing ended at: 2023-10-05T09:04:32Z
root@a4b8ddf1f9d3:/app#

praveen-elastic commented 1 year ago

Hi @waliur, I am assuming you are working with a Linux share configured via Samba.

In Samba, each share is exposed as a separate service. So you'll need to set the drive path (network_drive.path) to share1 followed by the folder you want to sync, such as share1/folder1.
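
The split described above can be sketched as follows. This assumes, based on the two error messages in this thread, that the first component of network_drive.path is used as the SMB service (share) name and the remainder as the folder to list. The helper split_drive_path and its error message are hypothetical and are not part of the connector.

```python
from pathlib import PurePosixPath
from typing import Tuple


def split_drive_path(drive_path: str) -> Tuple[str, str]:
    """Split a configured drive path into (service_name, folder_path).

    Hypothetical helper: the first component is treated as the Samba
    share, and everything after it as the folder inside that share.
    """
    parts = PurePosixPath(drive_path.strip("/")).parts
    if len(parts) < 2:
        # Fail with a readable message instead of a TypeError from join().
        raise ValueError(
            f"network_drive.path {drive_path!r} needs a share and a folder, "
            "for example 'share1/folder1'"
        )
    return parts[0], "/".join(parts[1:])


if __name__ == "__main__":
    # The share is "share1" and the folder to sync is "folder1".
    print(split_drive_path("share1/folder1"))
    # Here "samba" is taken as the share name, which matches the
    # "Failed to list share1 on samba" message reported earlier.
    print(split_drive_path("samba/share1"))
```

Under this assumption, "samba/share1" asks the server for a share called samba, which does not exist in the smbclient listing, while "share1/folder1" targets the real share.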

waliur commented 1 year ago

Yes, it's a Linux Samba share. Thank you @praveen-elastic - this has worked.

The sync then failed due to the need for document-level permissions, which I'm not 100% sure I understand yet. I turned the setting off after finding out it's enabled by default. Once I turned it off, a full sync operation worked as expected!!!

Thank you!!!!! :D

P.S. Error messages that appear in hex, such as these, are not helpful:

b'0900000048001a005c005c00530041004d00420041005c00730061006d0062006100'
SMB Data Packet (hex):
----------------------
b'fe534d42400000000000000003000000000000000000000003000000000000004c00000000000000546e357100000000000000000000000000000000000000000900000048001a005c005c00530041004d00420041005c00730061006d0062006100'
==================== SMB Message 1 ====================
SMB Header:

... I'd suggest converting them to text for display if possible.
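
As a rough illustration of that suggestion, the hex dumps in this thread can be decoded by hand. This is a minimal sketch, not something the connector or pysmb provides: it assumes the standard SMB2 layout (a 64-byte header with the status at byte offset 8, and a TREE_CONNECT request body holding a path offset and length followed by a UTF-16LE path). The function names and the one-entry status table are hypothetical.

```python
import struct

# Hex strings copied from the log output earlier in this thread.
TREE_CONNECT_REQUEST_DATA = (
    "0900000048001a005c005c00530041004d00420041005c00730061006d0062006100"
)
TREE_CONNECT_RESPONSE_PACKET = (
    "fe534d4240000000cc0000c003000100010000000000000003000000000000004c"
    "00000000000000546e35710000000000000000000000000000000000000000"
    "090000000000000000"
)

SMB2_HEADER_SIZE = 64

# Only the status seen in this thread; not a complete table.
KNOWN_STATUSES = {0xC00000CC: "STATUS_BAD_NETWORK_NAME"}


def decode_tree_connect_path(data_hex: str) -> str:
    """Return the UNC path carried by a TREE_CONNECT request body."""
    data = bytes.fromhex(data_hex)
    # StructureSize, Flags/Reserved, PathOffset, PathLength (little-endian).
    _size, _flags, path_offset, path_length = struct.unpack_from("<HHHH", data)
    # PathOffset counts from the start of the SMB2 header, not the body.
    start = path_offset - SMB2_HEADER_SIZE
    return data[start:start + path_length].decode("utf-16-le")


def decode_status(packet_hex: str) -> int:
    """Return the 32-bit status field from a full SMB2 packet."""
    packet = bytes.fromhex(packet_hex)
    (status,) = struct.unpack_from("<I", packet, 8)
    return status


if __name__ == "__main__":
    print(decode_tree_connect_path(TREE_CONNECT_REQUEST_DATA))
    status = decode_status(TREE_CONNECT_RESPONSE_PACKET)
    print(hex(status), KNOWN_STATUSES.get(status, "unknown"))
```

Decoded this way, the request shows the connector asking for \\SAMBA\samba, and the response status 0xC00000CC corresponds to STATUS_BAD_NETWORK_NAME, meaning the requested share name was not found on the server.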