dadoonet / fscrawler

Elasticsearch File System Crawler (FS Crawler)
https://fscrawler.readthedocs.io/
Apache License 2.0

"No Such File" when crawling a directory that ends with space("ThisDirHasSpaceAtEnd ") #1952

Open jens-idoer opened 1 week ago

jens-idoer commented 1 week ago

Describe the bug

When running FSCrawler (Docker image) against a target directory on an SFTP server, we get an error when the crawler reaches a directory whose name ends with a space (" "). The error is:

WARN [f.p.e.c.f.FsParserAbstract] Error while crawling /xx: SFTP error (SSH_FX_NO_SUCH_FILE): No such file

Please note that "/xx" here is the "root" directory from the _settings.json file, not the directory that has the space at the end.

Job Settings

Logs

+ fscrawler --debug --loop 1 fscrawler_job
12:11:21,106 WARN  [f.p.e.c.f.c.FsCrawlerCli] --debug option has been deprecated. Use FS_JAVA_OPTS="-DLOG_LEVEL=debug" instead.
12:11:21,150 INFO  [f.console] ,----------------------------------------------------------------------------------------------------.
|       ,---,.  .--.--.     ,----..                                     ,--,           2.10-SNAPSHOT |
|     ,'  .' | /  /    '.  /   /   \                                  ,--.'|                         |
|   ,---.'   ||  :  /`. / |   :     :  __  ,-.                   .---.|  | :               __  ,-.   |
|   |   |   .';  |  |--`  .   |  ;. /,' ,'/ /|                  /. ./|:  : '             ,' ,'/ /|   |
|   :   :  :  |  :  ;_    .   ; /--` '  | |' | ,--.--.       .-'-. ' ||  ' |      ,---.  '  | |' |   |
|   :   |  |-, \  \    `. ;   | ;    |  |   ,'/       \     /___/ \: |'  | |     /     \ |  |   ,'   |
|   |   :  ;/|  `----.   \|   : |    '  :  / .--.  .-. | .-'.. '   ' .|  | :    /    /  |'  :  /     |
|   |   |   .'  __ \  \  |.   | '___ |  | '   \__\/: . ./___/ \:     ''  : |__ .    ' / ||  | '      |
|   '   :  '   /  /`--'  /'   ; : .'|;  : |   ," .--.; |.   \  ' .\   |  | '.'|'   ;   /|;  : |      |
|   |   |  |  '--'.     / '   | '/  :|  , ;  /  /  ,.  | \   \   ' \ |;  :    ;'   |  / ||  , ;      |
|   |   :  \    `--'---'  |   :    /  ---'  ;  :   .'   \ \   \  |--" |  ,   / |   :    | ---'       |
|   |   | ,'               \   \ .'         |  ,     .-./  \   \ |     ---`-'   \   \  /             |
|   `----'                  `---`            `--`---'       '---"                `----'              |
+----------------------------------------------------------------------------------------------------+
|                                        You know, for Files!                                        |
|                                     Made from France with Love                                     |
|                           Source: https://github.com/dadoonet/fscrawler/                           |
|                          Documentation: https://fscrawler.readthedocs.io/                          |
`----------------------------------------------------------------------------------------------------'

12:11:21,201 DEBUG [f.p.e.c.p.FsCrawlerPluginsManager] Loading plugins
12:11:21,249 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [11.9mb/363.5mb=3.3%], RAM [1.3gb/1.4gb=95.22%], Swap [0b/0b=0.0].
12:11:21,251 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Copying [6/_settings.json]...
12:11:21,266 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Copying [6/_settings_folder.json]...
12:11:21,267 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Starting job [fscrawler_job]...
12:11:21,521 WARN  [f.p.e.c.f.s.Elasticsearch] username is deprecated. Use apiKey instead.
12:11:21,521 WARN  [f.p.e.c.f.s.Elasticsearch] password is deprecated. Use apiKey instead.
12:11:21,523 DEBUG [f.p.e.c.p.FsCrawlerPluginsManager] Starting plugins
12:11:21,536 DEBUG [f.p.e.c.p.FsCrawlerPluginsManager] Found FsCrawlerExtensionFsProvider extension for type [http]
12:11:21,536 DEBUG [f.p.e.c.p.FsCrawlerPluginsManager] Found FsCrawlerExtensionFsProvider extension for type [s3]
12:11:21,536 DEBUG [f.p.e.c.p.FsCrawlerPluginsManager] Found FsCrawlerExtensionFsProvider extension for type [local]
12:11:21,549 DEBUG [f.p.e.c.f.FsParserAbstract] creating fs crawler thread [fscrawler_job] for [/home/spacetest/cases/spacetestcase2] every [15m]
12:11:21,552 INFO  [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
12:11:21,645 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version
12:11:22,547 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version returns 8.9.0 and 8 as the major version number
12:11:22,547 INFO  [f.p.e.c.f.c.ElasticsearchClient] Elasticsearch Client connected to a node running version 8.9.0
12:11:22,547 DEBUG [f.p.e.c.f.c.ElasticsearchClient] is existing pipeline [ddl_fscrawler_pipeline]
12:11:22,559 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Pipeline [ddl_fscrawler_pipeline] was found
12:11:22,566 DEBUG [f.p.e.c.f.s.FsCrawlerManagementServiceElasticsearchImpl] Elasticsearch Management Service started
12:11:22,569 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version
12:11:22,662 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version returns 8.9.0 and 8 as the major version number
12:11:22,663 INFO  [f.p.e.c.f.c.ElasticsearchClient] Elasticsearch Client connected to a node running version 8.9.0
12:11:22,663 DEBUG [f.p.e.c.f.c.ElasticsearchClient] is existing pipeline [ddl_fscrawler_pipeline]
12:11:22,688 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Pipeline [ddl_fscrawler_pipeline] was found
12:11:22,689 DEBUG [f.p.e.c.f.s.FsCrawlerDocumentServiceElasticsearchImpl] Elasticsearch Document Service started
12:11:22,689 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Creating/updating component templates
12:11:22,695 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_alias]
12:11:22,717 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_settings_shards]
12:11:22,742 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_settings_total_fields]
12:11:22,753 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_mapping_attributes]
12:11:22,779 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_mapping_file]
12:11:22,792 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_mapping_path]
12:11:22,813 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_mapping_attachment]
12:11:22,825 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_mapping_content]
12:11:22,849 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_mapping_meta]
12:11:22,861 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Creating/updating index templates
12:11:22,863 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push index template [fscrawler_docs_ddl-lab-cust-shared-files]
12:11:22,889 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push index template [fscrawler_folders_ddl-lab-cust-shared-files_folder]
12:11:22,905 INFO  [f.p.e.c.f.FsParserAbstract] FS crawler started for [fscrawler_job] for [/home/spacetest/cases/spacetestcase2] every [15m]
12:11:22,906 DEBUG [f.p.e.c.f.FsParserAbstract] Fs crawler thread [fscrawler_job] is now running. Run #1...
12:11:22,907 DEBUG [f.p.e.c.f.c.s.FsCrawlerSshClient] Create and start SSH client
12:11:23,635 DEBUG [f.p.e.c.f.c.s.FsCrawlerSshClient] Opening SSH connection to u322501@null
12:11:24,061 DEBUG [f.p.e.c.f.c.s.FsCrawlerSshClient] SSH connection successful
12:11:24,141 DEBUG [f.p.e.c.f.FsParserAbstract] indexing [/home/spacetest/cases/spacetestcase2] content
12:11:24,141 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] Listing local files from [/home/spacetest/cases/spacetestcase2]
12:11:24,153 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads) = /uploads
12:11:24,153 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] directory = [true], filename = [/uploads], includes = [null], excludes = [null]
12:11:24,153 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads], excludes = [null]
12:11:24,154 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads], includes = [null]
12:11:24,154 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads], excludes = [null]
12:11:24,154 DEBUG [f.p.e.c.f.FsParserAbstract] [/uploads] can be indexed: [true]
12:11:24,154 DEBUG [f.p.e.c.f.FsParserAbstract]   - folder: uploads
12:11:24,155 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads) = /uploads
12:11:24,179 DEBUG [f.p.e.c.f.FsParserAbstract] indexing [/home/spacetest/cases/spacetestcase2/uploads] content
12:11:24,179 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] Listing local files from [/home/spacetest/cases/spacetestcase2/uploads]
12:11:24,186 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads/emails) = /uploads/emails
12:11:24,186 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] directory = [true], filename = [/uploads/emails], includes = [null], excludes = [null]
12:11:24,186 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails], excludes = [null]
12:11:24,186 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails], includes = [null]
12:11:24,186 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails], excludes = [null]
12:11:24,186 DEBUG [f.p.e.c.f.FsParserAbstract] [/uploads/emails] can be indexed: [true]
12:11:24,186 DEBUG [f.p.e.c.f.FsParserAbstract]   - folder: emails
12:11:24,187 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads/emails) = /uploads/emails
12:11:24,187 DEBUG [f.p.e.c.f.FsParserAbstract] indexing [/home/spacetest/cases/spacetestcase2/uploads/emails] content
12:11:24,188 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] Listing local files from [/home/spacetest/cases/spacetestcase2/uploads/emails]
12:11:24,194 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk) = /uploads/emails/janus@skibby-hc.dk
12:11:24,194 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] directory = [true], filename = [/uploads/emails/janus@skibby-hc.dk], includes = [null], excludes = [null]
12:11:24,194 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk], excludes = [null]
12:11:24,194 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk], includes = [null]
12:11:24,195 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk], excludes = [null]
12:11:24,195 DEBUG [f.p.e.c.f.FsParserAbstract] [/uploads/emails/janus@skibby-hc.dk] can be indexed: [true]
12:11:24,195 DEBUG [f.p.e.c.f.FsParserAbstract]   - folder: janus@skibby-hc.dk
12:11:24,195 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk) = /uploads/emails/janus@skibby-hc.dk
12:11:24,196 DEBUG [f.p.e.c.f.FsParserAbstract] indexing [/home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk] content
12:11:24,196 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] Listing local files from [/home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk]
12:11:24,202 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350) = /uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350
12:11:24,203 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] directory = [true], filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350], includes = [null], excludes = [null]
12:11:24,203 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350], excludes = [null]
12:11:24,203 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350], includes = [null]
12:11:24,203 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350], excludes = [null]
12:11:24,203 DEBUG [f.p.e.c.f.FsParserAbstract] [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350] can be indexed: [true]
12:11:24,203 DEBUG [f.p.e.c.f.FsParserAbstract]   - folder: INBOX.Sent_350
12:11:24,203 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350) = /uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350
12:11:24,204 DEBUG [f.p.e.c.f.FsParserAbstract] indexing [/home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350] content
12:11:24,204 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] Listing local files from [/home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350]
12:11:24,210 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350/Ansatte.xlsx) = /uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350/Ansatte.xlsx
12:11:24,210 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] directory = [false], filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350/Ansatte.xlsx], includes = [null], excludes = [null]
12:11:24,210 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350/Ansatte.xlsx], excludes = [null]
12:11:24,210 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350/Ansatte.xlsx], includes = [null]
12:11:24,210 DEBUG [f.p.e.c.f.FsParserAbstract] [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350/Ansatte.xlsx] can be indexed: [true]
12:11:24,210 DEBUG [f.p.e.c.f.FsParserAbstract]   - file: /uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350/Ansatte.xlsx
12:11:24,210 DEBUG [f.p.e.c.f.FsParserAbstract]     - not modified: creation date null , file date 2024-10-04T11:29:43, last scan date 2024-10-04T11:50:21.765268275
12:11:24,210 DEBUG [f.p.e.c.f.FsParserAbstract] Looking for removed files in [/home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350]...
12:11:24,250 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350/Ansatte.xlsx) = /uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350/Ansatte.xlsx
12:11:24,250 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] directory = [false], filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350/Ansatte.xlsx], includes = [null], excludes = [null]
12:11:24,250 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350/Ansatte.xlsx], excludes = [null]
12:11:24,250 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350/Ansatte.xlsx], includes = [null]
12:11:24,251 DEBUG [f.p.e.c.f.FsParserAbstract] Looking for removed directories in [/home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350]...
12:11:24,277 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350.eml) = /uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350.eml
12:11:24,277 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] directory = [false], filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350.eml], includes = [null], excludes = [null]
12:11:24,277 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350.eml], excludes = [null]
12:11:24,277 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350.eml], includes = [null]
12:11:24,277 DEBUG [f.p.e.c.f.FsParserAbstract] [/uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350.eml] can be indexed: [true]
12:11:24,278 DEBUG [f.p.e.c.f.FsParserAbstract]   - file: /uploads/emails/janus@skibby-hc.dk/INBOX.Sent_350.eml
12:11:24,278 DEBUG [f.p.e.c.f.FsParserAbstract]     - not modified: creation date null , file date 2024-10-04T11:29:43, last scan date 2024-10-04T11:50:21.765268275
12:11:24,278 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671) = /uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671
12:11:24,278 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] directory = [true], filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671], includes = [null], excludes = [null]
12:11:24,278 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671], excludes = [null]
12:11:24,278 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671], includes = [null]
12:11:24,278 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671], excludes = [null]
12:11:24,279 DEBUG [f.p.e.c.f.FsParserAbstract] [/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671] can be indexed: [true]
12:11:24,279 DEBUG [f.p.e.c.f.FsParserAbstract]   - folder: INBOX.Arkiv^2019_3671
12:11:24,279 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671) = /uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671
12:11:24,280 DEBUG [f.p.e.c.f.FsParserAbstract] indexing [/home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671] content
12:11:24,280 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] Listing local files from [/home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671]
12:11:24,286 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671/0270064298 ) = /uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671/0270064298 
12:11:24,287 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] directory = [true], filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671/0270064298 ], includes = [null], excludes = [null]
12:11:24,287 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671/0270064298 ], excludes = [null]
12:11:24,287 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671/0270064298 ], includes = [null]
12:11:24,287 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671/0270064298 ], excludes = [null]
12:11:24,287 DEBUG [f.p.e.c.f.FsParserAbstract] [/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671/0270064298 ] can be indexed: [true]
12:11:24,287 DEBUG [f.p.e.c.f.FsParserAbstract]   - folder: 0270064298 
12:11:24,288 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/home/spacetest/cases/spacetestcase2, /home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671/0270064298 ) = /uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671/0270064298 
12:11:24,288 DEBUG [f.p.e.c.f.FsParserAbstract] indexing [/home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671/0270064298 ] content
12:11:24,288 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] Listing local files from [/home/spacetest/cases/spacetestcase2/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671/0270064298 ]
12:11:24,292 WARN  [f.p.e.c.f.FsParserAbstract] Error while crawling /home/spacetest/cases/spacetestcase2: SFTP error (SSH_FX_NO_SUCH_FILE): No such file
12:11:24,293 WARN  [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.io.UncheckedIOException: SFTP error (SSH_FX_NO_SUCH_FILE): No such file
    at org.apache.sshd.sftp.client.impl.SftpIterableDirEntry.iterator(SftpIterableDirEntry.java:67) ~[sshd-sftp-2.13.2.jar:2.13.2]
    at org.apache.sshd.sftp.client.impl.SftpIterableDirEntry.iterator(SftpIterableDirEntry.java:35) ~[sshd-sftp-2.13.2.jar:2.13.2]
    at java.base/java.lang.Iterable.spliterator(Iterable.java:101) ~[?:?]
    at fr.pilato.elasticsearch.crawler.fs.crawler.ssh.FileAbstractorSSH.getFiles(FileAbstractorSSH.java:100) ~[fscrawler-crawler-ssh-2.10-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:248) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:314) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:314) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:314) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:314) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.addFilesRecursively(FsParserAbstract.java:314) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:162) [fscrawler-core-2.10-SNAPSHOT.jar:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: org.apache.sshd.sftp.common.SftpException: No such file
    at org.apache.sshd.sftp.client.impl.AbstractSftpClient.throwStatusException(AbstractSftpClient.java:277) ~[sshd-sftp-2.13.2.jar:2.13.2]
    at org.apache.sshd.sftp.client.impl.AbstractSftpClient.checkHandleResponse(AbstractSftpClient.java:299) ~[sshd-sftp-2.13.2.jar:2.13.2]
    at org.apache.sshd.sftp.client.impl.AbstractSftpClient.checkHandle(AbstractSftpClient.java:290) ~[sshd-sftp-2.13.2.jar:2.13.2]
    at org.apache.sshd.sftp.client.impl.AbstractSftpClient.openDir(AbstractSftpClient.java:887) ~[sshd-sftp-2.13.2.jar:2.13.2]
    at org.apache.sshd.sftp.client.impl.SftpDirEntryIterator.<init>(SftpDirEntryIterator.java:61) ~[sshd-sftp-2.13.2.jar:2.13.2]
    at org.apache.sshd.sftp.client.impl.SftpIterableDirEntry.iterator(SftpIterableDirEntry.java:65) ~[sshd-sftp-2.13.2.jar:2.13.2]
    ... 11 more
12:11:24,298 INFO  [f.p.e.c.f.FsParserAbstract] Closing FS crawler file abstractor [FileAbstractorSSH].
12:11:24,298 DEBUG [f.p.e.c.f.c.s.FsCrawlerSshClient] Closing FsCrawlerSshClient
12:11:24,312 INFO  [f.p.e.c.f.FsParserAbstract] FS crawler is stopping after 1 run
12:11:24,312 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [fscrawler_job]
12:11:24,313 DEBUG [f.p.e.c.f.c.s.FsCrawlerSshClient] Closing FsCrawlerSshClient
12:11:24,313 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
12:11:24,313 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing Elasticsearch client manager
12:11:24,313 DEBUG [f.p.e.c.f.f.b.FsCrawlerBulkProcessor] Closing BulkProcessor
12:11:24,313 DEBUG [f.p.e.c.f.f.b.FsCrawlerBulkProcessor] BulkProcessor is now closed
12:11:24,313 DEBUG [f.p.e.c.f.f.b.FsCrawlerBulkProcessor] Executing [6] remaining actions
12:11:24,314 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Going to execute new bulk composed of 6 actions
12:11:24,326 DEBUG [f.p.e.c.f.c.ElasticsearchEngine] Sending a bulk request of [6] documents to the Elasticsearch service
12:11:24,326 DEBUG [f.p.e.c.f.c.ElasticsearchClient] bulk a ndjson of 2103 characters
12:11:24,378 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Executed bulk composed of 6 actions
12:11:24,382 DEBUG [f.p.e.c.f.s.FsCrawlerManagementServiceElasticsearchImpl] Elasticsearch Management Service stopped
12:11:24,382 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing Elasticsearch client manager
12:11:24,382 DEBUG [f.p.e.c.f.f.b.FsCrawlerBulkProcessor] Closing BulkProcessor
12:11:24,382 DEBUG [f.p.e.c.f.f.b.FsCrawlerBulkProcessor] BulkProcessor is now closed
12:11:24,383 DEBUG [f.p.e.c.f.s.FsCrawlerDocumentServiceElasticsearchImpl] Elasticsearch Document Service stopped
12:11:24,383 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
12:11:24,384 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [fscrawler_job] stopped
12:11:24,384 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [fscrawler_job]
12:11:24,385 DEBUG [f.p.e.c.f.c.s.FsCrawlerSshClient] Closing FsCrawlerSshClient
12:11:24,385 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
12:11:24,385 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing Elasticsearch client manager
12:11:24,385 DEBUG [f.p.e.c.f.s.FsCrawlerManagementServiceElasticsearchImpl] Elasticsearch Management Service stopped
12:11:24,385 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing Elasticsearch client manager
12:11:24,385 DEBUG [f.p.e.c.f.s.FsCrawlerDocumentServiceElasticsearchImpl] Elasticsearch Document Service stopped
12:11:24,386 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
12:11:24,386 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [fscrawler_job] stopped
12:11:24,387 DEBUG [f.p.e.c.p.FsCrawlerPluginsManager] Stopping plugins

Expected behavior

We expected the contents of the folder with the space (" ") at the end of its name to be indexed.

Versions:

jens-idoer commented 1 week ago

Just to be a bit more precise, the directory with the space at the end is this: [/uploads/emails/janus@skibby-hc.dk/INBOX.Arkiv^2019_3671/0270064298 ]
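
For anyone who wants to build a similar test tree, here is a minimal local sketch, assuming a POSIX filesystem. It involves neither SFTP nor FSCrawler. It only creates a directory whose name ends with a space and shows that the same path with the space stripped does not resolve. Whether a stripped trailing space is what actually happens in this issue is an unconfirmed assumption, and the helper names `make_trailing_space_tree` and `list_dir` are hypothetical.

```python
import os
import tempfile


def make_trailing_space_tree(root: str) -> str:
    """Create <root>/'0270064298 '/sample.txt and return the directory path."""
    # The directory name deliberately ends with a single space.
    target = os.path.join(root, "0270064298 ")
    os.makedirs(target, exist_ok=True)
    with open(os.path.join(target, "sample.txt"), "w", encoding="utf-8") as fh:
        fh.write("hello\n")
    return target


def list_dir(path: str) -> list:
    """Return the sorted entry names of a directory."""
    return sorted(os.listdir(path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        exact = make_trailing_space_tree(root)
        # Listing with the exact name (space included) works.
        print(list_dir(exact))
        # Listing after stripping the trailing space fails on POSIX systems.
        try:
            list_dir(exact.rstrip())
        except FileNotFoundError as err:
            print("stripped path fails:", err)
```

Pointing a crawl at a tree like this (served over SFTP) might help narrow down whether the failure depends on the trailing space alone.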

jens-idoer commented 5 days ago

Is there any more information I can provide? Is there any way I can see if (or when) this is going to be fixed? This is my first "issue" report for this fine application, so please let me know if there is more I need to do.

dadoonet commented 5 days ago

It's all good. Thanks for the report. I'm just overloaded ATM