I have a large, long-established AWS account with a few years of CloudTrail log data already in my source S3 bucket. After deploying this solution and manually starting the first crawler, it runs but does not finish in any reasonable amount of time — I let the CloudTrailRawCrawler run for 24 hours and it still hadn't completed its first crawl of the source CloudTrail bucket. I suspect this is due to several years' worth of daily partitions and a very large number of small existing CloudTrail log files. Note that this source S3 bucket only contains the CloudTrail logs for one account, 99% of which come from a single AWS region, so there aren't an unreasonable number of partitions to crawl.
Is there any way to speed up or parallelize the initial crawl?
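For context, one alternative I'm considering is skipping crawler discovery for the historical backlog entirely: since CloudTrail's S3 layout is fully predictable, the daily partition locations can be computed up front and registered directly via the Glue API. A minimal sketch of the idea — the bucket, account ID, and region below are placeholder assumptions, and the `batch_create_partition` step is only indicated in a comment:

```python
# Hypothetical sketch: enumerate daily CloudTrail partition locations for a
# date range, so they can be registered with Glue in bulk instead of crawled.
# Bucket name, account ID, and region here are illustrative placeholders.
from datetime import date, timedelta


def daily_partitions(start, end, bucket="my-cloudtrail-bucket",
                     account="111122223333", region="us-east-1"):
    """Yield (partition_values, s3_location) for each day in [start, end)."""
    day = start
    while day < end:
        values = [region, f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}"]
        location = (
            f"s3://{bucket}/AWSLogs/{account}/CloudTrail/"
            f"{region}/{day.year:04d}/{day.month:02d}/{day.day:02d}/"
        )
        yield values, location
        day += timedelta(days=1)


# Each (values, location) pair could then be turned into a PartitionInput and
# passed to glue.batch_create_partition() — up to 100 partitions per call —
# so years of history are registered in seconds rather than crawled.
```

This wouldn't replace the crawler for schema inference, but it might let the crawler start from an already-populated partition list instead of walking millions of small objects.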