algolia / algoliasearch-client-swift

⚡️ A fully-featured and blazing-fast Swift API client to interact with Algolia.
MIT License

"Failed to build index" errors #611

Open matt-alltrails opened 5 years ago

matt-alltrails commented 5 years ago

We are seeing an odd issue in the AllTrails app that we are hoping you can assist us with.

As background, our app uses Algolia to index a large amount of trail data. We have an online index and also maintain an offline mirror within the app so that users can still access data while offline. Whenever we ship a new version of the app we include an updated objects and settings file in the bundle, and on the first launch of the new version we re-initialize the mirrored index using index.buildOffline(). If the buildOffline() method fails we log an analytics event that contains the error description returned from the Algolia SDK.

With the latest release of our app we are seeing a large number of index failures with the following error:

HTTPError(statusCode: 500, message: Optional("Failed to build index"))

[Screenshot: Screen Shot 2019-05-21 at 8 55 15 AM]

We use a third party service called Embrace that logs any network issues that happen during a session. When we look at the network requests that include "algolia" in the path, we see a large number of 404s and some 403s, but no 500s.

We would appreciate any guidance you can offer on troubleshooting this.

We are seeing this issue on multiple iPhone device models and versions of iOS. We are using version 5.1.7 of the AlgoliaSearch-Offline-Swift pod.

spinach commented 5 years ago

Hi @matheda,

Thanks for contacting us. So just to make sure I understand:

Before the latest release, you did not have that many "failed to build index" errors, is that correct? I'm also wondering if between the latest release of the app and the one before that, did you change anything related to Algolia, like the pod version or the way you are using Algolia in your codebase, or maybe the settings used on the offline SDK?

Also, I'm curious whether the updated objects and settings file in the bundle suddenly grew a lot in size between the previous release and the latest one.

Finally, any further information about the conditions under which this happens, such as the device models or iOS versions, would help us debug the issue.

Thanks!

matt-alltrails commented 5 years ago

Thanks for the quick follow up! I stated something poorly above; we noticed this in the latest release because we added analytics to track it in that release. We don't know whether this has been going on in previous releases or not.

We have been using 5.1.7 of the Algolia SDK for the past couple of releases. There was a change in the initialization of the SDK in this release. We are now specifying a directory for the offline index and we were not before. Here is how we are building our index:

    func buildTrailIndex(completion: @escaping (PreloadUpdateAttemptResult) -> Void) {
        let startTime = Date()
        var attributes = context.attributes

        AnalyticsService.sharedInstance().logEvent("Preload_Index_Build_Started", attributes: attributes)
        LogInfo("Preload index build started: \(attributes) #splash #check")

        guard
            let rootDirectoryURL = context.candidateBuild.rootDirectoryURL,
            let trailIndexSettingsFileURL = context.candidateBuild.trailIndexSettingsFileURL,
            let trailIndexObjectsFileURL = context.candidateBuild.trailIndexObjectsFileURL
            else { fatalError("Candidate build root directory URL not set") }

        guard
            let trailIndex = context.candidateBuild.trailIndex
            else { fatalError("Candidate build trail index not set") }

        guard
            let configuration = Bundle.main.infoDictionary?["AllTrails"] as? [String: Any],
            let algoliaAppId = configuration["algoliaAppId"] as? String,
            let algoliaAPIKey = configuration["algoliaAPIKey"] as? String,
            let algoliaOfflineModeLicenseKey = configuration["algoliaOfflineModeLicenseKey"] as? String
            else { fatalError("Algolia not configured") }

        let client = OfflineClient(appID: algoliaAppId, apiKey: algoliaAPIKey)
        client.rootDataDir = rootDirectoryURL.path
        client.enableOfflineMode(licenseKey: algoliaOfflineModeLicenseKey)
        self.client = client

        let index = client.index(withName: trailIndex)
        index.mirrored = true
        index.requestStrategy = .fallbackOnTimeout
        self.index = index

        let query = Query(query: "*")
        let dataSelectionQuery = DataSelectionQuery(query: query, maxObjects: 1000000)
        index.addDataSelectionQuery(dataSelectionQuery)

        index.buildOffline(settingsFile: trailIndexSettingsFileURL.path,
                           objectFiles: [trailIndexObjectsFileURL.path]) {
                            [unowned self] (dictionary, error) in

                            self.isolationQueue.async { [unowned self] in
                                let duration = Date().timeIntervalSince(startTime)
                                attributes["duration"] = "\(duration)"

                                var success = true
                                if let error = error {
                                    success = false
                                    attributes["error"] = "\(error)"
                                }
                                attributes["success"] = success ? "true" : "false"

                                AnalyticsService.sharedInstance().logEvent("Preload_Index_Build_Finished", attributes: attributes)
                                LogInfo("Preload index build finished: \(attributes) #splash #check")

                                self.index = nil
                                self.client = nil

                                if success {
                                    self.adoptCandidateBuild(completion: completion)
                                } else {
                                    completion(.indexBuildFailed)
                                }
                            }
        }

    }

Regarding the objects and settings JSON files, they grow a little bit with every release but not substantially from release to release. Our objects file is 92MB and the settings file is 2KB.
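
Since the failures only show up in analytics, one way to gather more context is to record the file sizes and the free disk space next to the error. The following is a minimal sketch of that idea using only Foundation. The function name `buildDiagnostics`, its parameters, and the attribute keys are hypothetical and are not part of the Algolia SDK or the code above; whether disk space is related to the failures is not established in this thread.

```swift
import Foundation

/// Collects file sizes and free disk space as string attributes for an analytics event.
/// The function name and attribute keys are illustrative assumptions.
func buildDiagnostics(forFilesAt paths: [String], indexDirectory: String) -> [String: String] {
    var attributes: [String: String] = [:]
    let fileManager = FileManager.default

    for path in paths {
        let name = URL(fileURLWithPath: path).lastPathComponent
        if let fileAttributes = try? fileManager.attributesOfItem(atPath: path),
           let size = (fileAttributes[.size] as? NSNumber)?.int64Value {
            attributes["size_\(name)"] = "\(size)"
        } else {
            // The file is absent or unreadable; record that instead of a size.
            attributes["size_\(name)"] = "missing"
        }
    }

    // Free space on the volume that contains the offline index directory.
    if let volumeAttributes = try? fileManager.attributesOfFileSystem(forPath: indexDirectory),
       let freeBytes = (volumeAttributes[.systemFreeSize] as? NSNumber)?.int64Value {
        attributes["free_disk_bytes"] = "\(freeBytes)"
    }
    return attributes
}
```

The resulting dictionary could be merged into the attributes that are already sent with the Preload_Index_Build_Finished event, so that failed and successful builds can be compared.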

One thing we would like to confirm is that even though the SDK is returning an HTTPError object with a 500 code, it doesn't appear to actually be making a failed network call. Is that true?

spinach commented 5 years ago

I see. Can you share with us the index name you are trying to mirror? That could help us reproduce the issue on our end. You can also send us the index name to support@algolia.com if you want.

matt-alltrails commented 5 years ago

I just sent the index name via email.

matt-alltrails commented 5 years ago

I received this message via email:

Thanks for sharing the index name.

First things first: after checking the error messages closely again, we saw that "Unauthorized domain" errors are being logged, which could indicate a license issue. It seems that you may be using a license with a bundle name restriction. The "Failed to build index" error could be a side effect of this. Can you first look into that on your end?

I checked the key we are using under the "Offline SDK Licenses" page in the portal and it matches up for the bundle id.

spinach commented 5 years ago

Hey @matheda

I communicated with the engineers working on the engine, and in order to debug this, we propose that we try to reproduce the environment that you're in as well as the error. In order to do so, we need the following from you (via email):

We would also like to know the impact of these errors: do you know whether they are corrupting any data (online or offline index)? If you have noticed any repercussions caused by the "failed to build index" errors, please let us know, as this would help us debug the issue.

As soon as we receive the files, we will work with the engineers on the engine side to actively investigate this.

matt-alltrails commented 5 years ago

I will send you an email with a link to the zip file. The code snippet above should contain all of the relevant code. We have not been able to reproduce this in our own testing; we are just seeing it via our analytics. Thanks for looking into it!

matt-alltrails commented 5 years ago

This ticket has been partially handled here and partially in email, so I am going to update the ticket just to bring the threads back together.

The last request that we received was to capture the log output from the Algolia SDK:

(From Guy Daher) First of all, thank you for sharing the logs! I shared them with several people to see if they can catch something. Unfortunately, the logs don't contain information related to the "failed to build index" error. The other errors seem to be related to other things. The Offline SDK normally logs information using NSLog, but we can't see any of it here, and we would need that output to try to understand the root cause of the issue.

This created a technical challenge for us:

(From Matthew Daugherty) Following up on capturing the log output, I have hit a snag. We require iOS 10+, so NSLog is now using os_log (Unified Logging). Because of this, I don’t think I have a way to intercept the log statements and redirect them to our log file. The only technique that I am aware of to get os_log output off of a device is too complicated to ask our users to perform:

https://download.developer.apple.com/iOS/iOS_Logs/sysdiagnose_Logging_Instructions.pdf

Do you have any suggestions on how we can collect the information you are looking for? Would you consider adding a delegate method that you would call in addition to NSLog()? That would allow us to capture the messages and save them to our log file.

Algolia provided a way in which we could capture the NSLog output:

(From Vladislav Fitc) You can redirect os_log error output to your own log file using the freopen command (code sample follows)
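
The original code sample is not reproduced in this thread. As a rough illustration of the freopen technique only, a sketch might look like the following. It assumes that NSLog mirrors its output to stderr when stderr points at a regular file. The function name `redirectStandardError` is hypothetical, and this is not the sample Algolia sent.

```swift
import Foundation

/// Redirects the process's stderr stream to the file at `path`, appending to any
/// existing content. Returns false if the file could not be opened.
/// Hypothetical helper for illustration; not part of the Algolia SDK.
@discardableResult
func redirectStandardError(toFileAtPath path: String) -> Bool {
    // freopen closes the current stream and reopens it on the given file.
    guard freopen(path, "a+", stderr) != nil else { return false }
    // Turn off buffering so messages reach the file even if the app is killed.
    setvbuf(stderr, nil, _IONBF, 0)
    return true
}
```

A call such as `redirectStandardError(toFileAtPath: logFilePath)` early in app launch would affect everything written to stderr by the process, including output from other libraries.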

matt-alltrails commented 5 years ago

Following up on the thread above, we have looked at using freopen() to capture the NSLog output, but it represents a significant amount of either risk or work for us to implement.

We use many 3rd party libraries, and some of them are very verbose in their logging. We don't feel comfortable trying to write the NSLog() output into the same open file that CocoaLumberjack is using due to the risk of IO blocking or data loss. However, if we route the log entries to a separate file it will grow forever unless we implement some type of log rolling or cleanup.
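
To give a sense of the work involved, the simplest form of log rolling for a separate file could be sketched as below: if the file exceeds a size limit, it is renamed to a single backup, replacing the previous one. The function name `rollLogIfNeeded`, the `.1` suffix, and the single-backup policy are assumptions made for illustration, not an existing API.

```swift
import Foundation

/// Renames the log at `logPath` to `logPath + ".1"` when it is larger than `maxBytes`,
/// replacing any earlier backup. Returns true if the log was rolled.
/// Hypothetical helper; the naming scheme and policy are assumptions.
@discardableResult
func rollLogIfNeeded(atPath logPath: String, maxBytes: Int64) -> Bool {
    let fileManager = FileManager.default
    guard let attributes = try? fileManager.attributesOfItem(atPath: logPath),
          let size = (attributes[.size] as? NSNumber)?.int64Value,
          size > maxBytes else { return false }

    let rolledPath = logPath + ".1"
    // Drop the previous backup so disk usage stays bounded at roughly 2 x maxBytes.
    try? fileManager.removeItem(atPath: rolledPath)
    do {
        try fileManager.moveItem(atPath: logPath, toPath: rolledPath)
        return true
    } catch {
        return false
    }
}
```

The check would need to run before the stream is reopened on the file, for example at launch, because renaming a file that is already open does not move the open stream to a new file.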

We would like to revisit the possibility of having the SDK modified so that it has a logging callback. What we have in mind is that whenever the Algolia SDK calls NSLog() internally it would also call a delegate method that would allow the host app to store the log message using its current logging strategy.
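
To make the request concrete, the shape we have in mind could be sketched roughly as follows. All of the names here (`LogMessageReceiver`, `ForwardingLogger`, `InMemoryLogReceiver`) are hypothetical and do not exist in the Algolia SDK; the sketch only illustrates forwarding each message to a host-supplied receiver in addition to NSLog.

```swift
import Foundation

/// Hypothetical callback protocol that a host app would implement.
protocol LogMessageReceiver: AnyObject {
    func didLog(message: String)
}

/// Hypothetical stand-in for the SDK's internal logging entry point.
final class ForwardingLogger {
    /// Held weakly so the logger does not keep the host's receiver alive.
    weak var receiver: LogMessageReceiver?

    func log(_ message: String) {
        // Existing behaviour is preserved: the message still goes to NSLog.
        NSLog("%@", message)
        // Additionally hand the message to the host app, if a receiver is set.
        receiver?.didLog(message: message)
    }
}

/// Example receiver that keeps messages in memory. A real app would pass them
/// to its own logging framework instead.
final class InMemoryLogReceiver: LogMessageReceiver {
    private(set) var messages: [String] = []

    func didLog(message: String) {
        messages.append(message)
    }
}
```

With a hook of this kind, the host app decides where messages are stored, so no stderr redirection or separate log file would be needed.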