TykTechnologies / tyk

Tyk Open Source API Gateway written in Go, supporting REST, GraphQL, TCP and gRPC protocols

[TT-13139] Request times out in some cases when sending input via http inputs #6601

Closed buraksezer closed 1 month ago

buraksezer commented 1 month ago

User description

TT-13139
Summary Request times out in some cases when sending input via http inputs
Type Bug
Status In Dev
Points N/A
Labels -

Cherry-picked stream caching feature from this branch: https://github.com/TykTechnologies/tyk/pull/6538

Two new integration tests have been added to test input http -> output http scenario. See this issue for the details: https://tyktech.atlassian.net/browse/TT-13139

Closing the previous one: https://github.com/TykTechnologies/tyk/pull/6592


PR Type

Enhancement, Tests


Description


Changes walkthrough 📝

Relevant files

Enhancement
mw_streaming.go
Implement stream caching and garbage collection in StreamingMiddleware

gateway/mw_streaming.go
  • Introduced stream caching and garbage collection for inactive streams.
  • Added new fields to manage stream activity and cache.
  • Implemented a garbage collection routine for stream managers.
  • Updated stream manager creation to utilize caching.
  +98/-20

Tests
mw_streaming_test.go
Add integration tests for HTTP server streaming scenarios

gateway/mw_streaming_test.go
  • Added tests for single and multiple client streaming scenarios.
  • Implemented test for HTTP server input and WebSocket output.
  • Verified message distribution and handling in tests.
  +137/-0

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    buger commented 1 month ago

    I'm a bot and I 👍 this PR title. 🤖

    github-actions[bot] commented 1 month ago

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Concurrency Concerns
    The use of sync.Map and atomic operations suggests that concurrency is a concern. However, it's essential to ensure that all operations on shared resources are safe and that there are no race conditions or deadlocks.

    Error Handling
    The error handling in the stream removal and garbage collection processes should be reviewed to ensure that errors are handled appropriately and do not lead to inconsistent states or resource leaks.

    Resource Management
    The implementation of garbage collection for stream managers should be carefully reviewed to ensure that it effectively frees up resources without prematurely removing active streams or causing interruptions.

    github-actions[bot] commented 1 month ago

    API Changes

    --- prev.txt    2024-10-09 09:15:23.890104629 +0000
    +++ current.txt 2024-10-09 09:15:18.070072734 +0000
    @@ -7686,6 +7686,11 @@
        ErrOAuthClientDeleted               = "oauth.client_deleted"
     )
     const (
    +   // ExtensionTykStreaming is the oas extension for tyk streaming
    +   ExtensionTykStreaming = "x-tyk-streaming"
    +   StreamGCInterval      = 1 * time.Minute
    +)
    +const (
        ResetQuota              string = "resetQuota"
        CertificateRemoved      string = "CertificateRemoved"
        CertificateAdded        string = "CertificateAdded"
    @@ -7737,10 +7742,6 @@
         The name for event handlers as defined in the API Definition JSON/BSON
         format
    
    -const (
    -   // ExtensionTykStreaming is the oas extension for tyk streaming
    -   ExtensionTykStreaming = "x-tyk-streaming"
    -)
     const ListDetailed = "detailed"
     const LoopScheme = "tyk"
     const OIDPREFIX = "openid"

    github-actions[bot] commented 1 month ago

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category: Security (score: 9)
    Suggestion: Improve security by using a more secure hash function for generating cache keys

    **Replace the MD5 hash function with a more secure hash function like SHA-256 for generating cache keys, as MD5 is vulnerable to collision attacks.**

    [gateway/mw_streaming.go [231]](https://github.com/TykTechnologies/tyk/pull/6601/files#diff-6f565750150d990575c808f1ca8f38483160dc6edf05f1534cd0bedb27c2e6c8R231-R231)

    ```diff
    -cacheKey := fmt.Sprintf("%x", md5.Sum(configJSON))
    +cacheKey := fmt.Sprintf("%x", sha256.Sum256(configJSON))
    ```

    Suggestion importance[1-10]: 9
    Why: The suggestion to replace MD5 with SHA-256 for generating cache keys enhances security by mitigating collision vulnerabilities associated with MD5. This is a significant improvement in terms of security best practices.

    Category: Resource management (score: 8)
    Suggestion: Ensure proper resource cleanup by calling the cancel function for each stream

    **Ensure that the cancel function from the context is called for each stream in removeStreamManager to properly release resources and avoid potential memory leaks.**

    [gateway/mw_streaming.go [143]](https://github.com/TykTechnologies/tyk/pull/6601/files#diff-6f565750150d990575c808f1ca8f38483160dc6edf05f1534cd0bedb27c2e6c8R143-R143)

    ```diff
    +sm.cancel()
     s.streamManagerCache.Delete(cacheKey)
    ```

    Suggestion importance[1-10]: 8
    Why: Adding a call to the cancel function ensures proper resource cleanup and prevents potential memory leaks, which is crucial for maintaining application stability and performance.

    Category: Error handling (score: 7)
    Suggestion: Add error handling for JSON marshaling to enhance robustness

    **Add error handling for the json.Marshal operation when creating the cache key to handle potential serialization issues gracefully.**

    [gateway/mw_streaming.go [230]](https://github.com/TykTechnologies/tyk/pull/6601/files#diff-6f565750150d990575c808f1ca8f38483160dc6edf05f1534cd0bedb27c2e6c8R230-R230)

    ```diff
    -configJSON, _ := json.Marshal(streamsConfig)
    +configJSON, err := json.Marshal(streamsConfig)
    +if err != nil {
    +    s.Logger().Errorf("Failed to marshal streams config: %v", err)
    +    return nil
    +}
    ```

    Suggestion importance[1-10]: 7
    Why: Introducing error handling for the JSON marshaling process improves the robustness of the code by ensuring that serialization issues are caught and logged, preventing potential runtime errors.

    Category: Concurrency management (score: 6)
    Suggestion: Improve thread safety and performance by reviewing locking mechanisms around shared resources

    **Consider using a more precise locking mechanism or review the necessity of locking around lastActivity.Store(time.Now()) to avoid potential race conditions or performance bottlenecks.**

    [gateway/mw_streaming.go [497-499]](https://github.com/TykTechnologies/tyk/pull/6601/files#diff-6f565750150d990575c808f1ca8f38483160dc6edf05f1534cd0bedb27c2e6c8R497-R499)

    ```diff
    +h.sm.routeLock.Lock()
     h.sm.lastActivity.Store(time.Now())
    +h.sm.routeLock.Unlock()
    ```

    Suggestion importance[1-10]: 6
    Why: The suggestion to add locking around the `lastActivity.Store` operation addresses potential race conditions, enhancing thread safety. However, it may introduce performance bottlenecks, so the impact is moderate.

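The SHA-256 and error-handling suggestions above can be combined into a single helper. The following is a minimal Go sketch, not the PR's actual code: `streamCacheKey` and the `map[string]any` config shape are hypothetical names chosen for illustration. It marshals the config, returns the marshal error instead of discarding it, and hashes the JSON with SHA-256.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// streamCacheKey is a hypothetical helper: it derives a cache key from a
// streams config by hashing its JSON encoding with SHA-256.
func streamCacheKey(streamsConfig map[string]any) (string, error) {
	configJSON, err := json.Marshal(streamsConfig)
	if err != nil {
		// Surface the failure so the caller can log it and skip caching.
		return "", fmt.Errorf("marshal streams config: %w", err)
	}
	// %x on the [32]byte digest yields a 64-character hex string.
	return fmt.Sprintf("%x", sha256.Sum256(configJSON)), nil
}

func main() {
	key, err := streamCacheKey(map[string]any{
		"input":  map[string]any{"http_server": map[string]any{"path": "/post"}},
		"output": map[string]any{"http_server": map[string]any{"ws_path": "/subscribe"}},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cache key:", key)
}
```

`encoding/json` writes map keys in sorted order, so two maps with the same contents produce the same key regardless of insertion order. Whether that holds for the real config type would need to be checked separately.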
    buraksezer commented 1 month ago

    I do not think lastActivity is doing its job now. The actual activity happens inside f(w, r), so a long-lived WebSocket connection that lasts more than 10 minutes will time out even if it is active.

    If an f(w, r) call is currently running for a given consumer group, it should be counted as active.

    h.sm.lastActivity.Store(time.Now())
    f(w, r)
    h.sm.lastActivity.Store(time.Now())

    @buger I think wrapping http.ResponseWriter and updating lastActivity in the Write method could be an option.
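The wrapper idea can be sketched as follows. This is an illustrative Go sketch under assumptions, not code from the PR: `activityWriter` and `writeRefreshesActivity` are hypothetical names, and `lastActivity` is assumed to be an `atomic.Value` holding a `time.Time`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// activityWriter is a hypothetical wrapper that refreshes a shared
// timestamp every time the handler writes to the response.
type activityWriter struct {
	http.ResponseWriter
	lastActivity *atomic.Value // assumed to hold a time.Time
}

// Write records the activity, then delegates to the wrapped writer.
func (w *activityWriter) Write(p []byte) (int, error) {
	w.lastActivity.Store(time.Now())
	return w.ResponseWriter.Write(p)
}

// writeRefreshesActivity reports whether a Write through the wrapper
// moved the timestamp forward from a known starting point.
func writeRefreshesActivity() bool {
	var last atomic.Value
	epoch := time.Unix(0, 0)
	last.Store(epoch)
	w := &activityWriter{ResponseWriter: httptest.NewRecorder(), lastActivity: &last}
	if _, err := w.Write([]byte("event")); err != nil {
		return false
	}
	return last.Load().(time.Time).After(epoch)
}

func main() {
	fmt.Println("timestamp refreshed:", writeRefreshesActivity())
}
```

Two caveats apply to this sketch. Embedding only `http.ResponseWriter` hides optional interfaces such as `http.Hijacker` and `http.Flusher` from the handler. And once a connection is hijacked, traffic is written to the raw connection and never passes through `Write`, so the timestamp would not be refreshed for that traffic.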

    buger commented 1 month ago

    @buraksezer

    As an alternative, you can just maintain a counter of active connections for each stream manager. Literally:

    h.sm.Inc()
    f(w, r)
    h.sm.Dec()

    And in the GC, just check whether the counter is non-zero.
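A minimal Go sketch of the counter approach, under stated assumptions: the `streamManager` type, its field names, `serve`, and `collectable` are all hypothetical, and only `Inc` and `Dec` come from the comment above. The GC check refuses to collect a manager while any handler is still running and falls back to an idle timeout otherwise.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// streamManager is a hypothetical stand-in for the real stream manager.
type streamManager struct {
	activeConns  atomic.Int64 // handlers currently inside f(w, r)
	lastActivity atomic.Int64 // Unix nanoseconds of the last finished handler
}

func (sm *streamManager) Inc() { sm.activeConns.Add(1) }

func (sm *streamManager) Dec() {
	sm.lastActivity.Store(time.Now().UnixNano())
	sm.activeConns.Add(-1)
}

// serve brackets the handler call with Inc/Dec; defer keeps the counter
// balanced even if the handler panics.
func (sm *streamManager) serve(f func()) {
	sm.Inc()
	defer sm.Dec()
	f()
}

// collectable reports whether the GC may remove this manager at time now.
func (sm *streamManager) collectable(now time.Time, idle time.Duration) bool {
	if sm.activeConns.Load() != 0 {
		return false // a connection is still open, however long it has lasted
	}
	return now.Sub(time.Unix(0, sm.lastActivity.Load())) > idle
}

// demo checks collectability during and after a handler call, using a
// timestamp far enough in the future to exceed the idle timeout.
func demo() (during, after bool) {
	sm := &streamManager{}
	const idle = 10 * time.Minute
	farFuture := time.Now().Add(24 * time.Hour)
	sm.serve(func() { during = sm.collectable(farFuture, idle) })
	after = sm.collectable(farFuture, idle)
	return during, after
}

func main() {
	during, after := demo()
	fmt.Println("collectable during handler:", during)
	fmt.Println("collectable after idle timeout:", after)
}
```

The counter works for hijacked connections as well, because it depends only on the handler still running. A new request can still arrive between the GC check and the removal, so a real implementation would likely need to close that window, for example with a lock shared by lookup and removal.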

    buraksezer commented 1 month ago

    > @buraksezer
    >
    > As an alternative, you can just maintain a counter of active connections for each stream manager. Literally:
    >
    > h.sm.Inc()
    > f(w, r)
    > h.sm.Dec()
    >
    > And in the GC, just check whether the counter is non-zero.

    This might be the only option, because the underlying TCP connection is hijacked for WebSocket traffic, which invalidates my solution.

    buraksezer commented 1 month ago

    // Unload closes and removes active streams
    func (s *StreamingMiddleware) Unload() {
        s.Logger().Debugf("Unloading streaming middleware %s", s.Spec.Name)
        totalStreams := 0
        s.streamManagers.Range(func(_, value interface{}) bool {
            manager, ok := value.(*StreamManager)
            if !ok {
                return true
            }
            manager.streams.Range(func(_, _ interface{}) bool {
                totalStreams++
                return true
            })
            return true
        })
        globalStreamCounter.Add(-int64(totalStreams))

    Do we really need a separate streamManagers sync.Map here, since we are already doing something similar with streamManagerCache?

    I think we do not need the streamManagers map. It is only used to track globalStreamCounter, and we can use streamManagerCache for that.
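The proposal amounts to running the same counting walk over the cache map. The following Go sketch illustrates it with stand-in types: `streamManager`, `countStreams`, and `demoCount` are hypothetical, and the cache is assumed to be a `sync.Map` from cache key to manager.

```go
package main

import (
	"fmt"
	"sync"
)

// streamManager is a hypothetical stand-in; only its streams map matters here.
type streamManager struct {
	streams sync.Map // stream ID -> stream
}

// countStreams totals the streams of every manager stored in one cache,
// so no second map of managers is needed for bookkeeping.
func countStreams(cache *sync.Map) int {
	total := 0
	cache.Range(func(_, value any) bool {
		manager, ok := value.(*streamManager)
		if !ok {
			return true // skip unexpected entries, keep iterating
		}
		manager.streams.Range(func(_, _ any) bool {
			total++
			return true
		})
		return true
	})
	return total
}

// demoCount builds a small cache with two managers and one stray entry.
func demoCount() int {
	var cache sync.Map
	a, b := &streamManager{}, &streamManager{}
	a.streams.Store("s1", struct{}{})
	a.streams.Store("s2", struct{}{})
	b.streams.Store("s3", struct{}{})
	cache.Store("key-a", a)
	cache.Store("key-b", b)
	cache.Store("junk", "not a manager")
	return countStreams(&cache)
}

func main() {
	fmt.Println("total streams:", demoCount())
}
```

The result could then be subtracted from the global counter in `Unload`, as the excerpt above does with `totalStreams`.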

    sonarcloud[bot] commented 1 month ago

    Quality Gate failed

    Failed conditions
    0.0% Coverage on New Code (required ≥ 80%)
    C Reliability Rating on New Code (required ≥ A)

    See analysis details on SonarCloud
