Open JeremyCloarec opened 2 weeks ago
Attention: Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.
Project coverage is 66.25%. Comparing base (96bbd5a) to head (bf2ef75). Report is 3 commits behind head on master.
Files with missing lines | Patch % | Lines |
---|---|---|
...rm/opencti-graphql/src/manager/ingestionManager.ts | 33.33% | 2 Missing :warning: |
I don't know what the advantages of using our custom httpClient are, but are we OK with bypassing it?
I'm not sure either, but it was the only way I was able to get past the 403 errors. We talked about it with @romain-filigran, and the plan is to merge it to master and keep a close eye on whether the previously working RSS feeds break following this change. If they do, this will need to be reverted.
I think the opencti httpClient manages at least the proxy configuration. Have you tested your PR behind a proxy?
You're right, I didn't think about proxy settings, so this solution doesn't work. I wasn't able to find the root cause. I suspect Cloudflare bot protection, but I didn't find any proper way to understand what triggers the 403 rejections. What's weird is that even when I send the exact same request as the browser with curl, I get a 403 error, while the same request works properly in the browser. Would you be up for a pair debugging session to dig into it?
Proposed changes
Related issues
8736
Checklist
Further comments
All feeds linked in the related issue are now fetched without any 403 errors. However, the https://cybersecurity.att.com/site/blog-all-rss feed isn't ingested properly, because items in this feed don't have any pubDate metadata; they only have a dc:date. Not sure if we want to modify the RSS parser to use dc:date when an item has no pubDate?
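To make the dc:date question concrete, here is a minimal sketch of what such a fallback could look like. It is not the existing parser code: the `RssItemLike` shape and the `resolveItemDate` helper are hypothetical names for illustration, and it assumes the parsed item exposes the two fields as plain strings under the keys `pubDate` and `dc:date`.

```typescript
// Hypothetical shape of a parsed RSS item; the real parser output may differ.
interface RssItemLike {
  pubDate?: string;
  'dc:date'?: string;
}

// Prefer pubDate and fall back to dc:date.
// Returns undefined when neither field holds a parseable date.
function resolveItemDate(item: RssItemLike): Date | undefined {
  const candidates = [item.pubDate, item['dc:date']];
  for (const raw of candidates) {
    if (typeof raw === 'string' && raw.trim().length > 0) {
      const parsed = new Date(raw.trim());
      // An invalid date string produces a Date whose time value is NaN.
      if (!isNaN(parsed.getTime())) {
        return parsed;
      }
    }
  }
  return undefined;
}
```

With this approach, feeds that already provide pubDate would behave as before, and dc:date would only be consulted when pubDate is missing or unparseable.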