
Getting Cloudflare logs via Logpush using Cloudflare R2 service #8412

Closed markslott closed 5 months ago

markslott commented 1 year ago

Describe the enhancement: Cloudflare R2 is an Amazon S3 clone. The agent should be able to retrieve Cloudflare logs from R2 in addition to Amazon S3.

Describe a specific use case for the enhancement or feature: Cloudflare logs are shipped to the Cloudflare R2 service -> the Elastic Cloudflare Logpush integration retrieves them from R2. The agent already supports fetching from non-AWS buckets, but this does not seem to work with Cloudflare R2.

What is the definition of done?

The Logpush integration can ingest from Cloudflare R2 just as it can from Amazon S3.

jamiehynds commented 1 year ago

Hey @markslott - we recently added support for 'non-AWS S3 buckets'/R2 via the 1.16 update to the Cloudflare package. Here's the PR for more info: https://github.com/elastic/integrations/pull/8278

cc @chemamartinez

elasticmachine commented 1 year ago

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

jblosser commented 1 year ago

Hi @jamiehynds, that PR was in response to @markslott putting in the original request for this on our behalf as a customer. However, it's not working for us. We configure the integration using the same keys and bucket settings we use to access the R2 bucket using the aws s3api cli, but no log data is ingested. No errors are produced that we can see, either. How can we debug this?

jamiehynds commented 1 year ago

Thanks for the additional context @jblosser. @chemamartinez do you know which set of debug logs may help to identify why the new 'non-AWS bucket' config is not ingesting via R2?

chemamartinez commented 1 year ago

Hi @jblosser, @markslott,

PR https://github.com/elastic/integrations/pull/8278 just exposed the S3 input options related to non-AWS buckets to the integration's users. Therefore, since the problem persists, it is likely located between the input and the Cloudflare R2 service.

It's hard to say which logs could help here because the S3 input doesn't provide debug logs at the moment. If no error appears in the logs when configuring the integration, this is probably an issue that the input does not handle. I would start by debugging the credentials being used; for instance, for AWS accounts protected with MFA, a session token is also needed. Is it possible that Cloudflare R2 is also configured with MFA?

Another good test would be to create an AWS S3 bucket for testing purposes and try to fetch Cloudflare logs from it. That way we could compare both behaviors and rule out some causes.
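
For reference, the credentials can also be checked outside the agent with a few lines of boto3. This is only a minimal sketch; the account ID, bucket, keys, and prefix below are placeholders:

    # Minimal credential sanity check against Cloudflare R2 using boto3.
    # ACCOUNT_ID, the bucket name, and the key pair are placeholders.
    import boto3
    from botocore.config import Config

    s3 = boto3.client(
        "s3",
        endpoint_url="https://ACCOUNT_ID.r2.cloudflarestorage.com",
        aws_access_key_id="XXX",
        aws_secret_access_key="XXX",
        aws_session_token=None,  # only needed for MFA/temporary credentials
        region_name="auto",      # R2 expects "auto" as the region
        config=Config(s3={"addressing_style": "path"}),
    )

    # ListObjectsV2 is roughly what the S3 input does when polling a bucket.
    resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="http_requests", MaxKeys=5)
    for obj in resp.get("Contents", []):
        print(obj["Key"])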

jblosser commented 1 year ago

The credentials we are using work fine with, e.g., the aws CLI s3api:

$ aws s3api list-objects --endpoint-url https://<accountid>.r2.cloudflarestorage.com/ --bucket <bucket> --prefix http_requests | head
{
    "Contents": [
        {
            "Key": "http_requests/date=20231018/20231018T190825Z_20231018T190831Z_ac3e2642.log.gz",
...            

The only authentication in use here is aws_access_key_id and aws_secret_access_key.

We're using the same in the integration config:

"cloudflare-aws-s3": {
      "enabled": true,
      "vars": {
        "collect_s3_logs": true,
        "cloudflare_r2": "<bucket>",
        "access_key_id": "XXX",
        "secret_access_key": "XXX",
        "endpoint": "https://<accountid>.r2.cloudflarestorage.com/",
        "default_region": "us-east-1",
        "fips_enabled": false
      },
...
        "cloudflare_logpush.http_request": {
          "enabled": true,
          "vars": {
            "bucket_list_prefix": "<prefix>",
            "interval": "1m",
            "number_of_workers": 5,
            "visibility_timeout": "300s",
            "api_timeout": "120s",
            "max_number_of_messages": 5,
            "file_selectors": "- regex: 'http_request/'\n",
            "tags": [
              "forwarded",
              "cloudflare_logpush-http_request"
            ],
            "preserve_original_event": false,
            "preserve_duplicate_custom_fields": false
          }
        },
...

If auth were failing, there would be a 4xx error returned somewhere. Does the integration not capture those or make them available at all?

moonpig-vinicius-chagas commented 7 months ago

Still not working?

kOld commented 7 months ago

I've tried to set up this integration with the R2 service and I'm getting an authentication error:

elastic_agent.filebeat
[elastic_agent.filebeat][error] Input 'aws-s3' failed with: failed to initialize s3 poller: failed to get AWS region for bucket: operation error S3: GetBucketLocation, https response error StatusCode: 403, RequestID: , HostID: , api error AccessDenied: Access Denied

Changing the region has no effect; maybe there is something incompatible with R2's GetBucketLocation API?

My config is very similar to @jblosser's.
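
To narrow it down, the failing call can be reproduced outside the agent with boto3. A rough sketch, with the account ID, bucket, and keys as placeholders:

    # Reproduce the failing call outside the agent: the aws-s3 input calls
    # GetBucketLocation to discover the bucket region before polling.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client(
        "s3",
        endpoint_url="https://ACCOUNT_ID.r2.cloudflarestorage.com",
        aws_access_key_id="XXX",
        aws_secret_access_key="XXX",
        region_name="auto",
    )

    try:
        resp = s3.get_bucket_location(Bucket="my-bucket")
        print("LocationConstraint:", resp.get("LocationConstraint"))
    except ClientError as err:
        # A 403/AccessDenied here would match the agent's error and point at
        # the R2 API token's permissions rather than at the integration.
        print(err.response["Error"]["Code"], err.response["Error"]["Message"])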

moonpig-vinicius-chagas commented 7 months ago

Have you tried to add the R2 region? @kOld

chemamartinez commented 5 months ago

I have tested the integration with Cloudflare R2 buckets in different scenarios and it worked properly. However, I encountered some issues during testing, all related to permissions and credentials, and no errors surfaced on either the integration or the AWS input side.

So I have gathered all these issues and improved the integration's documentation, hoping that it will help users configure it from now on.

@markslott, @jblosser, @kOld, if you still have any issues getting it to work, please let me know.

r3naissance commented 2 months ago

@chemamartinez I wanted to share a possible update to the Cloudflare R2 bucket documentation. I had the same issues as @jblosser, except one adjustment made the difference.

The documentation currently reads: "Set the endpoint URL, which can be found in Bucket Details. Endpoint should be a full URI, typically in the form of http(s)://<accountid>.r2.cloudflarestorage.com, that will be used as the API endpoint of the service."

I assumed the integration would add a trailing forward slash to the endpoint URL and then append the bucket name variable from the configuration to form a full URL (the S3 API URL as shown in Cloudflare R2). That's apparently not the case. The Endpoint variable must include a trailing forward slash so that the bucket name is appended as a path rather than being treated as part of the hostname.

It should instead read: "Set the endpoint URL, which can be found in Bucket Details. Endpoint should be a full URI, typically in the form of http(s)://<accountid>.r2.cloudflarestorage.com/, that will be used as the API endpoint of the service."

Alternatively, the integration could check for the presence of a trailing slash and add one if it's missing.
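
If it helps, the behaviour I saw seems to match the difference between the SDK's "virtual" and "path" S3 addressing styles. A small boto3 sketch to illustrate (ACCOUNT_ID, bucket, and keys are placeholders; presigning happens locally, so nothing is sent over the network):

    import boto3
    from botocore.config import Config

    endpoint = "https://ACCOUNT_ID.r2.cloudflarestorage.com"

    for style in ("virtual", "path"):
        client = boto3.client(
            "s3",
            endpoint_url=endpoint,
            aws_access_key_id="XXX",
            aws_secret_access_key="XXX",
            region_name="auto",
            config=Config(s3={"addressing_style": style}),
        )
        # Presigned URLs show how each style builds the request URL.
        url = client.generate_presigned_url(
            "get_object", Params={"Bucket": "logs", "Key": "sample.log.gz"}
        )
        print(style, "->", url.split("?")[0])

    # virtual -> https://logs.ACCOUNT_ID.r2.cloudflarestorage.com/sample.log.gz
    # path    -> https://ACCOUNT_ID.r2.cloudflarestorage.com/logs/sample.log.gz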

chemamartinez commented 2 months ago

@r3naissance thank you for the heads up! I will check this behaviour again and correct the documentation if necessary.