basho / riak_cs

Riak CS is simple, available cloud storage built on Riak.
http://docs.basho.com/riakcs/latest/
Apache License 2.0

User statistics coming back differently via s3curl vs. s3cmd #810

Open hectcastro opened 10 years ago

hectcastro commented 10 years ago

There may be two separate issues here:

  1. Riak CS 1.4.3 does not appear to generate the access statistics that 1.4.4 does (this may have been an issue that was resolved in 1.4.4)
  2. User statistics are coming back differently via s3curl vs. s3cmd

Steps to reproduce

Create a bucket and upload a file:

$ sudo riak-cs version
1.4.3
$ s3cmd mb s3://joker
Bucket 's3://joker/' created
$ s3cmd put ~/Downloads/Erasure\ Codes\ for\ Large\ Scale\ Distributed\ Storage\ by\ Prof\ Alex\ Dimakis\ \(Univ.\ of\ Texas,\ Austin\).mp4 s3://joker
/Users/hector/Downloads/Erasure Codes for Large Scale Distributed Storage by Prof Alex Dimakis (Univ. of Texas, Austin).mp4 -> s3://joker/Erasure Codes for Large Scale Distributed Storage by Prof Alex Dimakis (Univ. of Texas, Austin).mp4  [1 of 1]
 281262010 of 281262010   100% in   16s    16.18 MB/s  done

Flush access and storage statistics:

$ sudo riak-cs-access flush
Adding current log to archive queue...
Waiting for archiver to finish...
0 more archives to flush
All access logs were flushed.
$ sudo riak-cs-storage batch
Batch storage calculation started.

Try to get user statistics via s3curl:

$ ./s3curl.pl --id 'W7VSIEKGB6OBMNYLMR7B' --key 'Ay3G8pNm7nBRennBQWCa6IIALr3Mq1_qhDBhNw==' --contentType application/json -- -s --proxy1.0 localhost:8080 'http://s3.amazonaws.com/riak-cs/usage/W7VSIEKGB6OBMNYLMR7B?a=1&s=20140201T000000Z&e=20140228T000000Z' | jsonpp
{
  "Access": "not_requested",
  "Storage": "not_requested"
}
$ ./s3curl.pl --id 'W7VSIEKGB6OBMNYLMR7B' --key 'Ay3G8pNm7nBRennBQWCa6IIALr3Mq1_qhDBhNw==' --contentType application/json -- -s --proxy1.0 localhost:8080 'http://s3.amazonaws.com/riak-cs/usage/W7VSIEKGB6OBMNYLMR7B?a=1' | jsonpp
{
  "Access": "not_requested",
  "Storage": "not_requested"
}
$ ./s3curl.pl --id 'W7VSIEKGB6OBMNYLMR7B' --key 'Ay3G8pNm7nBRennBQWCa6IIALr3Mq1_qhDBhNw==' --contentType application/json -- -s --proxy1.0 localhost:8080 'http://s3.amazonaws.com/riak-cs/usage/W7VSIEKGB6OBMNYLMR7B?a' | jsonpp
{
  "Access": "not_requested",
  "Storage": "not_requested"
}
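These responses still parse as valid JSON, so a script that only checks for a well-formed reply would not notice that neither section was returned. A minimal Python sketch, assuming the body is the JSON shown above (the helper name `requested_sections` is hypothetical), that flags the `not_requested` case explicitly:

```python
import json


def requested_sections(body: str) -> dict:
    """Map each top-level section to False if the server said "not_requested", else True."""
    doc = json.loads(body)
    return {name: value != "not_requested" for name, value in doc.items()}


# Body returned to every s3curl query above.
s3curl_body = '{"Access": "not_requested", "Storage": "not_requested"}'
# Abbreviated body with the same shape as the s3cmd response that follows.
s3cmd_body = '{"Access": {"Nodes": [], "Errors": []}, "Storage": {"Samples": [], "Errors": []}}'

print(requested_sections(s3curl_body))  # {'Access': False, 'Storage': False}
print(requested_sections(s3cmd_body))   # {'Access': True, 'Storage': True}
```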

Try to get user statistics via s3cmd:

$ s3cmd get s3://riak-cs/usage/W7VSIEKGB6OBMNYLMR7B.abj.20140201T000000Z.20140228T000000Z --force
s3://riak-cs/usage/W7VSIEKGB6OBMNYLMR7B.abj.20140201T000000Z.20140228T000000Z -> ./W7VSIEKGB6OBMNYLMR7B.abj.20140201T000000Z.20140228T000000Z  [1 of 1]
 173 of 173   100% in    0s   557.98 B/s  done
$ cat W7VSIEKGB6OBMNYLMR7B.abj.20140201T000000Z.20140228T000000Z| jsonpp
{
  "Access": {
    "Nodes": [],
    "Errors": []
  },
  "Storage": {
    "Samples": [
      {
        "StartTime": "20140214T161326Z",
        "EndTime": "20140214T161326Z",
        "joker": {
          "Objects": 1,
          "Bytes": 281262010
        }
      }
    ],
    "Errors": []
  }
}
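In the s3cmd request, the options and time range are encoded in the object name itself, as `<key_id>.<options>.<start>.<end>`, rather than in a query string. A minimal Python sketch that builds such a name; the helper `usage_object_key` is hypothetical, and the meaning of the option letters (`a` = access, `b` = storage, `j` = JSON) is an assumption based on the Riak CS usage documentation, consistent with `abj` returning both sections as JSON above:

```python
from datetime import datetime, timezone

STAMP = "%Y%m%dT%H%M%SZ"  # e.g. 20140201T000000Z, as in the object name above


def usage_object_key(key_id: str, options: str, start: datetime, end: datetime) -> str:
    """Build '<key_id>.<options>.<start>.<end>' for GET s3://riak-cs/usage/...

    start and end should be timezone-aware; both are converted to UTC.
    """
    start_s = start.astimezone(timezone.utc).strftime(STAMP)
    end_s = end.astimezone(timezone.utc).strftime(STAMP)
    return ".".join([key_id, options, start_s, end_s])


key = usage_object_key(
    "W7VSIEKGB6OBMNYLMR7B",
    "abj",  # assumed: a = access stats, b = storage stats, j = JSON output
    datetime(2014, 2, 1, tzinfo=timezone.utc),
    datetime(2014, 2, 28, tzinfo=timezone.utc),
)
print(f"s3://riak-cs/usage/{key}")
# s3://riak-cs/usage/W7VSIEKGB6OBMNYLMR7B.abj.20140201T000000Z.20140228T000000Z
```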

Upgrade Riak CS in-place to 1.4.4:

$ sudo apt-get install riak-cs
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be upgraded:
  riak-cs
1 upgraded, 0 newly installed, 0 to remove and 54 not upgraded.
Need to get 0 B/22.8 MB of archives.
After this operation, 5,120 B of additional disk space will be used.
(Reading database ... 73169 files and directories currently installed.)
Preparing to replace riak-cs 1.4.3-1 (using .../riak-cs_1.4.4-1_amd64.deb) ...
Unpacking replacement riak-cs ...
Processing triggers for ureadahead ...
Processing triggers for man-db ...
Setting up riak-cs (1.4.4-1) ...
$ sudo riak-cs restart
ok

Re-upload the same file as before:

$ sudo riak-cs version
1.4.4
$ s3cmd put ~/Downloads/Erasure\ Codes\ for\ Large\ Scale\ Distributed\ Storage\ by\ Prof\ Alex\ Dimakis\ \(Univ.\ of\ Texas,\ Austin\).mp4 s3://joker
/Users/hector/Downloads/Erasure Codes for Large Scale Distributed Storage by Prof Alex Dimakis (Univ. of Texas, Austin).mp4 -> s3://joker/Erasure Codes for Large Scale Distributed Storage by Prof Alex Dimakis (Univ. of Texas, Austin).mp4  [1 of 1]
 281262010 of 281262010   100% in   18s    14.48 MB/s  done

Flush access and storage statistics:

$ sudo riak-cs-access flush
Adding current log to archive queue...
Waiting for archiver to finish...
Currently archiving 20140214T162200Z-20140214T162310Z
0 more archives to flush
All access logs were flushed.
$ sudo riak-cs-storage batch
Batch storage calculation started.

Try to get user statistics via s3curl:

$ ./s3curl.pl --id 'W7VSIEKGB6OBMNYLMR7B' --key 'Ay3G8pNm7nBRennBQWCa6IIALr3Mq1_qhDBhNw==' --contentType application/json -- -s --proxy1.0 localhost:8080 'http://s3.amazonaws.com/riak-cs/usage/W7VSIEKGB6OBMNYLMR7B?a' | jsonpp
{
  "Access": "not_requested",
  "Storage": "not_requested"
}

Corresponding entry in the Riak CS access log:

10.0.2.2 - - [14/Feb/2014:16:26:33 +0000] "GET /riak-cs/usage/W7VSIEKGB6OBMNYLMR7B HTTP/1.1" 200 52 "" "curl/7.30.0"
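The logged request target carries no query string, and the logged response size of 52 bytes matches the compact form of the `not_requested` body. From this log alone it is not clear whether the query string was dropped before the request was handled or is simply not recorded in the log. A minimal Python sketch for pulling these fields out of such lines; the regex is an assumption fitted to the two log lines in this issue, and `parse_access_line` is a hypothetical helper:

```python
import json
import re

# Assumed layout, fitted to the access log lines quoted in this issue.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line: str) -> dict:
    """Split one access log line into fields, separating path and query string."""
    m = LOG_LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognised access log line: {line!r}")
    fields = m.groupdict()
    fields["path"], _, fields["query"] = fields["target"].partition("?")
    return fields


entry = parse_access_line(
    '10.0.2.2 - - [14/Feb/2014:16:26:33 +0000] '
    '"GET /riak-cs/usage/W7VSIEKGB6OBMNYLMR7B HTTP/1.1" 200 52 "" "curl/7.30.0"'
)
print(entry["path"], repr(entry["query"]), entry["size"])
# /riak-cs/usage/W7VSIEKGB6OBMNYLMR7B '' 52

# Length of the compact "not_requested" body, for comparison with the logged size.
compact = json.dumps({"Access": "not_requested", "Storage": "not_requested"},
                     separators=(",", ":"))
print(len(compact))  # 52
```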

Try to get user statistics via s3cmd:

$ s3cmd get s3://riak-cs/usage/W7VSIEKGB6OBMNYLMR7B.abj.20140201T000000Z.20140228T000000Z --force
s3://riak-cs/usage/W7VSIEKGB6OBMNYLMR7B.abj.20140201T000000Z.20140228T000000Z -> ./W7VSIEKGB6OBMNYLMR7B.abj.20140201T000000Z.20140228T000000Z  [1 of 1]
 320 of 320   100% in    0s  1047.39 B/s  done
$ cat W7VSIEKGB6OBMNYLMR7B.abj.20140201T000000Z.20140228T000000Z| jsonpp
{
  "Access": {
    "Nodes": [
      {
        "Node": "riak-cs@33.33.33.10",
        "Samples": [
          {
            "StartTime": "20140214T162200Z",
            "EndTime": "20140214T162310Z",
            "KeyWrite": {
              "BytesIn": 281262010,
              "Count": 1
            }
          }
        ]
      }
    ],
    "Errors": []
  },
  "Storage": {
    "Samples": [
      {
        "StartTime": "20140214T161326Z",
        "EndTime": "20140214T161326Z",
        "joker": {
          "Objects": 1,
          "Bytes": 281262010
        }
      }
    ],
    "Errors": []
  }
}
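Comparing the two s3cmd responses is easier with the access counters collapsed into per-operation totals. A minimal Python sketch, assuming the JSON layout shown above (samples keyed by operation name, such as `KeyWrite`, alongside `StartTime`/`EndTime`); `total_access` is a hypothetical helper:

```python
def total_access(usage: dict) -> dict:
    """Sum access counters per operation across all nodes and samples.

    Returns {} when Access was "not_requested" or has no nodes.
    """
    access = usage.get("Access")
    if not isinstance(access, dict):
        return {}
    totals: dict = {}
    for node in access.get("Nodes", []):
        for sample in node.get("Samples", []):
            for op, stats in sample.items():
                if op in ("StartTime", "EndTime"):
                    continue  # time bounds, not counters
                per_op = totals.setdefault(op, {})
                for metric, value in stats.items():
                    per_op[metric] = per_op.get(metric, 0) + value
    return totals


# Access section from the 1.4.4 s3cmd response above.
usage_144 = {
    "Access": {
        "Nodes": [
            {
                "Node": "riak-cs@33.33.33.10",
                "Samples": [
                    {
                        "StartTime": "20140214T162200Z",
                        "EndTime": "20140214T162310Z",
                        "KeyWrite": {"BytesIn": 281262010, "Count": 1},
                    }
                ],
            }
        ],
        "Errors": [],
    }
}
# Access section from the 1.4.3 s3cmd response earlier.
usage_143 = {"Access": {"Nodes": [], "Errors": []}}

print(total_access(usage_144))  # {'KeyWrite': {'BytesIn': 281262010, 'Count': 1}}
print(total_access(usage_143))  # {}
```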

Corresponding entry in the Riak CS access log:

10.0.2.2 - - [14/Feb/2014:16:23:49 +0000] "GET /riak-cs/usage/W7VSIEKGB6OBMNYLMR7B.abj.20140201T000000Z.20140228T000000Z HTTP/1.1" 200 320 "" ""

Kdecherf commented 10 years ago

Query parameters seem to be ignored by Riak CS, but we can still use the complete path in s3curl, like this:

$ ./s3curl.pl -id identity "http://riak-cs.target.riak/usage/OD4YKDSYARGTNFXPFPQ2/abj/20140101T000000Z/20140131T235959Z" | python -m json.tool
{
    "Access": {
        "Errors": [],
        "Nodes": []
    },
    "Storage": {
        "Errors": [],
        "Samples": [
            {
                "EndTime": "20140123T162628Z",
                "StartTime": "20140123T162627Z",
                "kdecherf": {
                    "Bytes": 10169549,
                    "Objects": 442
                }
            }
        ]
    }
}
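Until the query string is honoured, a query-string style usage URL can be converted mechanically into the path form used above (`/usage/<key_id>/<options>/<start>/<end>`). A minimal Python sketch; `query_to_path_form` is hypothetical, and it assumes that `a`, `b`, `j` and `x` are flag parameters whose letters form the options segment while `s` and `e` carry the start and end times:

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit

# Assumed flag parameters; their letters are concatenated into the options segment.
OPTION_FLAGS = {"a", "b", "j", "x"}


def query_to_path_form(url: str) -> str:
    """Rewrite .../usage/<key>?a&b&j&s=..&e=.. as .../usage/<key>/abj/<start>/<end>."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)  # "?a" -> [("a", "")]
    # Keep the flags in the order given, dropping duplicates.
    options = "".join(dict.fromkeys(k for k, _ in params if k in OPTION_FLAGS))
    if not options:
        raise ValueError("query string has none of the a/b/j/x flags")
    values = dict(params)
    segments = [parts.path.rstrip("/"), options]
    if "s" in values:
        segments.append(values["s"])
        if "e" in values:
            segments.append(values["e"])
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))


print(query_to_path_form(
    "http://s3.amazonaws.com/riak-cs/usage/W7VSIEKGB6OBMNYLMR7B"
    "?a&b&j&s=20140201T000000Z&e=20140228T000000Z"
))
# http://s3.amazonaws.com/riak-cs/usage/W7VSIEKGB6OBMNYLMR7B/abj/20140201T000000Z/20140228T000000Z
```

Whether Riak CS accepts every combination this produces (for example a lone `a` segment) is not confirmed by this thread.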

ksauzz commented 10 years ago

Note: it looks like riak_cs_s3_rewrite:rewrite_path/4 should pass the query string through on usage requests.
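The shape of such a fix can be illustrated outside Erlang: rewrite only the path and carry the query string through unchanged. A minimal Python sketch, not Riak CS code; `rewrite_target` and `to_internal` are hypothetical, and the internal path layout is invented purely for illustration:

```python
from typing import Callable


def rewrite_target(raw_target: str, rewrite_path: Callable[[str], str]) -> str:
    """Rewrite only the path part of a request target and re-attach its query string."""
    path, sep, query = raw_target.partition("?")
    return rewrite_path(path) + sep + query


# Hypothetical path mapping standing in for whatever the real rewrite does.
def to_internal(path: str) -> str:
    return path.replace("/riak-cs/usage/", "/usage/", 1)


print(rewrite_target("/riak-cs/usage/W7VSIEKGB6OBMNYLMR7B?a&s=20140201T000000Z", to_internal))
# /usage/W7VSIEKGB6OBMNYLMR7B?a&s=20140201T000000Z
```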