SNIA / CDMI-spec

Specification issues and source files

CDMI Extensions - S3 Exports #283

Open dslik opened 3 years ago

dslik commented 3 years ago

Extend the CDMI export functionality to allow a CDMI client to specify that a given container within a CDMI namespace should be exported (made available) as a bucket for access by the S3 cloud data access protocols. This is analogous to existing export functionality for CIFS and NFS exports.

Changes to the spec involve:

Also see proposed extension #248, which addresses a data mapping issue between buckets and CDMI's hierarchical data model.

dslik commented 2 years ago

Will also have to look to see if there are any namespace issues (plus security changes)

dslik commented 8 months ago

Requirements:

Notes:

dslik commented 8 months ago

From 2024-01-26 TWG meeting:

There is also wide variability of permitted object names across different S3 implementations.

To do:

garymazz commented 8 months ago

I checked a few OSes. It seems Ubuntu bash shells and Python do support UNC paths using the asterisk as a wildcard. The Windows cmd prompt does not accept UNC paths.

Windows cmd:

cd \\.\
'\\.\'
CMD does not support UNC paths as current directories.
dir  \\.\
The filename, directory name, or volume label syntax is incorrect.
>dir  \\.\*
The system cannot find the path specified.

Windows Powershell:

PS C:\Users\garym> dir //./
dir : Cannot find path '//./' because it does not exist.
At line:1 char:1
+ dir //./
+ ~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (//./:String) [Get-ChildItem], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetChildItemCommand
PS C:\Users\garym> dir //./*
Get-ChildItem : Cannot retrieve the dynamic parameters for the cmdlet. Object reference not set to an instance of an
object.
At line:1 char:1
+ dir //./*
+ ~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (:) [Get-ChildItem], ParameterBindingException
    + FullyQualifiedErrorId : GetDynamicParametersException,Microsoft.PowerShell.Commands.GetChildItemCommand

Ubuntu on Windows Bash Shell:

root@EARTH:~# cd //./
root@EARTH://# cd //./
root@EARTH://# ls //./
Docker  dev   init   lib64       media  proc  sbin  sys  var
bin     etc   lib    libx32      mnt    root  snap  tmp
boot    home  lib32  lost+found  opt    run   srv   usr
root@EARTH://# ls //./* |more
//./init

//./Docker:
host

//./bin:
NF
VGAuthService
X11
....

Ubuntu 20 bash shell:

garym@pro:~$ cd //./
garym@pro://$ ^C
garym@pro:~$ ls //./
bin    core  home   lib64       media  proc  sbin  swap.img  usr
boot   dev   lib    libx32      mnt    root  snap  sys       var
cdrom  etc   lib32  lost+found  opt    run   srv   tmp
garym@pro://$ ls  //./* |more
//./core
//./swap.img
...
garym@pro://$ python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> print (os.listdir ('//./'))
['opt', 'snap', 'lib64', 'etc', 'var', 'tmp', 'lost+found', 'boot', 'sys', 'run', 'home', 'lib', 'root', 'libx32', 'swap.img', 'srv', 'core', 'media', 'usr', 'dev', 'lib32', 'mnt', 'bin', 'proc', 'cdrom', 'sbin']
>>> arr = next(os.walk('//./'))[2]
>>> print (arr)
['swap.img', 'core']

dslik commented 8 months ago

In CDMI, a "path based namespace" is defined as:

A root path (which is by default "/"), plus "one or more container names that are separated by forward slashes (“/”) and that end with a forward slash (“/”)", plus an optional data object name, plus an optional "?" if the path is a link.

We place no restrictions except as documented in section 5.5.6, that "/" and "?" shall not be permitted in an object name.

A trailing question mark in a CDMI path refers to a link. This is stripped out by most web libraries, as RFC 3986 defines "?" as the separator between the path and the query parameters.
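As an illustration of that stripping, here is a minimal sketch using Python's standard `urllib.parse` as one example of a web library. The host name and path are made up for the example and do not come from the specification.

```python
from urllib.parse import urlsplit

# Hypothetical CDMI link path; the trailing "?" marks it as a link.
url = "http://cdmi.example.com/container/mylink?"

parts = urlsplit(url)

# RFC 3986 treats "?" as the start of the query component, so the
# link marker never shows up in the parsed path, and the query is empty.
print(parts.path)         # /container/mylink
print(repr(parts.query))  # ''
```

A server or client that relies on the parsed path alone can therefore not tell the link form from the plain form, which is why the marker needs explicit handling.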

S3 does allow for object names to include "/" and "?", so we will need to define how these are mapped.

So do we need to handle object names containing "?" by percent-encoding them? Percent-encoding would also be used for "/", "*", etc.

From RFC 3986, section 2.2, the reserved characters in URIs that are percent-encoded are:

gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
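A minimal sketch of what percent-encoding these characters could look like, using Python's standard `urllib.parse`. The helper names `encode_name` and `decode_name` and the sample name are hypothetical; this is not a mapping the TWG has agreed on.

```python
from urllib.parse import quote, unquote


def encode_name(name: str) -> str:
    """Percent-encode every reserved character in an object name."""
    # safe="" matters: by default quote() leaves "/" unencoded.
    return quote(name, safe="")


def decode_name(encoded: str) -> str:
    """Reverse encode_name()."""
    return unquote(encoded)


name = "report?/draft*"  # hypothetical S3 object name
encoded = encode_name(name)
print(encoded)  # report%3F%2Fdraft%2A
assert decode_name(encoded) == name
```

Note that "%" itself is encoded as "%25", which is what keeps the encoding reversible.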

Gary M. to explore mapping requirements from S3 object names to CDMI object paths.

garymazz commented 7 months ago

It looks like the bash shell will send the '\.\' character sequence to the file system with escape patterns. The file system does create directories.

Make Directory:

garym@yocto:~$ mkdir \
garym@yocto:~$ mkdir \\
garym@yocto:~$ mkdir \\.
garym@yocto:~$ mkdir \\.\

List Directory (No Path Specifiers):

garym@yocto:~$ ls -al
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 .
drwxr-xr-x 3 root root 4096 Jul 26 2022 ..
drwxrwxr-x 2 garym garym 4096 Feb 8 19:20 '\'
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 '\'
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 '\.'
drwxrwxr-x 2 garym garym 4096 Feb 8 19:22 '\.\'

List Directory (Path Specifiers):

garym@yocto:~$ ls -al \
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:20 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..

garym@iyocto:~$ ls -al \\
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..

garym@yocto:~$ ls -al \\.
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..

garym@yocto:~$ ls -al \\.\
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:22 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..

List Directory (Wildcard Path Specifiers):

garym@halevaiyocto:~$ ls -al \*
'\':
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:20 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..

'\':
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..

'\.':
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..

'\.\':
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:22 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..

Change Directory:

garym@yocto:~$ cd \\.
garym@yocto:~/\.$

Create File:

garym@yocto:~$ sudo touch \\
garym@yocto:~$ sudo touch \\.
garym@yocto:~$ sudo touch \\.\
garym@yocto:~$ ls -l \*
-rw-r--r-- 1 root root 0 Feb 8 19:46 '\'
-rw-r--r-- 1 root root 0 Feb 8 19:46 '\'
-rw-r--r-- 1 root root 0 Feb 8 19:46 '\.'
-rw-r--r-- 1 root root 0 Feb 8 19:46 '\.\'

dslik commented 7 months ago

After a review of the above, we have determined that we will need to define a reversible mapping between the names S3 allows for objects and common file system naming restrictions, e.g. for an S3 object named "/*?/".
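To make the idea of a reversible mapping concrete, here is one possible sketch in Python. It is an assumption for discussion, not the mapping adopted by the extension: each "/"-separated segment of the S3 key is percent-encoded on its own, an empty segment is written as a lone "%" (which percent-encoding can never produce, since a literal "%" becomes "%25"), and "." and ".." are encoded so they cannot be taken for directory references. The function names `key_to_path` and `path_to_key` are hypothetical, and limits such as maximum file name length are not handled.

```python
from urllib.parse import quote, unquote

EMPTY = "%"  # hypothetical marker for an empty segment


def key_to_path(key: str) -> str:
    """Map an S3 object key to a path whose segments are file-system safe."""
    parts = []
    for seg in key.split("/"):
        if seg == "":
            parts.append(EMPTY)
        elif seg in (".", ".."):
            # quote() leaves "." alone, so encode it explicitly.
            parts.append(seg.replace(".", "%2E"))
        else:
            parts.append(quote(seg, safe=""))
    return "/".join(parts)


def path_to_key(path: str) -> str:
    """Reverse key_to_path()."""
    return "/".join(
        "" if seg == EMPTY else unquote(seg) for seg in path.split("/")
    )


print(key_to_path("/*?/"))  # %/%2A%3F/%
assert path_to_key(key_to_path("/*?/")) == "/*?/"
```

Because the key is split on "/" before encoding, the hierarchy implied by the S3 key is preserved, and every encoded segment is non-empty and free of "/", "?" and "*".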

dslik commented 6 months ago

Proposed capabilities:

Cloud storage system-wide capabilities - Add to section 12.2.7, Table 124

cdmi_containers - If present and "true", the CDMI server supports container objects.

cdmi_dataobjects_as_containers - If present and "true", the CDMI server supports accessing data objects as container objects.

cdmi_containers_as_dataobjects - If present and "true", the CDMI server supports accessing container objects as data objects.

Data Object Capability - Add to section 12.2.10, Table 127

cdmi_as_container - If present and “true”, this capability indicates that the CDMI server shall support the ability to access the data object as a container.

Container Capability - Add to section 12.2.11, Table 128

cdmi_as_dataobject - If present and “true”, this capability indicates that the CDMI server shall support the ability to access the container as a data object.
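A minimal sketch of how a client might test for these proposed capabilities. The response body below is hypothetical and trimmed for illustration; only the capability names come from the proposal above, and the helper `has_capability` is an assumed name.

```python
import json

# Hypothetical, abbreviated capabilities response body.
body = """
{
  "objectType": "application/cdmi-capability",
  "capabilities": {
    "cdmi_containers": "true",
    "cdmi_dataobjects_as_containers": "true"
  }
}
"""


def has_capability(capabilities: dict, name: str) -> bool:
    """A capability counts only if it is present and equal to "true"."""
    return capabilities.get(name) == "true"


caps = json.loads(body)["capabilities"]
print(has_capability(caps, "cdmi_dataobjects_as_containers"))  # True
print(has_capability(caps, "cdmi_containers_as_dataobjects"))  # False
```

Treating an absent capability the same as "false" follows the "if present and true" wording used in the descriptions.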

dslik commented 6 months ago

An open issue we need to discuss: S3 permits the following two separate objects to coexist in a bucket: "a", and "a/", each with a separate value. In CDMI, these would be the same object. Do we need to have a GET as container for "a" return the container object representation for "a/"?
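The collision can be illustrated with a short sketch that scans a list of S3 keys for pairs that would name the same CDMI object. It assumes, as stated above, that a key and the same key with one trailing "/" resolve to the same CDMI object; the function names are hypothetical.

```python
from collections import defaultdict


def cdmi_name(key: str) -> str:
    """Drop a single trailing "/" so "a" and "a/" compare equal."""
    return key[:-1] if key.endswith("/") else key


def find_collisions(keys):
    """Return {cdmi_name: [keys]} for names claimed by more than one key."""
    groups = defaultdict(list)
    for key in keys:
        groups[cdmi_name(key)].append(key)
    return {name: ks for name, ks in groups.items() if len(ks) > 1}


print(find_collisions(["a", "a/", "b", "c/"]))  # {'a': ['a', 'a/']}
```

A check like this only detects the problem; how the two values should be exposed through CDMI is the open question.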

dslik commented 6 months ago

Latest draft extension: https://github.com/SNIA/CDMI-spec/blob/main/cdmi_extensions/s3_exports/s3_exports_2.0.0.pdf

garymazz commented 5 months ago

AWS S3 Constraints

Data Model:

AWS S3 has three (3) sets of rules/constraints placed on object stores:

Bucket Naming Constraints:

Directory bucket naming Constraints

Directory bucket names must:

Object key naming Constraints

- Safe characters: The following character sets are generally safe for use in key names:

- Characters that might require special handling: The following characters in a key name might require additional code handling and likely need to be URL encoded or referenced as HEX. Some of these are non-printable characters that your browser might not handle, which also requires special handling:

Characters to avoid:

XML related object key constraints

As specified by the XML standard on end-of-line handling, all XML text is normalized such that single carriage returns (ASCII code 13) and carriage returns immediately followed by a line feed (ASCII code 10) are replaced by a single line feed character. To ensure the correct parsing of object keys in XML requests, carriage returns and other special characters must be replaced with their equivalent XML entity code when they are inserted within XML tags. The following is a list of such special characters and their equivalent entity codes:

The following example illustrates the use of an XML entity code as a substitution for a carriage return. This DeleteObjects request deletes an object with the key parameter: /some/prefix/objectwith\rcarriagereturn (where the \r is the carriage return).

<Delete xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Object>
    <Key>/some/prefix/objectwith&#13;carriagereturn</Key>
  </Object>
</Delete>
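The substitution in this request can be sketched in Python with the standard library's `xml.sax.saxutils.escape`. Only the carriage return mapping shown in the example above is included; any other special characters would need their own entries, and the helper name `xml_key` is hypothetical.

```python
from xml.sax.saxutils import escape


def xml_key(key: str) -> str:
    """Escape an object key for use inside a <Key> element."""
    # escape() handles &, < and > first, then applies the extra mapping,
    # so the "&" in "&#13;" is not escaped a second time.
    return escape(key, {"\r": "&#13;"})


key = "/some/prefix/objectwith\rcarriagereturn"
print(f"<Key>{xml_key(key)}</Key>")
# <Key>/some/prefix/objectwith&#13;carriagereturn</Key>
```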