dslik opened this issue 3 years ago
We will also have to check for any namespace issues (plus security changes).
Requirements:
Notes:
From 2024-01-26 TWG meeting:
There is also wide variability of permitted object names across different S3 implementations.
To do:
I checked a few OSes. Ubuntu bash shells and Python appear to accept UNC-style paths, including with the asterisk as a wildcard. The Windows cmd prompt does not accept UNC paths.
Windows cmd:
cd \\.\
'\\.\'
CMD does not support UNC paths as current directories.
dir \\.\
The filename, directory name, or volume label syntax is incorrect.
dir \\.\*
The system cannot find the path specified.
Windows Powershell:
PS C:\Users\garym> dir //./
dir : Cannot find path '//./' because it does not exist.
At line:1 char:1
+ dir //./
+ ~~~~~~~~
+ CategoryInfo : ObjectNotFound: (//./:String) [Get-ChildItem], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetChildItemCommand
PS C:\Users\garym> dir //./*
Get-ChildItem : Cannot retrieve the dynamic parameters for the cmdlet. Object reference not set to an instance of an
object.
At line:1 char:1
+ dir //./*
+ ~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [Get-ChildItem], ParameterBindingException
+ FullyQualifiedErrorId : GetDynamicParametersException,Microsoft.PowerShell.Commands.GetChildItemCommand
Ubuntu on Windows Bash Shell:
root@EARTH:~# cd //./
root@EARTH://# cd //./
root@EARTH://# ls //./
Docker dev init lib64 media proc sbin sys var
bin etc lib libx32 mnt root snap tmp
boot home lib32 lost+found opt run srv usr
root@EARTH://# ls //./* |more
//./init
//./Docker:
host
//./bin:
NF
VGAuthService
X11
....
Ubuntu 20 bash shell:
garym@pro:~$ cd //./
garym@pro://$ ^C
garym@pro:~$ ls //./
bin core home lib64 media proc sbin swap.img usr
boot dev lib libx32 mnt root snap sys var
cdrom etc lib32 lost+found opt run srv tmp
garym@pro://$ ls //./* |more
//./core
//./swap.img
...
garym@pro://$ python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> print (os.listdir ('//./'))
['opt', 'snap', 'lib64', 'etc', 'var', 'tmp', 'lost+found', 'boot', 'sys', 'run', 'home', 'lib', 'root', 'libx32', 'swap.img', 'srv', 'core', 'media', 'usr', 'dev', 'lib32', 'mnt', 'bin', 'proc', 'cdrom', 'sbin']
>>> arr = next(os.walk('//./'))[2]
>>> print (arr)
['swap.img', 'core']
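A quick way to check what `//./` resolves to on a Linux host is to compare it with `/` directly. This is a minimal sketch assuming Linux path resolution, where repeated slashes and "." components collapse during lookup (POSIX leaves the meaning of a leading "//" implementation-defined, and Linux treats it like "/"). The helper name `same_as_root` is hypothetical.

```python
import os

def same_as_root(path: str) -> bool:
    """Return True if `path` names the same directory as "/" (Linux assumption)."""
    return os.path.samefile(path, "/")

# Both checks are expected to print True on a typical Linux system.
print(same_as_root("//./"))
print(sorted(os.listdir("//./")) == sorted(os.listdir("/")))
```

If both print True, the listings above are simply the root directory reached through an alternate spelling of "/".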
In CDMI, a "path based namespace" is defined as:
A root path (which is by default "/"), plus "one or more container names that are separated by forward slashes (“/”) and that end with a forward slash (“/”)", plus an optional data object name, plus an optional "?" if the path is a link.
We place no restrictions on object names except the one documented in section 5.5.6: "/" and "?" shall not be permitted in an object name.
A trailing question mark in a CDMI path refers to a link. Most web libraries strip it out, because RFC 3986 defines "?" as the separator between the path and the query component.
S3 does allow object names to include "/" and "?", so we will need to define how these are mapped.
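The loss of the trailing "?" can be seen with Python's `urllib.parse.urlsplit`, used here as one example of a web library. The host and link name below are hypothetical. A URL ending in "?" parses to exactly the same components as the URL without it, so the link marker is gone, while a percent-encoded "%3F" stays inside the path.

```python
from urllib.parse import urlsplit

# Hypothetical CDMI URL for a link object; the trailing "?" marks it as a link.
link_url = "http://cdmi.example.com/container/mylink?"
parts = urlsplit(link_url)
print(repr(parts.path), repr(parts.query))  # '/container/mylink' ''

# The parser cannot distinguish "...mylink?" from "...mylink".
print(urlsplit(link_url) == urlsplit(link_url[:-1]))  # True

# Percent-encoding the "?" keeps it inside the path component instead.
encoded = urlsplit("http://cdmi.example.com/container/mylink%3F")
print(repr(encoded.path))  # '/container/mylink%3F'
```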
So do we need to handle names containing "?" by percent-encoding them? Percent-encoding would also be used for "/", "*", etc.
From RFC 3986, section 2.2, the reserved characters in URIs, which must be percent-encoded when they appear as data rather than as delimiters, are:
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
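As a sketch of what percent-encoding an object name as a single path segment could look like, Python's `urllib.parse.quote` with `safe=""` encodes every reserved character above (including "/") and leaves only the RFC 3986 unreserved set (letters, digits, "-", ".", "_", "~") untouched. The helper name `encode_segment` is hypothetical.

```python
from urllib.parse import quote, unquote

# RFC 3986 section 2.2 reserved characters.
GEN_DELIMS = ':/?#[]@'
SUB_DELIMS = "!$&'()*+,;="

def encode_segment(name: str) -> str:
    """Percent-encode a name so that it can be used as one URI path segment."""
    # safe="" means "/" is encoded too; "%" itself becomes "%25",
    # so the encoding can be undone unambiguously with unquote().
    return quote(name, safe="")

name = "a/b?c*"
encoded = encode_segment(name)
print(encoded)                    # a%2Fb%3Fc%2A
print(unquote(encoded) == name)   # True
```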
Gary M. to explore mapping requirements from S3 object names to CDMI object paths.
It looks like the bash shell passes the escaped '\.\' character sequence through to the file system, and the file system does create directories with these names.
Make Directory:
garym@yocto:~$ mkdir \
garym@yocto:~$ mkdir \\
garym@yocto:~$ mkdir \\.
garym@yocto:~$ mkdir \\.\
List Directory (No Path Specifiers):
garym@yocto:~$ ls -al
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 .
drwxr-xr-x 3 root root 4096 Jul 26 2022 ..
drwxrwxr-x 2 garym garym 4096 Feb 8 19:20 '\'
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 '\'
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 '\.'
drwxrwxr-x 2 garym garym 4096 Feb 8 19:22 '\.\'
List Directory (Path Specifiers):
garym@yocto:~$ ls -al \
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:20 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
garym@yocto:~$ ls -al \\
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
garym@yocto:~$ ls -al \\.
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
garym@yocto:~$ ls -al \\.\
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:22 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
List Directory (Wildcard Path Specifiers):
garym@yocto:~$ ls -al \*
'\':
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:20 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
'\':
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
'\.':
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
'\.\':
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:22 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
Change Directory:
garym@yocto:~$ cd \\.
garym@yocto:~/\.$
Create File:
garym@yocto:~$ sudo touch \\
garym@yocto:~$ sudo touch \\.
garym@yocto:~$ sudo touch \\.\
garym@yocto:~$ ls -l \*
-rw-r--r-- 1 root root 0 Feb 8 19:46 '\'
-rw-r--r-- 1 root root 0 Feb 8 19:46 '\'
-rw-r--r-- 1 root root 0 Feb 8 19:46 '\.'
-rw-r--r-- 1 root root 0 Feb 8 19:46 '\.\'
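The same behaviour can be reproduced without shell escaping. This is a minimal sketch assuming a typical Linux file system (e.g. ext4), where only "/" and NUL are forbidden in a name component, so backslashes and dots are ordinary characters. The names and the helper `make_dirs` are illustrative, built from backslashes and dots in the spirit of the experiment above.

```python
import os
import tempfile

# Directory names made only of backslashes and dots (illustrative).
NAMES = ["\\", "\\\\", "\\.", "\\.\\"]

def make_dirs(names):
    """Create each name as a directory in a scratch dir and return what was listed."""
    with tempfile.TemporaryDirectory() as root:
        for name in names:
            os.mkdir(os.path.join(root, name))
        # The listing is taken before the temporary directory is removed.
        return set(os.listdir(root))

print(make_dirs(NAMES) == set(NAMES))  # True on a typical Linux file system
```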
After reviewing the above, we have determined that we will need to define a reversible mapping between the object names allowed by S3 and the names permitted by common file systems, e.g., for an S3 object named "/*?/".
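One possible shape for such a mapping is sketched below. It is a hypothetical scheme, not a proposal from the TWG: each S3 key becomes a single file-name component by percent-escaping a chosen set of characters, and "%" is always escaped so that decoding is unambiguous. The escaped character set and the names `key_to_filename` / `filename_to_key` are assumptions for illustration.

```python
from urllib.parse import unquote

# Assumed set of characters that are unsafe in a single file-name component on
# Linux and/or Windows, plus ASCII control characters. "%" must be included so
# that every "%XX" in an encoded name was produced by the encoder.
ESCAPED = set('%/\\?*:"<>|') | {chr(c) for c in range(0x20)}

def key_to_filename(key: str) -> str:
    """Reversibly map an S3 object key to one file-name component (hypothetical)."""
    if key in (".", ".."):
        # "." and ".." are reserved by file systems, so escape every dot.
        return "%2E" * len(key)
    return "".join(f"%{ord(ch):02X}" if ch in ESCAPED else ch for ch in key)

def filename_to_key(name: str) -> str:
    """Invert key_to_filename()."""
    return unquote(name)

for key in ["/*?/", "50%", "a/", ".", "objectwith\rcarriagereturn"]:
    name = key_to_filename(key)
    assert filename_to_key(name) == key
    print(repr(key), "->", name)
```

This sketch does not cover every file-system restriction: Windows reserved device names (such as CON) and trailing dots or spaces would need extra rules, and an S3 key may be up to 1024 bytes while many file systems limit a name component to 255 bytes.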
Proposed capabilities:
Cloud storage system-wide capabilities - Add to section 12.2.7, Table 124
cdmi_containers - If present and "true", the CDMI server supports container objects.
cdmi_dataobjects_as_containers - If present and "true", the CDMI server supports accessing data objects as container objects.
cdmi_containers_as_dataobjects - If present and "true", the CDMI server supports accessing container objects as data objects.
Data Object Capability - Add to section 12.2.10, Table 127
cdmi_as_container - If present and “true”, this capability indicates that the CDMI server shall support the ability to access the data object as a container.
Container Capability - Add to section 12.2.11, Table 128
cdmi_as_dataobject - If present and “true”, this capability indicates that the CDMI server shall support the ability to access the container as a data object.
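All of these capabilities follow the "present and true" rule, so a client has to treat a missing capability the same as one set to "false". A minimal sketch of that check on a parsed capabilities object follows; the JSON body is abbreviated and hypothetical, and the capability names are the ones proposed above.

```python
import json

# Hypothetical, abbreviated capabilities response body.
response_body = """
{
  "objectType": "application/cdmi-capability",
  "capabilities": {
    "cdmi_containers": "true",
    "cdmi_dataobjects_as_containers": "true",
    "cdmi_containers_as_dataobjects": "false"
  }
}
"""

def has_capability(capability_object: dict, name: str) -> bool:
    """A capability counts only if it is present *and* its value is "true"."""
    return capability_object.get("capabilities", {}).get(name) == "true"

caps = json.loads(response_body)
for name in ("cdmi_containers", "cdmi_dataobjects_as_containers",
             "cdmi_containers_as_dataobjects", "cdmi_as_container"):
    print(name, has_capability(caps, name))
```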
An open issue we need to discuss: S3 permits the following two separate objects to coexist in a bucket: "a", and "a/", each with a separate value. In CDMI, these would be the same object. Do we need to have a GET as container for "a" return the container object representation for "a/"?
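To gauge how often this conflict would arise, a bucket listing can be scanned for keys that would collide in a CDMI path-based namespace. This sketch assumes that every "/"-terminated prefix of a key (for example "a/" in "a/b") would become a CDMI container, so a data object "a" conflicts with an explicit "a/" object and also with an implied one. The function names are hypothetical.

```python
def implied_containers(keys):
    """Collect every "/"-terminated prefix, assuming each maps to a CDMI container."""
    containers = set()
    for key in keys:
        parts = key.split("/")
        for i in range(1, len(parts)):
            containers.add("/".join(parts[:i]) + "/")
    return containers

def keys_that_collide_in_cdmi(keys):
    """Return data-object keys "k" for which container "k/" also exists."""
    containers = implied_containers(keys)
    return sorted(k for k in set(keys) if not k.endswith("/") and k + "/" in containers)

# "a" collides with the explicit "a/", "b" with the container implied by "b/c".
print(keys_that_collide_in_cdmi(["a", "a/", "b", "b/c", "d/"]))  # ['a', 'b']
```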
Latest draft extension: https://github.com/SNIA/CDMI-spec/blob/main/cdmi_extensions/s3_exports/s3_exports_2.0.0.pdf
Data Model:
AWS S3 has three (3) sets of rules/constraints placed on object stores:
Directory bucket names must:
- Safe characters: The following character sets are generally safe for use in key names:
- Characters that might require special handling: The following characters in a key name might require additional code handling and likely need to be URL encoded or referenced as HEX. Some of these are non-printable characters that your browser might not handle, which also requires special handling:
Characters to avoid:
XML-related object key constraints:
As specified by the XML standard on end-of-line handling, all XML text is normalized such that single carriage returns (ASCII code 13) and carriage returns immediately followed by a line feed (ASCII code 10) are replaced by a single line feed character. To ensure the correct parsing of object keys in XML requests, carriage returns and other special characters must be replaced with their equivalent XML entity code when they are inserted within XML tags. The following is a list of such special characters and their equivalent entity codes:
' as &apos;
" as &quot;
& as &amp;
< as &lt;
> as &gt;
\r as &#13; or &#x0D;
\n as &#10; or &#x0A;
The following example illustrates the use of an XML entity code as a substitution for a carriage return. This DeleteObjects request deletes an object with the key parameter: /some/prefix/objectwith\rcarriagereturn (where the \r is the carriage return).
<Delete xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Object>
<Key>/some/prefix/objectwith&#13;carriagereturn</Key>
</Object>
</Delete>
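A client can produce this substitution with `xml.sax.saxutils.escape`, which always escapes "&", "<" and ">" and accepts a dictionary of additional replacements. This is a minimal sketch; the helper `delete_objects_body` is hypothetical and only builds the XML body shown above, without sending any request.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"

# escape() handles &, < and >; these extra entries cover the other characters
# from the list above.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;", "\r": "&#13;", "\n": "&#10;"}

def delete_objects_body(keys):
    """Build a DeleteObjects request body with each key XML-escaped."""
    objects = "".join(
        f"<Object><Key>{escape(key, EXTRA_ENTITIES)}</Key></Object>" for key in keys
    )
    return f'<Delete xmlns="{S3_NS}">{objects}</Delete>'

original_key = "/some/prefix/objectwith\rcarriagereturn"
body = delete_objects_body([original_key])
print(body)

# Character references are not subject to end-of-line normalization, so the
# carriage return survives a parse of the body.
root = ET.fromstring(body)
parsed_key = root.find(f"{{{S3_NS}}}Object/{{{S3_NS}}}Key").text
print(parsed_key == original_key)  # True
```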
Extend the CDMI export functionality to allow a CDMI client to specify that a given container within a CDMI namespace should be exported (made available) as a bucket for access by the S3 cloud data access protocols. This is analogous to existing export functionality for CIFS and NFS exports.
Changes to the spec involve:
Also see proposed extension #248, which addresses a data mapping issue between S3 buckets and CDMI's hierarchical data model.