Open longern opened 5 years ago
I have such a need and so have some ready code for it.
I have a set of similar use cases here:
https://github.com/moremoban/moban/blob/dev/moban/file_system.py
where you can find:
And I would need similar functionality from os.path:
But I thought I was the only one with such a need, and I am not sure whether such use cases fit pyfs2's concept: always open the parent directory, then open a file.
Not as such, but there is the open method which will split a path from the FS URL.
>>> from fs.opener import open
>>> zip_fs, path = open("zip://foo.zip!/bar/egg")
>>> zip_fs.readtext(path)
However, fs.opener.open won't work for a nonexistent path.
@longern What would you expect to happen for a nonexistent path?
Some methods may accept a nonexistent path as the argument, such as mkdir and exists, and sometimes writing to a new file. Is there any shortcut for them?
exists('s3://commoncrawl/robots.txt')
mkdir('ftp://some-url/some-path/dirname')
I'm not sure I follow. Are you looking for something like this?
with open_fs("s3://commoncrawl") as fs:
    robots_exists = fs.exists("robots.txt")
Sometimes the file URL comes from user input, so I need to split it into an FS URL and a path for every operation. I'm looking for methods that operate directly on a file URL.
You can use this method to parse FS URLs.
@willmcgugan The documentation for ParseResult mentions a path part, but https://pyfilesystem2.readthedocs.io/en/latest/openers.html doesn't document how to include the path in an FS URL.
And it doesn't say how to open a file, only a path.
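Judging only from the zip://foo.zip!/bar/egg example earlier in this thread, the path part appears to follow a "!" separator. The sketch below illustrates that split with plain string handling; split_bang_path is a hypothetical helper, not a PyFilesystem2 API, and the "first ! wins" rule is an assumption.

```python
from typing import Tuple


def split_bang_path(fs_url: str) -> Tuple[str, str]:
    """Split an FS URL at the first '!' into (filesystem URL, inner path).

    Hypothetical helper, not part of PyFilesystem2. If the URL contains
    no '!', the inner path is returned as an empty string.
    """
    base, _sep, inner = fs_url.partition("!")
    return base, inner


print(split_bang_path("zip://foo.zip!/bar/egg"))
print(split_bang_path("s3://commoncrawl"))
```

For real code, the library's own parser is the safer choice; this only shows the shape of the URL being discussed.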
I can make my module an independent library if there is enough interest.
https://github.com/moremoban/moban/blob/dev/moban/file_system.py
Or I can upstream it into PyFilesystem2 if it fits its mission.
I was looking for this, but I found that fs.opener.open didn't work for a file in the current directory. It just keeps saying that the root path does not exist.
Seems like we just have to use:
import os
(fspath, filename) = os.path.split('s3://commoncrawl/a/b/c/robots.txt')
# note that this keeps the query parameter in the filename
Not sure if query parameters matter here.
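If query parameters do matter, one option is to split only the path component and re-attach the query to the parent URL. This is a minimal sketch using only the standard library; split_file_url is a hypothetical name, and whether a given opener accepts the query in that position is an assumption.

```python
import posixpath
from typing import Tuple
from urllib.parse import urlsplit, urlunsplit


def split_file_url(url: str) -> Tuple[str, str]:
    """Return (parent URL, file name), keeping any query on the parent.

    Hypothetical helper for illustration only. Fragments are dropped.
    """
    parts = urlsplit(url)
    # Split only the path, so "?query" never ends up in the file name.
    parent_path, name = posixpath.split(parts.path)
    parent = urlunsplit(
        (parts.scheme, parts.netloc, parent_path, parts.query, "")
    )
    return parent, name


print(split_file_url("s3://commoncrawl/a/b/c/robots.txt?region=us-east-1"))
```

Using posixpath rather than os.path keeps the behaviour the same on every platform, since URL paths always use forward slashes.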
The problem is that some file system abstractions, like s3 and gs, use the first component of the URL as the bucket and don't expose it as part of the abstraction. It's an argument to the constructor, basically. You'd have to have file systems implement a classmethod that opens an arbitrary URL to get around this.
Example? Are you saying the s3 fs impl cannot open the path including the directory?
Sorry that was a bad example
I wrote something like this:
import contextlib
import urllib.parse
from typing import IO, Iterator, Tuple

import fs


def parse_file_url(url: str) -> Tuple[str, str]:
    fs_url = ''
    file_path = ''
    url_parsed = urllib.parse.urlparse(url)
    # if there's no scheme, it's a filesystem path
    if not url_parsed.scheme:
        fs_url += 'osfs://'
        # if it is an absolute path, the fs_url must start at the root
        if url_parsed.path.startswith('/'):
            fs_url += '/'
        # remove any leading slashes
        file_path += url_parsed.path.lstrip('/')
        if url_parsed.params:
            file_path += f';{url_parsed.params}'
        if url_parsed.fragment:
            file_path += f'#{url_parsed.fragment}'
    else:
        if not url_parsed.path:
            fs_url += f'{url_parsed.scheme}://'
            if url_parsed.query:
                fs_url += f'?{url_parsed.query}'
            file_path += url_parsed.netloc
        else:
            fs_url += f'{url_parsed.scheme}://'
            if url_parsed.netloc:
                fs_url += url_parsed.netloc
            if url_parsed.query:
                fs_url += f'?{url_parsed.query}'
            file_path += url_parsed.path
            if url_parsed.params:
                file_path += f';{url_parsed.params}'
            if url_parsed.fragment:
                file_path += f'#{url_parsed.fragment}'
    return (fs_url, file_path)


@contextlib.contextmanager
def open_file_url(url: str,
                  mode: str = 'r',
                  buffering=-1,
                  encoding=None,
                  errors=None,
                  newline='') -> Iterator[IO]:
    (fs_url, file_path) = parse_file_url(url)
    with fs.open_fs(fs_url) as fs_:
        with fs_.open(file_path, mode, buffering, encoding, errors,
                      newline) as file:
            yield file
I ended up with something like this:
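import os
from contextlib import contextmanager
from typing import IO, Iterator, Optional

from fs import open_fs


@contextmanager
def open_file(url: str,
              mode: str = "r",
              create: bool = False,
              buffering: int = -1,
              encoding: Optional[str] = None,
              errors: Optional[str] = None,
              newline: str = "",
              **options) -> Iterator[IO]:
    # Only modes containing "w" are treated as needing a writeable filesystem.
    writeable = "w" in mode
    # Split the URL into the parent directory URL and the file name.
    dir_url, file_name = os.path.split(url)
    with open_fs(dir_url, writeable, create) as fs_:
        with fs_.open(file_name, mode, buffering, encoding, errors, newline, **options) as file_:
            yield file_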
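One thing to watch in the snippet above: only modes containing "w" are treated as writeable, so append ("a"), exclusive-create ("x"), and update ("+") modes would open the filesystem as read-only. A hedged sketch of a broader check follows; mode_needs_write is a hypothetical helper name, and it assumes Python-style open() mode strings.

```python
def mode_needs_write(mode: str) -> bool:
    """Return True if a Python-style open() mode can modify the file.

    Hypothetical helper: "w", "a", "x" and "+" all imply writing.
    """
    return any(flag in mode for flag in "wax+")


for example_mode in ("r", "rb", "w", "ab", "r+", "x"):
    print(example_mode, mode_needs_write(example_mode))
```

Its result could be passed as the writeable argument in place of the "w"-only test.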
Is there a method that supports directly opening a file URL, like smart-open? https://pypi.org/project/smart-open/