Closed: fangpingmu closed this issue 3 years ago.
@fangpingmu What type of file system is the current working directory on when running this? An NFS share or a local file system? Any other relevant info, such as the OS version, would be helpful.
They are NFS file systems. We have multiple NFS file systems; I have tested on BeeGFS and ZFS, and neither supports hard links across directories.
I believe that AWS and GCP cloud object storage do not support soft links. I also tried changing the hardlink in hca/dss/util/__init__.py to a softlink, and it did not work.
The only temporary workaround is to copy the files from .hca to the target directory and then delete the .hca folder.
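The copy-then-delete workaround described above could be sketched roughly as follows. This is a hypothetical helper, not part of hca; it flattens the content-addressed `.hca` layout into a single directory, so it only illustrates the copy-and-remove step, not a faithful reconstruction of the bundle layout.

```python
import os
import shutil

def flush_hca_cache(cache_dir='.hca', target_dir='.'):
    """Copy every file out of the .hca cache into target_dir, then delete the cache.

    Hypothetical sketch of the manual workaround; flattening the
    content-addressed layout loses the original bundle structure.
    """
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            # copy2 preserves timestamps and permissions where possible
            shutil.copy2(os.path.join(root, name), os.path.join(target_dir, name))
    # Remove the cache once everything has been copied out
    shutil.rmtree(cache_dir)
```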
~/.local/lib/python3.7/site-packages/hca/dss/util/__init__.py

def hardlink(source, link_name):
    """
    Create a hardlink in a thread safe way, and revert to copying if the link
    limit for the file is reached
    """
    try:
        os.link(source, link_name)
    except FileExistsError:
        # It's possible that the user created a different file with the same name as the
        # one we're trying to download. Thus we need to check if the inode is different
        # and raise an error in this case.
        source_stat = os.stat(source)
        dest_stat = os.stat(link_name)
        # Check device first because different drives can have the same inode number
        if source_stat.st_dev != dest_stat.st_dev or source_stat.st_ino != dest_stat.st_ino:
            raise
    except OSError as e:
        if e.errno == errno.EMLINK:
            # FIXME: Copying is not space efficient; see https://github.com/HumanCellAtlas/dcp-cli/issues/453
            log.warning('Failed to link source `%s` to destination `%s`; reverting to copying', source, link_name)
            shutil.copyfile(source, link_name)
        else:
            log.warning('Failed to link source `%s` to destination `%s`; reverting to copying', source, link_name)
            shutil.copyfile(source, link_name)
            # raise
We should include errno 1 (EPERM) in the set of errnos for which we fall back to copying.
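A minimal sketch of that suggestion, assuming the fallback set is widened to cover EPERM (hard links forbidden, as on some NFS exports) and EXDEV (cross-device link) alongside EMLINK. The function name `hardlink_or_copy` and the `COPY_FALLBACK_ERRNOS` set are illustrative, not the shipped hca behaviour:

```python
import errno
import logging
import os
import shutil

log = logging.getLogger(__name__)

# Errnos for which linking cannot succeed and copying is a sensible fallback.
# EMLINK is what the current code handles; EPERM/EXDEV membership is the
# proposed change, not existing behaviour.
COPY_FALLBACK_ERRNOS = {errno.EMLINK, errno.EPERM, errno.EXDEV}

def hardlink_or_copy(source, link_name):
    """Hypothetical variant of hardlink() that also copies on EPERM/EXDEV."""
    try:
        os.link(source, link_name)
    except OSError as e:
        if e.errno in COPY_FALLBACK_ERRNOS:
            log.warning('Failed to link `%s` to `%s`; reverting to copying',
                        source, link_name)
            shutil.copyfile(source, link_name)
        else:
            # Anything else (e.g. missing source) is still raised to the user.
            raise
```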
I'm hesitant to fall back to copying on every failure, but I may be convinced otherwise. There is a class of problems that are intermittent or easily resolved, and for those we want to raise the error so the user can fix it.
Release together with the fix for #515, then point to the release, demo, and close.
No further DCP CLI releases planned. We will not be able to demo this. Closing.
I am using hca dss to download data:

hca dss download-manifest --manifest 1M_Neurons.tsv \
    --replica 'aws' \
    --layout bundle
There are lots of hard link problems.
File "/home/user001/.local/lib/python3.7/site-packages/hca/dss/__init__.py", line 584, in _download_and_link_to_filestore
    hardlink(file_store_path, file_path)
File "/home/user001/.local/lib/python3.7/site-packages/hca/dss/util/__init__.py", line 50, in hardlink
    os.link(source, link_name)
PermissionError: [Errno 1] Operation not permitted: '.hca/v2/files_2_4/8f/aec6/8faec66817969ae6f847b0c649e7328af6085d88e71b32b0e3a8284df4cd88f7' -> '33855cf6-6f3e-4b8f-9cb4-c2b2ea9f528d.2019-05-16T211813.099000Z/dissociation_protocol_0.json'
My file systems do not support hard links across directories. Is there an option to use a soft link?
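One possible reason a naive swap of os.link for os.symlink fails (as reported above): os.symlink stores `source` verbatim, and a relative source is resolved against the directory containing the link, not the current working directory, so links into `.hca` can dangle. This hypothetical helper (not part of hca) rewrites the target relative to the link's own directory:

```python
import os

def symlink_rel(source, link_name):
    """Create a symlink whose target is expressed relative to the link's directory.

    Hypothetical sketch; os.symlink alone would record `source` as given,
    which dangles when `source` is relative to the current working directory.
    """
    link_dir = os.path.dirname(os.path.abspath(link_name)) or '.'
    rel_source = os.path.relpath(os.path.abspath(source), link_dir)
    os.symlink(rel_source, link_name)
```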
Why is the data downloaded to .hca and then hard-linked into the target directory? Could hca download directly to the target directory, or move the files from .hca to the target directory?