Open geometrikal opened 2 months ago
Hi @geometrikal ! Thanks for reporting this. We will get back to you once this is fixed.
Hi @eugen-ajechiloae-clearml i would like to work on this. Can i pick it up. I am new to the repo, can you please help with some guidance on what has to be done here.
Hi @d-vignesh ! Yes, all contributions are welcome, and no-one picked this one up as far as I'm aware.
It looks like when adding a file using add_external_files, the file name will be stripped of the first len("file://")
characters. I I believe that we don't really support paths that are not prefixed by file://
at all, so, in this function: https://github.com/allegroai/clearml/blob/b1e39e68973182d66b43edc502b58d955b5bc779/clearml/datasets/dataset.py#L3322
I would add a check: if the file is a local path without file://
then get the absolute path and add the file://
prefix.
This should be tested using absolute paths, relative paths, on both Windows and Linux/MacOS.
Hi @eugen-ajechiloae-clearml can be please clarify on below doubt,
when a source_url without file://
prefix is provided,
file://
as base url and hence the metadata is having the expected name valuelink = remote_object.get("name")
relative_path = link[len(source_url):]
if not relative_path:
relative_path = source_url.split("/")[-1]
for example, if the source_url is /Users/docs/file1.txt
, the link value will be file://Users/docs/file1.txt
. Hence the relative path will be assigned as le1.txt
in line 2. Instead if we use relative_path = source_url.split("/")[-1]
always we should be getting the correct relative path. What scenario is the line relative_path = link[len(source_url):]
handling in this case?
Hi @d-vignesh ! There are 2 cases when adding external files: you can either add a "directory" or a direct path. In case the source URL is the direct path, we fallback to relative_path = source_url.split("/")[-1]
thank you @eugen-ajechiloae-clearml , got it. Please verify my understanding, that user can give both absolute path /User/project/file1.txt
or relative path ./file1.txt
if in project directory. In relative path case we need to get the absolute path as /User/project/file1.txt
. Then finally append the file://
prefix. This modified path will then be used as the source_url.
Assuming the above understanding is right, the below logic is working for all scenarios
source_url = '../data/file2.txt'
parsed = urlparse(source_url)
if parsed.scheme == "":
abs_path = os.path.abspath(source_url)
source_url = f"file://{abs_path}"
print(source_url)
Adding this check at the beginning of the add_external_file function seems to resolve the issue. Am i missing any case?
Hi @eugen-ajechiloae-clearml , created a PR for the fix https://github.com/allegroai/clearml/pull/1326. Can you please have a look at it.
Describe the bug
Dataset.add_external_file
will allow adding external files from a source url without thefile://
prefix, e.g./path/to/my/file.csv
. It will create the link correctly in the metadat.json by prependingfile://
however it will remove characters from the file name when creating the relative link.To reproduce
Expected behaviour
Either throw an exception or correctly create the relative path
Environment