Closed chmreid closed 4 years ago
The fix here is to not raise this exception if the file is already linked in the filestore.
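As a rough illustration of that idea, here is a minimal sketch, not the actual dcp-cli code. The function name `link_from_store` and the file names are hypothetical. It raises `FileExistsError` only when the destination exists and is not already a hard link to the filestore entry.

```python
import os
import tempfile


def link_from_store(store_path, dest_path):
    """Hard-link dest_path to store_path (hypothetical helper).

    Returns "linked" if a new link was created and "skipped" if dest_path
    is already a link to the same filestore entry. Raises FileExistsError
    only when dest_path is some other, unrelated file.
    """
    if os.path.exists(dest_path):
        # samefile compares device and inode numbers, so it is True
        # exactly when both paths are hard links to the same file.
        if os.path.samefile(dest_path, store_path):
            return "skipped"
        raise FileExistsError(
            f"{dest_path!r} exists and is not linked to {store_path!r}")
    os.link(store_path, dest_path)
    return "linked"


with tempfile.TemporaryDirectory() as workdir:
    store = os.path.join(workdir, "filestore_entry")
    dest = os.path.join(workdir, "process_2.json")
    with open(store, "w") as f:
        f.write("{}")
    first = link_from_store(store, dest)   # creates the hard link
    second = link_from_store(store, dest)  # already linked, no exception
    print(first, second)
```

A file that the user created independently at the same path would still trigger the exception, which preserves the original safety check.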
My previous comment https://github.com/humancellatlas/dcp-cli/issues/485#issuecomment-555750738 is not actually relevant. This issue has the same underlying cause as #450. Therefore it is being resolved in PR #477.
@chmreid your steps to reproduce are not completely correct. Running the script once is not sufficient to manifest this bug. Instead, you have to run the script twice in a row. The second time, you should expect to see these errors.
The cause is that if multiple threads are downloading the same file at the same time, the last thread to finish will overwrite the filestore entry, thus orphaning the links made to the previous entry by the threads that finished first. When the script is run a second time, it encounters these orphaned files, sees that they are not linked in the filestore, assumes that the user created them, and fails so as to avoid overwriting a user-created file.
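The orphaning can be reproduced outside dcp-cli with plain hard links. The following is a minimal sketch under stated assumptions: the two "threads" are simulated sequentially, the file names and the `simulate_race` helper are hypothetical, and the overwrite is modeled as an atomic `os.replace`, which may differ from how dcp-cli actually writes its filestore entries.

```python
import os
import tempfile


def is_linked_to(dest_path, store_path):
    """True if both paths exist and are hard links to the same file."""
    return (os.path.exists(dest_path)
            and os.path.exists(store_path)
            and os.path.samefile(dest_path, store_path))


def simulate_race(workdir):
    """Simulate two downloads of the same file finishing one after the other."""
    store_entry = os.path.join(workdir, "filestore_entry")
    user_copy = os.path.join(workdir, "process_2.json")

    # "Thread A" finishes first: it writes the filestore entry and
    # hard-links the user-visible file to it.
    with open(store_entry, "w") as f:
        f.write("{}")
    os.link(store_entry, user_copy)
    linked_before = is_linked_to(user_copy, store_entry)

    # "Thread B" finishes last: it replaces the filestore entry with its
    # own copy. The entry now has a new inode, while user_copy still
    # points at the old one.
    tmp_path = store_entry + ".tmp"
    with open(tmp_path, "w") as f:
        f.write("{}")
    os.replace(tmp_path, store_entry)
    linked_after = is_linked_to(user_copy, store_entry)

    return linked_before, linked_after


with tempfile.TemporaryDirectory() as workdir:
    before, after = simulate_race(workdir)
    print(before, after)  # True False
```

After the replacement, the user-visible file still exists with identical content but is no longer linked to the filestore entry, which is the state that the second run misinterprets as a user-created file.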
tl;dr: closing in favor of #450
@hannes-ucsc asked that I reopen this so that you can track the progress of your issue.
Was this fixed? There is no mention of a fix in #450.
It appears to be fixed. When I try to reproduce the error, I only see info messages like INFO:hca:Skipping download of 'process_2.json' because it already exists at '.hca/v2/files_2_4/27/86bd/2786bdd0fa9ff3607184dae6e1340cb6073dd10aa599455791e5e752a8221506'. without the corresponding warning + traceback.
Yes. I should have mentioned this ticket in the PR, but I forgot. My comment above still stands though: #450 still fixes the issue. The fix is in the 7.0.0 release, which is why we closed the issue.
I am running the hca download-manifest command with a manifest and the --no-data flag. Pretty early on in the process, the hca utility begins to throw many, many FileExistsError exceptions, apparently because it is attempting to write and re-write and re-re-write the same JSON metadata files repeatedly.
Here is the command I am using to download the manifest:
using pancreas-female.tsv.zip and shortening it via
This begins the download process, but at some point begins raising many FileExistsErrors like so:
Happens with files with names like sequencing_protocol_0.json, library_preparation_protocol_0.json, project_19.json, analysis_file_30.json, etc.
While I haven't narrowed down the cause of the issue, I believe it is because the hard links that the dcp-cli is creating are links to files with the same names, and the links are all being put into the same folder, so there are naming conflicts, causing the FileExistsError exceptions.
To reproduce: