ebi-ait / hca-ebi-dev-team

Repository for hca ebi dev team agile management. See zenhub board

[bug] hca-util sync command doesn't work #233

Open mshadbolt opened 4 years ago

mshadbolt commented 4 years ago

I attempted to use the hca-util sync command to transfer files from a hca-util upload area to an ingest submission upload area.

It hung for a long time, then eventually said 'Transferring...', and then all the transfers failed:

(base) C02YV4HKLVDL:ranglr mshadbolt$ hca-util sync s3://org-hca-data-archive-upload-prod/xxxxx-xxxx-xxxx-a8b6-88ef12d2b964/ --profile wrangler
Transferring...
xxxx/H0015_LA_S1_L001_I1_001.fastq.gz  0 / 597394719  Transfer failed.
(* thousands of lines)
...
Transfer error

etc

mshadbolt commented 4 years ago

@clairerye can you please prioritise this?

If this is the only tool we have to transfer files to an upload area at the moment, then it needs to be fixed as a high priority. If there is a workaround we can use in the meantime that can be communicated to operations, then it can wait.

clairerye commented 4 years ago

As far as I am aware, this is the way we want to be doing this rather than using the old CLI, @MightyAx or @prabh-t. It looks like a bug, or at least it's not performing the required and expected behaviour, so I agree we need to get it fixed asap. @MightyAx are you happy to take a look at this as soon as possible?

prabh-t commented 4 years ago

I've taken a quick look and was waiting for the issue to be prioritised. Can continue to investigate further or also happy to leave it to @MightyAx to do.

clairerye commented 4 years ago

If you have already started, maybe it's easier for you to carry on? It would be good to make sure that you aren't the only one familiar with it though. I will leave the decision to you @prabh-t and @MightyAx. In the meantime, do you suggest @mshadbolt uses the hca cli or waits? I think she is working to a fairly tight time frame.

prabh-t commented 4 years ago

If this can wait until end of day/early morning tomorrow, I'd suggest we wait. I will have to run the sync command against this upload area to try reproducing the issue, as it seems to have to do with this particular dataset. I may end up transferring the data to where it needs to be in the process, if that's OK.

clairerye commented 4 years ago

@mshadbolt Is that reasonable? If you were wanting to submit this today, I think using the hca cli is your best option. I will leave you to co-ordinate with @prabh-t as my access will be limited for the rest of the day, sorry.

mshadbolt commented 4 years ago

Has anyone used the command successfully on a real dataset?

Sure, I can use the hca cli.

mshadbolt commented 4 years ago

The hca cli is working perfectly, so I don't think there is anything wrong with the files.

prabh-t commented 4 years ago

Hi Marion, I've identified the issue. The command takes a while initially (roughly 10 mins) because it tries to gather metadata for each object (600+) before running the sync in parallel. I've changed that so the metadata is gathered inside each worker thread, so you're no longer blocked at the start and you can see progress straight away. The reason some files were transferred and others were not was that the content type was determined incorrectly for certain files. Both problems are fixed now, and I'm going to update the package after running the tests to be sure nothing else breaks. This is going to happen a bit later tonight, so if you want me to do the transfer, let me know.
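For illustration only, the change described above could look roughly like the sketch below. It is not the hca-util code: `fetch_metadata`, `copy_object`, `sync_one` and `sync_all` are hypothetical names, and the S3 requests are replaced by stand-ins so the example runs on its own. The idea is that each worker gathers metadata for its own object just before copying it, and that a file whose content type cannot be guessed falls back to a generic default instead of failing.

```python
import mimetypes
from concurrent.futures import ThreadPoolExecutor, as_completed

# Generic fallback used when the type cannot be guessed from the name.
DEFAULT_CONTENT_TYPE = "application/octet-stream"


def guess_content_type(key):
    """Guess a content type from the file name, with a safe fallback."""
    content_type, _encoding = mimetypes.guess_type(key)
    return content_type or DEFAULT_CONTENT_TYPE


def fetch_metadata(key):
    # Stand-in for a per-object metadata request (hypothetical).
    return {"key": key, "content_type": guess_content_type(key)}


def copy_object(meta):
    # Stand-in for the actual transfer (hypothetical).
    return meta["key"]


def sync_one(key):
    # Metadata is gathered inside the worker, not up front for every object.
    return copy_object(fetch_metadata(key))


def sync_all(keys, max_workers=8):
    """Sync all keys in parallel; return (transferred, failed) key lists."""
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(sync_one, key): key for key in keys}
        for future in as_completed(futures):
            key = futures[future]
            try:
                done.append(future.result())
            except Exception:
                failed.append(key)
    return done, failed
```

Because the first transfers start as soon as the pool is created, there is no long silent phase while metadata for all 600+ objects is collected.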

prabh-t commented 4 years ago

v0.2.8 addresses the above issues. It also replaces the per-file progress output with a single overall progress bar, as was discussed on Slack. (The upload and download commands still use the old progress indication, which remains problematic with a large number of files.) I have added a release note to the code repo here and I have a draft PyPI package release SOP here. It'd be great to have your input on it. @lauraclarke @clairerye @mshadbolt
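As a rough sketch of what a single overall progress indicator shared by many transfer threads can look like (the class name `OverallProgress` and its methods are made up for this example and are not taken from hca-util):

```python
import threading


class OverallProgress:
    """One byte counter shared by all transfer threads (hypothetical)."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes
        self.done_bytes = 0
        self._lock = threading.Lock()

    def update(self, n):
        # Called by any worker with the number of bytes it just moved.
        with self._lock:
            self.done_bytes += n
            return self.done_bytes

    def render(self, width=20):
        # Build a one-line bar such as "[#####-----] 50%".
        with self._lock:
            if self.total_bytes:
                fraction = self.done_bytes / self.total_bytes
            else:
                fraction = 1.0
        filled = int(width * fraction)
        return "[" + "#" * filled + "-" * (width - filled) + f"] {fraction:.0%}"
```

Each worker calls `update` with the bytes it has transferred, and one place in the program prints `render()`, so the output stays a single line no matter how many files are in flight.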

clairerye commented 4 years ago

When you get the chance, can I have edit access please, so I can make a few comments/suggestions? But this looks like a great plan overall. @ami-day when you are ready, are you able to start the testing in ticket #235?

MightyAx commented 4 years ago

I believe this ticket can be moved to Done / In Production / Closed.