amcpherson / remixt

Clone-specific genomic structure estimation in cancer
MIT License

Problem downloading reference data #31

Open emsisson opened 1 year ago

emsisson commented 1 year ago

I have installed remixt and started downloading the reference data, but the download failed with this message:

   --2023-10-11 06:02:33--  http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.vcf.gz
   Resolving ftp.1000genomes.ebi.ac.uk (ftp.1000genomes.ebi.ac.uk)... 193.62.193.167
   Connecting to ftp.1000genomes.ebi.ac.uk (ftp.1000genomes.ebi.ac.uk)|193.62.193.167|:80... connected.
   HTTP request sent, awaiting response... 404 Not Found
   2023-10-11 06:05:37 ERROR 404: Not Found.

Using a web browser, I visited http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV and saw that the file

1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.vcf.gz

is not there; instead, I see a similarly-named file

1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz

How should I address this situation?

Can I restart the download process from the point where it halted without having to download the files that I have already?

Regards, Eric Sisson

tmrnov commented 6 months ago

Hi. I have the same problem. @emsisson, did you resolve it? Thanks.

emsisson commented 6 months ago

No, I did not. I set the matter aside to attend to other work and had forgotten about it until your message. Looking at https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/ I see that the situation is unchanged from six months ago. I guess that is why you hit the same problem.

um-pdavila commented 2 months ago

Hi everyone, we got past this error by updating the grch38_1kg_X_vcf_url variable in remixt/defaults.py (line 67) to:

grch38_1kg_X_vcf_url                        = 'http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz'
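
Before editing defaults.py, it may help to confirm which of the two file names the server currently serves. The following is a minimal sketch, not part of remixt: the helper names `url_exists` and `pick_chrx_url` and the constants are hypothetical, and it assumes the server answers HTTP HEAD requests. Only the two file names and the directory URL come from this thread.

```python
import urllib.error
import urllib.request

# Directory and file names quoted earlier in this thread.
BASE_URL = (
    "http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/"
    "1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/"
)
OLD_NAME = "1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.vcf.gz"
NEW_NAME = "1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz"


def url_exists(url, opener=urllib.request.urlopen, timeout=30):
    """Return True if a HEAD request succeeds, False on an HTTP error such as 404.

    `opener` is injectable so the function can be exercised without network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        return False


def pick_chrx_url(opener=urllib.request.urlopen):
    """Return the first candidate chrX panel URL that responds, or None if neither does."""
    for name in (OLD_NAME, NEW_NAME):
        url = BASE_URL + name
        if url_exists(url, opener=opener):
            return url
    return None
```

Calling `pick_chrx_url()` returns the URL to assign to grch38_1kg_X_vcf_url, or `None` if the file has been renamed again, in which case the directory listing would need to be checked by hand.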