alan-turing-institute / uatk-spc

Synthetic Population Catalyst
https://alan-turing-institute.github.io/uatk-spc/
MIT License

No such file or directory (os error 2) #27

Closed nickmalleson closed 2 years ago

nickmalleson commented 2 years ago

I'm running the program on berkshire, but getting this error: Error: failed to open file `data/raw_data/nationaldata/QUANT_RAMP/retailpointsZones.csv`

And indeed that file hasn't been created:

(base) Mac-Pro:uatk-spc nick$ ls data/raw_data/nationaldata/QUANT_RAMP/
add_msoa_to_venue.py hospitalZones.csv    primaryZones.csv     secondaryProbPij.npy

Here's the full output:

(base) Mac-Pro:uatk-spc nick$ cargo run --release -- config/berkshire.txt 
    Finished release [optimized] target(s) in 0.60s
     Running `target/release/spc config/berkshire.txt`
[583.78µs] [grab_raw_data] Downloading https://ramp0storage.blob.core.windows.net/referencedata/lookUp.csv to data/raw_data/referencedata/lookUp.csv
[662.35µs] [grab_raw_data] ... file exists, skipping
[15.45ms] [grab_raw_data] From 107 MSOAs, we need 1 time use files and 1 OSM files
[15.48ms] [grab_raw_data] Downloading https://ramp0storage.blob.core.windows.net/countydata/tus_hse_berkshire.gz to data/raw_data/countydata/tus_hse_berkshire.gz
[15.51ms] [grab_raw_data] ... file exists, skipping
[15.54ms] [grab_raw_data] data/raw_data/countydata/tus_hse_berkshire.csv already exists, not untarring data/raw_data/countydata/tus_hse_berkshire.gz
[15.56ms] [grab_raw_data] Downloading http://download.geofabrik.de/europe/great-britain/england/berkshire-latest-free.shp.zip to data/raw_data/countydata/OSM/berkshire-latest-free.shp.zip
[15.58ms] [grab_raw_data] ... file exists, skipping
[15.59ms] [grab_raw_data] Unzipping data/raw_data/countydata/OSM/berkshire-latest-free.shp.zip to data/raw_data/countydata/OSM/berkshire-latest-free/...
Archive:  data/raw_data/countydata/OSM/berkshire-latest-free.shp.zip
[23.80ms] [grab_raw_data] Downloading https://ramp0storage.blob.core.windows.net/nationaldata/QUANT_RAMP_spc.tar.gz to data/raw_data/nationaldata/QUANT_RAMP_spc.tar.gz
[23.83ms] [grab_raw_data] ... file exists, skipping
[23.84ms] [grab_raw_data] data/raw_data/nationaldata/QUANT_RAMP/ already exists, not untarring data/raw_data/nationaldata/QUANT_RAMP_spc.tar.gz
[23.85ms] [grab_raw_data] Downloading https://ramp0storage.blob.core.windows.net/nationaldata/businessRegistry.csv to data/raw_data/nationaldata/businessRegistry.csv
[102.51s] [grab_raw_data] Downloading https://ramp0storage.blob.core.windows.net/nationaldata/timeAtHomeIncreaseCTY.csv to data/raw_data/nationaldata/timeAtHomeIncreaseCTY.csv
[104.27s] [grab_raw_data] Downloading https://ramp0storage.blob.core.windows.net/nationaldata/MSOAS_shp.tar.gz to data/raw_data/nationaldata/MSOAS_shp.tar.gz
[110.09s] [grab_raw_data] Untarring data/raw_data/nationaldata/MSOAS_shp.tar.gz...
[110.09s] [grab_raw_data] Extracting MSOAS_shp/, which is 0B
[110.09s] [grab_raw_data] Extracting MSOAS_shp/msoas.prj, which is 417B
[110.09s] [grab_raw_data] Extracting MSOAS_shp/.DS_Store, which is 6.00KiB
[110.09s] [grab_raw_data] Extracting MSOAS_shp/msoas.shx, which is 56.36KiB
[110.09s] [grab_raw_data] Extracting MSOAS_shp/msoas.shp, which is 15.25MiB
[110.20s] [grab_raw_data] Extracting MSOAS_shp/msoas.cpg, which is 5B
[110.21s] [grab_raw_data] Extracting MSOAS_shp/msoas.dbf, which is 1.84MiB
[110.22s] [get_info_per_msoa] Loading MSOA shapes
[110.34s] [get_info_per_msoa] Loading buildings from data/raw_data/countydata/OSM/berkshire-latest-free/
[110.55s] [get_info_per_msoa] Found 172,656 buildings from data/raw_data/countydata/OSM/berkshire-latest-free/gis_osm_buildings_a_free_1.shp
[110.55s] [get_info_per_msoa] Matching 172,656 points to 107 polygons. Building R-Tree...
[115.66s] [Creating households] Creating households (Memory usage: 441.27MiB)
[00:00:03] [################################################################################################################################################---] 357,284/363,653 (0s) [119.57s] [Creating households] 878,045 people across 363,653 households, and 107 MSOAs (Memory usage: 297.70MiB)
[119.71s] [create_commuting_flows] Finding all businesses
[121.04s] [create_commuting_flows] 364,894 jobs available among 48,245 businesses
[121.04s] [create_commuting_flows] Grouping people by SIC
[121.12s] [create_commuting_flows] If we match workers to jobs by SIC, 204,279 / 323,119 = 0.63 get a job. SIC threshold is 0
[121.12s] [create_commuting_flows] Matching 19 job markets
[132.33s] [setup_venue_flows] Reading Retail flow data...

  132.69s          initialisation "berkshire"
    110.22s          grab_raw_data 
    22.47s           creating population 
      1.35s            get_info_per_msoa 
      8.00s            read_individual_time_use_and_health_data 
        4.09s            Reading "data/raw_data/countydata/tus_hse_berkshire.csv"
        3.91s            Creating households 
      12.75s           create_commuting_flows 
      29.31ms          setup_venue_flows Retail

Error: failed to open file `data/raw_data/nationaldata/QUANT_RAMP/retailpointsZones.csv`

Caused by:
    No such file or directory (os error 2)

dabreegster commented 2 years ago

I suspect downloading or untarring the massive QUANT file failed partially last time. Per a not very visible part of the docs, you might have to remove the directory and/or tar.gz file to get it to try again.

From the logs: `data/raw_data/nationaldata/QUANT_RAMP/ already exists, not untarring data/raw_data/nationaldata/QUANT_RAMP_spc.tar.gz`. For me, `ls -l data/raw_data/nationaldata/QUANT_RAMP_spc.tar.gz` shows the file is about 2.3GB. Could you check if you have the full file?
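
The manual recovery described above can be sketched as a small shell helper. It assumes the default data layout shown in the logs. The function name `reset_quant_download` and its optional root argument are made up for illustration and are not part of the tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of uatk-spc): clear a possibly partial QUANT
# download so that the next run fetches and untars it again.
# The paths are taken from the log output above.
reset_quant_download() {
    root="${1:-.}"   # repository root; defaults to the current directory
    national="$root/data/raw_data/nationaldata"

    # Print the tarball size first, so it can be compared with the roughly
    # 2.3GB reported above for a complete download.
    if [ -f "$national/QUANT_RAMP_spc.tar.gz" ]; then
        ls -l "$national/QUANT_RAMP_spc.tar.gz"
    fi

    # Remove both the extracted directory and the tarball: per the logs, the
    # tool skips the download if the file exists and skips untarring if the
    # directory exists.
    rm -rf "$national/QUANT_RAMP"
    rm -f "$national/QUANT_RAMP_spc.tar.gz"
}
```

Calling `reset_quant_download` from the repository root and then re-running `cargo run --release -- config/berkshire.txt` should make the tool download and extract the file again.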

I've thought about making the download/extract process smarter by comparing against expected file sizes. Maybe it's time to do that...
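
As a rough illustration of the size comparison idea, here is a shell sketch rather than anything the tool does today. The function name `needs_redownload` is hypothetical, and the expected size is assumed to be supplied by the caller, since the thread does not say where it would come from.

```shell
#!/bin/sh
# Hypothetical check (not part of uatk-spc): return 0 when a file is missing
# or smaller than the expected size in bytes, i.e. it should be fetched again.
needs_redownload() {
    file="$1"
    expected_bytes="$2"

    # A missing file always needs downloading.
    [ -f "$file" ] || return 0

    # wc -c reports the size in bytes; tr strips the padding some platforms add.
    actual_bytes=$(wc -c < "$file" | tr -d ' ')

    # Succeeds only when the file on disk is shorter than expected.
    [ "$actual_bytes" -lt "$expected_bytes" ]
}

# Example use; the threshold below is a placeholder, not a published size:
# if needs_redownload data/raw_data/nationaldata/QUANT_RAMP_spc.tar.gz 2000000000; then
#     rm -f data/raw_data/nationaldata/QUANT_RAMP_spc.tar.gz
# fi
```

A size check like this only catches truncated files. A checksum comparison would also catch corrupted ones, at the cost of reading the whole file.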

nickmalleson commented 2 years ago

That file is only 10M for me. Almost certainly my fault that it was only partially downloaded; I think I killed it in a previous attempt. It's re-downloading now. I've mentioned the error explicitly in the docs in https://github.com/alan-turing-institute/uatk-spc/pull/28 (https://github.com/alan-turing-institute/uatk-spc/pull/28/commits/7c5ba7b90965aeefc7b7ad3e198e4e25867d5a6d)

dabreegster commented 2 years ago

Filed #30 to later explore more automated recovery. The tool could detect partly downloaded files and try again.