FluxML / FastAI.jl

Repository of best practices for deep learning in Julia, inspired by fastai
https://fluxml.ai/FastAI.jl
MIT License

The dataset is deleted right after download on Windows 10 #270

Open MariusDrulea opened 1 year ago

MariusDrulea commented 1 year ago

Package Version

0.5.0

Julia Version

1.8.3

OS / Environment

Windows 10

Describe the bug

I ran the following code to download the coco_sample dataset: FastAI.load(datasets()["coco_sample"]). The download is successful. After the download, 7-Zip is called to unpack the archive, and once unpacking finishes the error below occurs. It looks like the script tries to delete the folder it just created, fastai-coco_sample. This happens with all the datasets.

ERROR: LoadError: IOError: rm("D:\\z_installed_programs\\julia-depot\\datadeps\\fastai-coco_sample"): resource busy or locked (EBUSY)

Note that I have set Julia's depot path (via the JULIA_DEPOT_PATH environment variable) to D:\z_installed_programs\julia-depot instead of the default location in the user's home directory.

Steps to Reproduce

using FastAI
FastAI.load(datasets()["coco_sample"])

Expected Results

The coco_sample dataset is downloaded and unpacked on the PC.

Observed Results

The coco_sample archive is downloaded and unpacked, then the error occurs, and then the fastai-coco_sample folder containing both the archive and the unpacked data is deleted.

Relevant log output

ERROR: LoadError: IOError: rm("D:\\z_installed_programs\\julia-depot\\datadeps\\fastai-coco_sample"): resource busy or locked (EBUSY)
Stacktrace:
  [1] uv_error
    @ .\libuv.jl:97 [inlined]
  [2] rm(path::String; force::Bool, recursive::Bool)
    @ Base.Filesystem .\file.jl:306
  [3] checkfor_mv_cp_cptree(src::String, dst::String, txt::String; force::Bool)
    @ Base.Filesystem .\file.jl:330
  [4] #mv#15
    @ .\file.jl:425 [inlined]
  [5] (::FastAI.Datasets.var"#10#11")(f::String)
    @ FastAI.Datasets D:\z_installed_programs\julia-depot\packages\FastAI\as9UG\src\datasets\fastaidatasets.jl:261
  [6] #16
    @ D:\z_installed_programs\julia-depot\packages\DataDeps\ae6dT\src\resolution_automatic.jl:122 [inlined]
  [7] cd(f::DataDeps.var"#16#17"{FastAI.Datasets.var"#10#11", String}, dir::String)
    @ Base.Filesystem .\file.jl:101
  [8] run_post_fetch(post_fetch_method::FastAI.Datasets.var"#10#11", fetched_path::String)
    @ DataDeps D:\z_installed_programs\julia-depot\packages\DataDeps\ae6dT\src\resolution_automatic.jl:119
  [9] download(datadep::DataDeps.DataDep{String, String, typeof(DataDeps.fetch_default), FastAI.Datasets.var"#10#11"}, localdir::String; remotepath::String, i_accept_the_terms_of_use::Nothing, skip_checksum::Bool)
    @ DataDeps D:\z_installed_programs\julia-depot\packages\DataDeps\ae6dT\src\resolution_automatic.jl:84
 [10] download
    @ D:\z_installed_programs\julia-depot\packages\DataDeps\ae6dT\src\resolution_automatic.jl:63 [inlined]
 [11] handle_missing
    @ D:\z_installed_programs\julia-depot\packages\DataDeps\ae6dT\src\resolution_automatic.jl:10 [inlined]
 [12] _resolve
    @ D:\z_installed_programs\julia-depot\packages\DataDeps\ae6dT\src\resolution.jl:83 [inlined]
 [13] resolve(datadep::DataDeps.DataDep{String, String, typeof(DataDeps.fetch_default), FastAI.Datasets.var"#10#11"}, inner_filepath::String, calling_filepath::String)
    @ DataDeps D:\z_installed_programs\julia-depot\packages\DataDeps\ae6dT\src\resolution.jl:29
 [14] resolve(datadep_name::String, inner_filepath::String, calling_filepath::String)
    @ DataDeps D:\z_installed_programs\julia-depot\packages\DataDeps\ae6dT\src\resolution.jl:54
 [15] resolve
    @ D:\z_installed_programs\julia-depot\packages\DataDeps\ae6dT\src\resolution.jl:73 [inlined]
 [16] makeavailable
    @ D:\z_installed_programs\julia-depot\packages\FastAI\as9UG\src\datasets\loaders.jl:46 [inlined]
 [17] loaddata(loader::FastAI.Datasets.DataDepLoader)
    @ FastAI.Datasets D:\z_installed_programs\julia-depot\packages\FastAI\as9UG\src\datasets\loaders.jl:50
 [18] (::FastAI.Registries.var"#8#13")(row::NamedTuple{(:id, :description, :size, :tags, :package, :downloaded, :loader), Tuple{String, Union{Missing, String}, Union{Missing, String}, Vector{String}, Module, Bool, FastAI.Datasets.DatasetLoader}})
    @ FastAI.Registries D:\z_installed_programs\julia-depot\packages\FastAI\as9UG\src\Registries\datasets.jl:38
 [19] load(entry::FeatureRegistries.RegistryEntry; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ FeatureRegistries D:\z_installed_programs\julia-depot\packages\FeatureRegistries\FBMLI\src\registry.jl:135
 [20] load
    @ D:\z_installed_programs\julia-depot\packages\FeatureRegistries\FBMLI\src\registry.jl:135 [inlined]
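Reading the trace: frames [4] to [2] show Base's mv being called with force = true on a destination that already exists, in which case Base first removes the destination with rm(dst; force = true, recursive = true). Frame [7] shows DataDeps running FastAI's post-fetch hook inside cd(...), so the directory being removed is likely the hook's own working directory, which Windows typically refuses to delete while it is in use (EBUSY). The minimal sketch below uses only Base Julia (no FastAI or DataDeps; the temporary folder names are made up) to show the replace-on-force behaviour:

```julia
# Plain Base Julia, no FastAI/DataDeps: with `force = true`, `mv` deletes an
# existing destination (via `rm(dst; force = true, recursive = true)`) before
# moving `src` into its place.
root = mktempdir()
src = joinpath(root, "extracted")
dst = joinpath(root, "target")
mkpath(src); write(joinpath(src, "new.txt"), "new")
mkpath(dst); write(joinpath(dst, "old.txt"), "old")

mv(src, dst; force = true)   # `dst` is removed first, then `src` takes its place

println(readdir(dst))        # ["new.txt"]: the old contents are gone
println(ispath(src))         # false
```

On Linux and macOS the same removal would normally succeed even if dst were the current directory, which would be consistent with the failure only showing up on Windows.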
MariusDrulea commented 1 year ago

I wanted to dig a bit into the issue, but I'm facing the following (cumulative) issues:

  1. I cannot download any dataset while the proxy (which is on by default) is active; to get as far as described in the comment above, I had to deactivate the proxy. I also had to run everything as administrator for 7-Zip to work.

  2. datarecipes() returns an empty table:

    julia> datarecipes()
    Dataset recipes
    
    ID       Block types  Description   Is downloaded  Dataset ID  Package   Recipe  
    :id      :blocks      :description  :downloaded    :datasetid  :package  :recipe 
    
    missing  missing      missing       missing        missing     missing   missing
  3. Installing FastVision downgrades a number of packages. For instance, its compat bounds restrict MakieCore to 0.3, while MakieCore 0.5.x is available. Makie is a package I would like to keep up to date.

The log when downloading through the proxy:

Do you want to download the dataset from https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz to "D:\z_installed_programs\julia-depot\datadeps\fastai-imagenette2-160"?
[y/n]
y
ERROR: UndefVarError: file not defined
Stacktrace:
  [1] download(url::String, local_path::String, headers::Vector{Pair{SubString{String}, SubString{String}}}; update_period::Float32, kw::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:decompress,), Tuple{Bool}}})
    @ HTTP D:\z_installed_programs\julia-depot\packages\HTTP\RQd4C\src\download.jl:159
  [2] #fetch_http#26
    @ D:\z_installed_programs\julia-depot\packages\DataDeps\ae6dT\src\fetch_helpers.jl:80 [inlined]
lorenzoh commented 1 year ago

Hi Marius, thanks for reporting the issue. The downloads being removed must be caused by the Windows-specific dataset unpacking step. I'll see whether I can remove it, since it only existed for backward compatibility.

Unfortunately I don't have a Windows machine to test on, but I'll look into a fix. I'm not sure how the proxy interacts with this, since downloading is handled by DataDeps.jl.
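To illustrate one possible direction for a fix (not the actual FastAI.jl change), here is a minimal sketch of a post-extraction step that never deletes the directory it runs in. It assumes the archive has already been unpacked into a directory root and produced a single top-level folder; flatten_extracted! is a hypothetical helper name, and the coco_sample layout in the usage lines is made up for the example:

```julia
# Hypothetical helper (not part of FastAI.jl or DataDeps.jl).
# Assumes an archive was already extracted into `root` and produced exactly one
# top-level folder. That folder's contents are moved up into `root`; `root`
# itself (possibly the current working directory) is never deleted.
function flatten_extracted!(root::AbstractString)
    subdirs = filter(isdir, readdir(root; join = true))
    length(subdirs) == 1 || return root      # nothing unambiguous to flatten
    inner = only(subdirs)

    # Park the extracted folder in a fresh staging directory first, so an entry
    # sharing the folder's name (e.g. coco_sample/coco_sample) can be moved up
    # without clobbering the folder we are still reading from.
    staging = mktempdir(root; cleanup = false)
    parked = joinpath(staging, "contents")
    mv(inner, parked)

    for name in readdir(parked)
        # `force = true` only ever replaces an entry inside `root`
        mv(joinpath(parked, name), joinpath(root, name); force = true)
    end
    rm(staging; recursive = true)            # staging now holds only the emptied folder
    return root
end

# Usage with a made-up layout, run from inside `root` like a post-fetch hook:
root = mktempdir()
mkpath(joinpath(root, "coco_sample", "coco_sample"))    # same-name nested entry
write(joinpath(root, "coco_sample", "annotations.json"), "{}")
write(joinpath(root, "coco_sample.tgz"), "")            # stands in for the downloaded archive
cd(() -> flatten_extracted!(pwd()), root)
println(readdir(root))   # ["annotations.json", "coco_sample", "coco_sample.tgz"]
```

Moving individual entries up one level, instead of moving the extracted folder onto the datadep directory with force = true, means the only directories that ever get deleted are ones the step itself created.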

  2. datarecipes() returns nothing:

Dataset recipes are registered by domain packages such as FastVision.jl and FastTabular.jl, so you'll have to load one of them first to see any recipes, e.g.:

using FastAI, FastVision
datarecipes()

  3. FastVision installation makes downgrades of a list of packages. I see it is set "compat" with MakieCore 0.3 for instance, while we have MakieCore 0.5.x available. Makie is a package I would like to stay up to date.

I'll update the compat bounds 👍, see #274.