UAL-RE / ReBACH

Python-based tool to enable data preservation to a cloud-hosted storage solution
MIT License

Bug: space check includes folders that won't be bagged and also double counts in some cases #96

Closed zoidy closed 4 months ago

zoidy commented 4 months ago

Is there an existing issue for this?

Description of the bug

Currently, the free space check counts folders that won't be part of the final bag, so it requires more free space than is actually needed.

Additionally, when specific article_ids are passed in and an article has more than one version, its data size is double counted.

Steps To Reproduce

  1. Set a breakpoint() in the process_articles function
  2. Run the program with --ids 19680372
  3. When the breakpoint triggers:
     a. Observe the value of self.matched_curation_folder_list and note that there are two identical values (one for each version). Since the list doesn't contain the version numbers, all versions are counted on each call of get_file_size_of_given_path(path).
     b. Observe that the path passed to get_file_size_of_given_path includes all directories, not just UAL_RDM, which is what will end up in the bag.
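
One possible way to address both observations is sketched below. It is not ReBACH's actual code: `bagged_size` and `required_space` are hypothetical helpers, and the sketch assumes that everything destined for the bag lives under a directory named UAL_RDM somewhere inside each curation folder. Duplicate folder paths are collapsed before summing, and only files under UAL_RDM are counted.

```python
import os
from pathlib import Path


def bagged_size(folder, bagged_dir="UAL_RDM"):
    """Return the size in bytes of files that sit under a `bagged_dir` directory.

    Hypothetical helper: assumes the bag contents are identified purely by
    having a directory named `bagged_dir` somewhere in their path.
    """
    total = 0
    for root, _dirs, files in os.walk(folder):
        # Only count files whose path (relative to the folder) passes through bagged_dir
        if bagged_dir in Path(root).relative_to(folder).parts:
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
    return total


def required_space(curation_folders, bagged_dir="UAL_RDM"):
    """Sum the bagged size once per unique curation folder.

    dict.fromkeys removes duplicate paths while keeping their order, so a
    folder listed once per version is not counted more than once.
    """
    return sum(bagged_size(f, bagged_dir) for f in dict.fromkeys(curation_folders))
```

With a list such as the one seen at the breakpoint (the same folder listed twice), `required_space` walks that folder a single time and ignores sibling directories of UAL_RDM.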