Currently the free space check counts folders that won't be part of the final bag, so it requires more free space than is actually needed.
Additionally, when specific article_ids are passed in and an article has more than one version, the data size is double counted.
Steps To Reproduce
1. Set a breakpoint() in the process_articles function.
2. Run the program with --ids 19680372.
3. When the breakpoint triggers:
   a. Observe the value of self.matched_curation_folder_list and note there are two identical values (one for each version). Since the list doesn't contain the version numbers, all versions are counted on each call of get_file_size_of_given_path(path).
   b. Observe that the path passed to get_file_size_of_given_path includes all directories, not just UAL_RDM, which is what will end up in the bag.
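
One possible direction for a fix, sketched under assumptions rather than taken from the project's code: de-duplicate the folder list before summing, and count only files that live under a UAL_RDM directory. The helper names bagged_size and required_space are hypothetical, and the sketch assumes that each entry in the list is a folder path whose subtree contains one or more UAL_RDM directories; the real folder layout may differ.

```python
import os
from pathlib import Path


def bagged_size(folder, bag_subdir="UAL_RDM"):
    """Sum the sizes of files that sit under a directory named bag_subdir."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        # Skip directories whose path (relative to folder) does not pass
        # through bag_subdir; their files would not end up in the bag.
        if bag_subdir not in Path(root).relative_to(folder).parts:
            continue
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def required_space(curation_folders, bag_subdir="UAL_RDM"):
    """Total bagged size over the unique folders in curation_folders."""
    # dict.fromkeys drops duplicate paths while preserving order, so a folder
    # listed once per version is only measured once.
    unique = dict.fromkeys(os.path.normpath(p) for p in curation_folders)
    return sum(bagged_size(p, bag_subdir) for p in unique)
```

With a list such as the one observed in step 3a (the same folder twice), required_space would measure that folder once and ignore everything outside UAL_RDM.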