Open BartBaddeley opened 3 years ago
Thank you for this. I think this is caused by https://github.com/Netflix/metaflow/blob/master/metaflow/datatools/s3.py#L303. I will see if the fix is simply to remove this (I need to make sure I am not missing something).
If a folder on S3 has a name that begins with another folder's full name, then calling s3.list_recursive() on the shorter-named folder also lists the files in the longer-named one. For example, given these two folders:
's3://elevate-analytics-etl-output/analytics_data/analytics_data_21_05_28_1625_dedup/'
's3://elevate-analytics-etl-output/analytics_data/analytics_data_21_05_28_1625_dedup_version_2/'
The following will return all of the files in both folders.
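from metaflow import S3

with S3(s3root='s3://elevate-analytics-etl-output/analytics_data/analytics_data_21_05_28_1625_dedup/') as s3:
    for key in s3.list_recursive():
        print(key.url)  # URLs from the ..._dedup_version_2/ folder show up here too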
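To illustrate what I suspect is going on, here is a minimal sketch that does not use Metaflow at all. S3 listing matches object keys by plain string prefix, so if the trailing slash is lost from the prefix somewhere along the way, a sibling folder whose name starts with the same characters matches as well. The object keys and the `list_by_prefix` helper below are made up for illustration; I have not confirmed that this is exactly what s3.py does.

```python
# Hypothetical object keys; a real listing would come from S3.
KEYS = [
    "analytics_data/analytics_data_21_05_28_1625_dedup/part-0.parquet",
    "analytics_data/analytics_data_21_05_28_1625_dedup_version_2/part-0.parquet",
]


def list_by_prefix(keys, prefix):
    """Mimic prefix-based listing: keep keys that start with `prefix`."""
    return [k for k in keys if k.startswith(prefix)]


# Prefix without the trailing slash: the sibling folder matches too.
without_slash = list_by_prefix(
    KEYS, "analytics_data/analytics_data_21_05_28_1625_dedup"
)

# Prefix with the trailing slash: only the intended folder matches.
with_slash = list_by_prefix(
    KEYS, "analytics_data/analytics_data_21_05_28_1625_dedup/"
)

print(len(without_slash), len(with_slash))
```

If that is the cause, keeping the trailing slash on the prefix when listing a folder should be enough to exclude the sibling folder.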
Also, the docstring at https://github.com/Netflix/metaflow/blob/bd585a470468741e0a74f8d285e5560dd4d1e75a/metaflow/datatools/s3.py#L427-L457 says: "keys: (required) a list of suffixes for paths to list."
I think that should be prefixes, not suffixes?