I am reproducing the results.
For the SAM dataset download, I found this link: SAM
The dataset config file in this repo shows the data divided into 4 subsets. However, the original SAM dataset is not organized that way: all the .jpg and .json files sit directly in the root folder.
So should I write a script that splits the dataset into 4 subsets myself?
Here is my current code for doing that:
```python
import os
import shutil

source_dir = 'SAM'
folders = ['0000', '0001', '0002', '0003']

# Create the four subset folders inside the dataset root
for folder in folders:
    os.makedirs(os.path.join(source_dir, folder), exist_ok=True)

# Assumes the pairs are numbered contiguously from sa_1 to sa_11187
total_pairs = 11187
pairs_per_folder = total_pairs // 4
extra_pairs = total_pairs % 4  # This will be 3 in this case

# The first `extra_pairs` folders each take one additional pair
folder_counts = [pairs_per_folder] * 4
for i in range(extra_pairs):
    folder_counts[i] += 1
cumulative_counts = [sum(folder_counts[:i + 1]) for i in range(len(folder_counts))]

for i in range(1, total_pairs + 1):
    # Determine which folder the current pair belongs to
    if i <= cumulative_counts[0]:
        folder = folders[0]
    elif i <= cumulative_counts[1]:
        folder = folders[1]
    elif i <= cumulative_counts[2]:
        folder = folders[2]
    else:
        folder = folders[3]

    # Move both the .jpg and .json files
    for ext in ['jpg', 'json']:
        filename = f'sa_{i}.{ext}'
        src = os.path.join(source_dir, filename)
        dst = os.path.join(source_dir, folder, filename)
        if os.path.exists(src):
            shutil.move(src, dst)
        else:
            print(f"File not found: {src}")
```
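If the file IDs are not guaranteed to run contiguously from `sa_1` to `sa_11187`, a more defensive variant could discover the pairs from the directory listing instead of hard-coding the count. This is only a sketch under that assumption: the helper name `split_into_subsets` and its `dry_run` parameter are hypothetical, and it keeps the same rule as the script above (pairs sorted by numeric ID, and the first `total % 4` folders get one extra pair). Whether the repo's config expects this particular assignment of images to `0000` through `0003` is not confirmed.

```python
import os
import re
import shutil


def split_into_subsets(source_dir, folders=("0000", "0001", "0002", "0003"), dry_run=False):
    """Hypothetical helper: move complete sa_<id>.jpg/.json pairs from source_dir
    into near-equal contiguous subsets ordered by numeric id.
    Returns a dict mapping each folder name to its number of pairs."""
    pattern = re.compile(r"^sa_(\d+)\.jpg$")
    ids = []
    for name in os.listdir(source_dir):
        match = pattern.match(name)
        if not match:
            continue
        id_str = match.group(1)
        # Only keep images that have a matching annotation file
        if os.path.isfile(os.path.join(source_dir, f"sa_{id_str}.json")):
            ids.append((int(id_str), id_str))
    ids.sort()  # numeric order, so sa_2 comes before sa_10

    n = len(folders)
    base, extra = divmod(len(ids), n)
    # The first `extra` folders take one more pair, as in the original script
    counts = [base + (1 if k < extra else 0) for k in range(n)]

    # Assign contiguous runs of ids to each folder
    assignment = []
    start = 0
    for folder, count in zip(folders, counts):
        for _, id_str in ids[start:start + count]:
            assignment.append((id_str, folder))
        start += count

    if not dry_run:
        for folder in folders:
            os.makedirs(os.path.join(source_dir, folder), exist_ok=True)
        for id_str, folder in assignment:
            for ext in ("jpg", "json"):
                filename = f"sa_{id_str}.{ext}"
                shutil.move(os.path.join(source_dir, filename),
                            os.path.join(source_dir, folder, filename))
    return dict(zip(folders, counts))
```

Running `split_into_subsets("SAM", dry_run=True)` first only reports the planned counts without moving anything; with 11187 complete pairs that would be 2797, 2797, 2797 and 2796. Images without a matching .json are left in the root folder rather than being split.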