boostcampaitech3 / final-project-level3-cv-16

👀 I Can See Your Pill 💊 : Pill Image Classification Project

[Enhancement] Automate Data Download #2

Closed yehyunsuh closed 2 years ago

yehyunsuh commented 2 years ago

What

Downloading Data

Why

Automate the data download process

How

  1. Download the .xls file OpenData_PotOpenTabletIdntfc20220412.xls
  2. Install openpyxl: conda install openpyxl
  3. Create the /data directory (the data_dir path used in the script)
  4. Run the script: python data_preprocessing.py

data_preprocessing.py

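import os
import time
import urllib.request as req

import pandas as pd
from tqdm import tqdm

filename = 'OpenData_PotOpenTabletIdntfc20220412.xls'
data_dir = "/opt/ml/final_project/data"  # data download directory

# Read the spreadsheet with the openpyxl engine (installed in step 2)
df = pd.read_excel(filename, engine='openpyxl')
os.makedirs(data_dir, exist_ok=True)  # make sure the target directory exists
start = time.time()

# '품목일련번호' = item serial number, '큰제품이미지' = large product image URL
rows = zip(df['품목일련번호'], df['큰제품이미지'])
for image_key, image_url in tqdm(rows, total=len(df)):
    # Save each image as <item serial number>.jpg
    req.urlretrieve(image_url, f"{data_dir}/{image_key}.jpg")
print(time.time() - start)  # elapsed time in seconds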

ETA: over 100 mins
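
Since the sequential loop takes over 100 minutes, one possible speed-up is to download several images at once. The following is only a rough sketch, not part of the original script: the helper names download_one and download_all, the fetch parameter, and the default of 8 worker threads are all hypothetical choices made for illustration. It also skips files that already exist, so an interrupted run could be resumed, and it records failed URLs instead of stopping.

```python
import os
import urllib.request as req
from concurrent.futures import ThreadPoolExecutor


def download_one(image_key, image_url, data_dir, fetch=req.urlretrieve):
    """Download one image unless it already exists; return (key, status)."""
    target = os.path.join(data_dir, f"{image_key}.jpg")
    if os.path.exists(target):
        return image_key, "skipped"
    try:
        fetch(image_url, target)
        return image_key, "ok"
    except Exception as exc:  # one broken URL should not stop the whole run
        return image_key, f"failed: {exc}"


def download_all(pairs, data_dir, max_workers=8, fetch=req.urlretrieve):
    """Download (key, url) pairs with a thread pool; results keep input order."""
    os.makedirs(data_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(download_one, key, url, data_dir, fetch)
            for key, url in pairs
        ]
        return [future.result() for future in futures]
```

With the DataFrame from data_preprocessing.py, this might be called as download_all(zip(df['품목일련번호'], df['큰제품이미지']), data_dir). The worker count is a guess; whether the image server tolerates parallel requests has not been checked, so a lower value may be safer.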

seoulsky-field commented 2 years ago

It is running successfully! Thank you.

yehyunsuh commented 2 years ago

Issue closed. A pull request will be opened after mentoring.