ManishSahu53 closed this issue 3 years ago
CPU times: user 1min 15s, sys: 14.1 s, total: 1min 30s Wall time: 1min 40s
We could also use multiple threads to increase speed.
Thanks! Can you think of a way to do this without importing csv? ;)
Yes, I use the csv module so that if a text column contains "," my column alignment won't be disturbed. CPU times: user 7.59 s, sys: 4.16 s, total: 11.8 s Wall time: 28.9 s
%%time
import os

path_data = 'csv_test/data/'

def writedata(csvobject, row):
    """Write a row to the merged CSV file."""
    csvobject.write(row)

# Collect all CSV file paths
path_csv = []
for root, dirs, files in os.walk(path_data):
    for file in files:
        if file.endswith('.csv'):
            path_csv.append(os.path.join(root, file))

# Create the merged file
path_merge = open('merged_no_csv.csv', 'a')

# Write the header row from the first file
with open(path_csv[0]) as f:
    for line in f:
        writedata(path_merge, line)
        break

# Append the data rows from every CSV to the merged file
for temp_path_csv in path_csv:
    index = 0
    with open(temp_path_csv) as f:
        for line in f:
            index += 1
            # Skip the header line
            if index == 1:
                continue
            writedata(path_merge, line)
path_merge.close()
import os
from datetime import datetime

t1 = datetime.now()
data_dir = './data'
csvs = os.listdir(data_dir)

# Get the header from the first file
with open(os.path.join(data_dir, csvs[0])) as f:
    data = f.readlines()
headers = [data[0]]

# Append the data rows from every CSV to the header list
for csv_file_name in csvs:
    with open(os.path.join(data_dir, csv_file_name)) as f:
        data = f.readlines()
    headers.extend(data[1:])

# Save the accumulated list as a .csv file
with open('./output/final.csv', 'w') as f:
    f.write(''.join(headers))

t2 = datetime.now()
t3 = t2 - t1
seconds = t3.total_seconds()
print(f'{seconds}: total seconds\n')
hours = seconds // 3600
minutes = (seconds % 3600) // 60
seconds = seconds % 60
print(f'{hours}: hour\t{minutes}: minutes\t{seconds}: seconds')
This takes around 1 min 12 secs.
@ashishu007,
I think you could run into an out-of-memory error because of the `headers` variable.
@ManishSahu53, probably yes. I used a 16GB RAM system, so it was smooth for me. Thanks for pointing it out :)
I added a couple of solutions to this issue: one pandas-based and one in pure Python. :) Please take a look.