TerenceLiu98 closed 2 years ago
For the first question: the slowdown is not caused by `pd.read_csv`; it is caused by the `multiprocessing.pool`. Change `cpu_num` to `cpu_num = multiprocessing.cpu_count() * 1`:
https://github.com/Cyclododecene/GNAF/blob/1ecedb1ed170e32e345f0bf4e5b4122285fc5fc6/GNAF/news/database/events.py#L26
https://github.com/Cyclododecene/GNAF/blob/1ecedb1ed170e32e345f0bf4e5b4122285fc5fc6/GNAF/news/database/events.py#L129
It should now be faster than before.
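For reference, here is a minimal sketch of sizing a pool from the CPU count, as in the `cpu_num` change above. The helper names `pick_cpu_num`, `square`, and `run_pool` are hypothetical and are not taken from `events.py`; the previous multiplier is not shown in this thread, so the sketch only illustrates the `* 1` setting.

```python
import multiprocessing


def pick_cpu_num(multiplier=1):
    """Hypothetical helper: worker count = logical CPU count * multiplier."""
    return max(1, multiprocessing.cpu_count() * multiplier)


def square(x):
    """Stand-in for the real per-file work (assumption, not the GNAF code)."""
    return x * x


def run_pool(values, cpu_num):
    """Map `square` over `values` using `cpu_num` worker processes."""
    with multiprocessing.Pool(processes=cpu_num) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # One worker per logical CPU, matching cpu_count() * 1
    cpu_num = pick_cpu_num(1)
    print(cpu_num, run_pool(range(8), cpu_num))
```

Starting more CPU-bound workers than there are cores usually adds scheduling and start-up overhead rather than speed, which may be why the smaller pool helps here.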
Issues:
- `csv` file, need chunks: https://github.com/Cyclododecene/GNAF/blob/1ecedb1ed170e32e345f0bf4e5b4122285fc5fc6/GNAF/news/database/events.py#L76 https://github.com/Cyclododecene/GNAF/blob/1ecedb1ed170e32e345f0bf4e5b4122285fc5fc6/GNAF/news/database/events.py#L243
- `print` with `datetime`: https://github.com/Cyclododecene/GNAF/blob/1ecedb1ed170e32e345f0bf4e5b4122285fc5fc6/GNAF/news/database/events.py#L76
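For the `csv` chunking item, one possible approach is the `chunksize` parameter of `pd.read_csv`, which yields the file piece by piece instead of loading it in a single call. This is only a sketch: the function name `read_csv_in_chunks`, the chunk size, and the sample data are assumptions, not the code in `events.py`.

```python
import io

import pandas as pd


def read_csv_in_chunks(source, chunksize=100_000):
    """Hypothetical helper: read a CSV in pieces and combine them."""
    chunks = []
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Per-chunk filtering or aggregation would go here
        chunks.append(chunk)
    return pd.concat(chunks, ignore_index=True)


# Small in-memory sample standing in for a large file
sample = io.StringIO("id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n")
df = read_csv_in_chunks(sample, chunksize=2)
print(len(df))
```

Concatenating every chunk still ends with the full table in memory, so the saving comes mainly from filtering or reducing each chunk before it is appended.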