Cyclododecene / newsfeed

Newsfeed based on GDELT Project
GNU General Public License v3.0

need fix #6

TerenceLiu98 closed 2 years ago

TerenceLiu98 commented 2 years ago

issues:

  1. Download/query is slow; large CSV files need to be read in chunks: https://github.com/Cyclododecene/GNAF/blob/1ecedb1ed170e32e345f0bf4e5b4122285fc5fc6/GNAF/news/database/events.py#L76 https://github.com/Cyclododecene/GNAF/blob/1ecedb1ed170e32e345f0bf4e5b4122285fc5fc6/GNAF/news/database/events.py#L243
  2. Replace the useless print with datetime: https://github.com/Cyclododecene/GNAF/blob/1ecedb1ed170e32e345f0bf4e5b4122285fc5fc6/GNAF/news/apis/query.py#L126
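
For reference, a minimal sketch of what reading a large CSV in chunks could look like. This is not the code from events.py; the names iter_csv_chunks and count_rows are hypothetical, and the sketch uses only the standard library so it runs anywhere. With pandas, passing chunksize to pd.read_csv gives a similar iterator of DataFrames.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO


def iter_csv_chunks(handle: TextIO, chunk_size: int = 10_000,
                    delimiter: str = ",") -> Iterator[List[List[str]]]:
    """Yield lists of at most `chunk_size` parsed rows from an open CSV file.

    Hypothetical helper for illustration; only one chunk is held in memory
    at a time.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    reader = csv.reader(handle, delimiter=delimiter)
    while True:
        # islice pulls the next `chunk_size` rows without reading further.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def count_rows(handle: TextIO, chunk_size: int = 10_000,
               delimiter: str = ",") -> int:
    """Example consumer: count rows without loading the whole file."""
    return sum(len(chunk)
               for chunk in iter_csv_chunks(handle, chunk_size, delimiter))


if __name__ == "__main__":
    # Small in-memory stand-in for a large tab-delimited file.
    sample = io.StringIO("a\tb\n1\t2\n3\t4\n5\t6\n7\t8\n")
    for rows in iter_csv_chunks(sample, chunk_size=2, delimiter="\t"):
        print(len(rows), rows[0])
```

Each chunk can be filtered or written out before the next one is read, so peak memory depends on chunk_size rather than on the file size.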

TerenceLiu98 commented 2 years ago

For the first issue: the slowdown is not caused by pd.read_csv, it is caused by the multiprocessing pool. Change cpu_num to cpu_num = multiprocessing.cpu_count() * 1: https://github.com/Cyclododecene/GNAF/blob/1ecedb1ed170e32e345f0bf4e5b4122285fc5fc6/GNAF/news/database/events.py#L26 https://github.com/Cyclododecene/GNAF/blob/1ecedb1ed170e32e345f0bf4e5b4122285fc5fc6/GNAF/news/database/events.py#L129
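
A rough sketch of sizing a pool this way, for anyone hitting the same problem. It is an illustration, not the code in events.py: pool_size, run_parallel and square are hypothetical names, and only cpu_num = multiprocessing.cpu_count() * 1 comes from the comment above.

```python
import multiprocessing


def pool_size(multiplier: int = 1) -> int:
    """Number of worker processes: CPU count times `multiplier`, at least 1."""
    return max(1, multiprocessing.cpu_count() * multiplier)


def square(x: int) -> int:
    """Stand-in for the real per-file work (hypothetical)."""
    return x * x


def run_parallel(func, items, multiplier: int = 1):
    """Map `func` over `items` using a pool sized by pool_size()."""
    cpu_num = pool_size(multiplier)
    # The context manager shuts the workers down once map() has returned.
    with multiprocessing.Pool(processes=cpu_num) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    print(run_parallel(square, range(8)))
```

With a multiplier of 1 there is one worker per core. Starting many more workers than cores generally adds process start-up and scheduling overhead without extra throughput for CPU-bound work, which may be why the smaller pool helps here.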

TerenceLiu98 commented 2 years ago

fixed in 69793efa53a9ccbcde1b214845cd7f7584933680

TerenceLiu98 commented 2 years ago

now, it should be faster than before