Closed xbsura closed 1 year ago
Where can we download organizations-2000000.csv?
This is a serious bug: any CSV data above 200 MB combined with aggregation can reproduce it. Will dig into this soon.
Test seems to be passing with chdb 0.8.0 and libchdb 0.8.0 which includes #32 by @auxten
@xbsura could you kindly retest and confirm the latest release fixes the reported issue? Thanks for your report!
Confirmed fixed, thanks.
Describe the situation
import chdb
res = chdb.query('select count(*) cnt from file("/Users/xbsura/Downloads/organizations-2000000.csv", CSVWithNames) group by Name order by cnt desc', 'CSV')
wc -l /Users/xbsura/Downloads/organizations-2000000.csv
2000001 /Users/xbsura/Downloads/organizations-2000000.csv
head /Users/xbsura/Downloads/organizations-2000000.csv
Index,Organization Id,Name,Website,Country,Description,Founded,Industry,Number of employees
1,391dAA77fea9EC1,Daniel-Mcmahon,https://stuart-rios.biz/,Cambodia,Focused eco-centric help-desk,2013,Sports,1878
2,9FcCA4A23e6BcfA,"Mcdowell, Tate and Murray",http://jacobs.biz/,Guyana,Front-line real-time portal,2018,Legal Services,9743
3,DB23330238B7B3D,"Roberts, Carson and Trujillo",http://www.park.com/,Jordan,Innovative hybrid data-warehouse,1992,Hospitality,7537
4,bbf18835CFbEee7,"Poole, Jefferson and Merritt",http://hayden.com/,Cocos (Keeling) Islands,Extended regional Graphic Interface,1991,Food Production,9974
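For anyone without access to the original file, a stand-in with the same header can be generated locally. The sketch below is a substitute built on assumptions, not the reporter's data: the function name `write_fake_organizations`, the value pools, and every generated value are made up for illustration; only the column names come from the `head` output above.

```python
import csv
import random

# Column names copied from the head output in this report.
HEADER = ["Index", "Organization Id", "Name", "Website", "Country",
          "Description", "Founded", "Industry", "Number of employees"]

# Hypothetical value pools; the real file's distribution is unknown.
SURNAMES = ["Poole", "Roberts", "Mcdowell", "Carson", "Tate", "Murray"]
COUNTRIES = ["Cambodia", "Guyana", "Jordan"]
INDUSTRIES = ["Sports", "Legal Services", "Hospitality", "Food Production"]


def write_fake_organizations(path, rows, seed=0):
    """Write `rows` synthetic records plus one header line to `path`."""
    rng = random.Random(seed)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        for i in range(1, rows + 1):
            a, b, c = (rng.choice(SURNAMES) for _ in range(3))
            writer.writerow([
                i,
                "%015x" % rng.getrandbits(60),  # 15 hex characters
                f"{a}, {b} and {c}",            # contains a comma, so it gets quoted
                f"http://{a.lower()}.example/",
                rng.choice(COUNTRIES),
                "Synthetic description",
                rng.randint(1970, 2022),
                rng.choice(INDUSTRIES),
                rng.randint(1, 10000),
            ])
    return path
```

Calling `write_fake_organizations("organizations-test.csv", 2_000_000)` gives a file with the same line count as the one reported by `wc -l`. Its size and the number of distinct `Name` values will differ from the original, so it may or may not trigger the same behaviour.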
This SQL needs more than 1 minute to finish, and memory usage exceeds 100 GB.
Which ClickHouse server version to use
res = chdb.query('select version()', 'Pretty'); print(res.data())
┏━━━━━━━━━━━┓
┃ version() ┃
┡━━━━━━━━━━━┩
│ 22.12.1.1 │
└───────────┘
Queries to run that lead to slow performance
select count(*) cnt from file("/Users/xbsura/Downloads/organizations-2000000.csv", CSVWithNames) group by Name order by cnt desc
Expected performance
For a 200 MB file, finishing in under 1 second would be reasonable.
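To compare a run against this expectation, wall-clock time and peak memory can be recorded around the query. This is a minimal sketch, assuming a Unix-like system (the standard-library `resource` module is not available on Windows). The helper name `measure` is hypothetical, and the `chdb.query` call is the one from this report, left commented out so the snippet runs without chdb or the CSV file.

```python
import resource
import sys
import time


def measure(fn):
    """Run fn() once; return (result, elapsed seconds, peak RSS in MiB)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    # Peak resident set size of this process so far, not only of fn().
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return result, elapsed, peak / divisor


# Usage with the query from this report (requires chdb and the CSV file):
# import chdb
# sql = ('select count(*) cnt from file("/Users/xbsura/Downloads/'
#        'organizations-2000000.csv", CSVWithNames) '
#        'group by Name order by cnt desc')
# _, seconds, peak_mib = measure(lambda: chdb.query(sql, 'CSV'))
# print(f"{seconds:.1f} s, peak RSS {peak_mib:.0f} MiB")
```

Because `ru_maxrss` is a high-water mark for the whole process, the measurement is most meaningful when taken in a fresh interpreter that runs only this query.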