Closed nuarc closed 4 years ago
@richardwu @laferrieren @Ihabilyas @thodrek @zaqthss - could you please help with the above?
@nuarc the current code is designed to work on a large memory machine and preferably a large memory GPU. A 32GB machine might not be adequate.
@thodrek - thanks for responding to my query. I have actually tried running it on a 732 GB machine.
Hi, I am facing a few issues running HoloClean on the Chicago Food Inspections dataset as described in the paper. A machine with 32 GB RAM and a 100 GB SSD is unable to process it, and I am noticing memory leaks. Also, query results
14:12:20 - [ERROR] - generating aux_table pos_values
Traceback (most recent call last):
  File "/home/ubuntu/hc36/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context
  File "/home/ubuntu/hc36/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)
psycopg2.DatabaseError: out of memory for query result
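
One possible way around `out of memory for query result` is to read the result in batches instead of materializing it all at once. The sketch below is not HoloClean code: `iter_batches`, `demo`, and the `vid`/`val` columns are hypothetical, and `sqlite3` stands in for PostgreSQL so the example runs anywhere. With psycopg2, a default cursor still buffers the whole result on the client, so this pattern would likely need a named (server-side) cursor, e.g. `conn.cursor(name="...")`, to actually reduce memory.

```python
import sqlite3
from typing import Any, Iterator, List


def iter_batches(cursor: Any, batch_size: int = 10_000) -> Iterator[List[tuple]]:
    """Yield rows from an already-executed DB-API cursor, batch_size at a time."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def demo() -> List[int]:
    """Return the size of each batch read from a small in-memory table."""
    # sqlite3 is only a stand-in; the table layout here is an assumption.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pos_values (vid INTEGER, val TEXT)")
    conn.executemany(
        "INSERT INTO pos_values VALUES (?, ?)",
        [(i, f"v{i}") for i in range(25)],
    )
    cur = conn.cursor()
    cur.execute("SELECT vid, val FROM pos_values ORDER BY vid")
    # Each batch can be processed and discarded before the next is fetched.
    sizes = [len(batch) for batch in iter_batches(cur, batch_size=10)]
    conn.close()
    return sizes


if __name__ == "__main__":
    print(demo())
```

Whether this helps depends on whether the downstream code can consume the rows incrementally; if it needs the full table as one data frame, the memory pressure just moves to that step.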
Since the above data frame was not used in subsequent operations, I moved forward by simply bypassing it for this code block. However, after 4 hours of execution, I am now facing an issue where the log just shows `Killed`. ![image](https://user-images.githubusercontent.com/3977272/64341699-8a4a0a00-d006-11e9-8706-025c755c70cf.png)
21:09:36 - [DEBUG] - Time to execute query: 0.00 secs
21:09:36 - [DEBUG] - featurizing training data...
21:09:43 - [DEBUG] - Time to execute query: 4.46 secs
Killed
It seems `torch.cat` is blowing up memory here. Running a sample on the hospital dataset, with little data, and performing a memory analysis showed that the tensors' footprint is 228 MB, which grows to 1.4 GB when all tensors are combined. This does not scale well if we need to process a larger dataset of 2-4 GB in a day. Any suggestions?

Also, we noticed that the archived version was compatible with Spark. Is there any specific reason for moving away from Spark? We are planning to use Spark to manage huge datasets with HoloClean. Any suggestions?
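
On the tensor concatenation concern: concatenating a list of tensors keeps every input alive while the combined output is allocated, so peak memory is roughly twice the final size. A minimal sketch of one mitigation follows, assuming the per-featurizer tensors can be produced lazily and that the final shape is known up front. `concat_preallocated` and `make_chunks` are hypothetical names, not HoloClean functions, and the sketch concatenates along dimension 0 for simplicity.

```python
from typing import Iterable, Iterator, List

import torch


def concat_preallocated(
    chunks: Iterable[torch.Tensor],
    total_rows: int,
    num_cols: int,
    dtype: torch.dtype = torch.float32,
) -> torch.Tensor:
    """Copy chunks into one preallocated buffer, one chunk at a time."""
    out = torch.empty((total_rows, num_cols), dtype=dtype)
    offset = 0
    for chunk in chunks:
        n = chunk.shape[0]
        # Copy in place; the chunk can be garbage collected afterwards.
        out[offset:offset + n].copy_(chunk)
        offset += n
    if offset != total_rows:
        raise ValueError(f"expected {total_rows} rows, got {offset}")
    return out


def make_chunks(sizes: List[int], num_cols: int) -> Iterator[torch.Tensor]:
    """Hypothetical stand-in for featurizers: yields one tensor per call."""
    for i, n in enumerate(sizes):
        yield torch.full((n, num_cols), float(i))


if __name__ == "__main__":
    sizes = [2, 3, 1]
    combined = concat_preallocated(make_chunks(sizes, 4), sum(sizes), 4)
    print(combined.shape)
```

With a generator as input, peak memory is about the size of the output plus one chunk, rather than the output plus all chunks. This only helps if the chunks are not also held in a list elsewhere.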