Open alex-ber opened 6 years ago
The workaround is to monkey-patch `pyhive.hive.Cursor`'s `_fetch_more()` method.
I've replaced

```python
if not self._operationHandle.hasResultSet:
    raise ProgrammingError("No result set")
```

with

```python
if not self._operationHandle.hasResultSet:
    if not self._operationHandle.modifiedRowCount:
        self._state = self._STATE_FINISHED
        return
    else:
        raise ProgrammingError("No result set")
```
```python
def _fetch_more(self):
    """Send another TFetchResultsReq and update state"""
    from pyhive.exc import ProgrammingError
    assert self._state == self._STATE_RUNNING, "Should be running when in _fetch_more"
    assert self._operationHandle is not None, "Should have an op handle in _fetch_more"
    if not self._operationHandle.hasResultSet:
        if not self._operationHandle.modifiedRowCount:
            self._state = self._STATE_FINISHED
            return
        else:
            raise ProgrammingError("No result set")
    self._old_fetch_more()
```
```python
if __name__ == "__main__":
    # Monkey patch: save the original method, then install the replacement.
    from pyhive.hive import Cursor
    _fetch_more_fn = Cursor._fetch_more
    Cursor._fetch_more = _fetch_more
    Cursor._old_fetch_more = _fetch_more_fn
    checkMultiinsert()
```
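The patching idiom above (keep a reference to the original method, install a wrapper that falls through to it) can be sketched in isolation. `DummyCursor` below is an illustrative stand-in, not pyhive's real `Cursor` class:

```python
class DummyCursor:
    """Stand-in for pyhive.hive.Cursor, for illustration only."""
    def _fetch_more(self):
        return "original fetch"

def _patched_fetch_more(self):
    # New behaviour runs first, then delegates to the saved original.
    return "patched, then " + self._old_fetch_more()

# Monkey patch: save the original under a new name, install the wrapper.
DummyCursor._old_fetch_more = DummyCursor._fetch_more
DummyCursor._fetch_more = _patched_fetch_more

print(DummyCursor()._fetch_more())  # patched, then original fetch
```

Because the original is stored as an attribute on the class, the wrapper can call it via `self._old_fetch_more()` exactly as the real patch does.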
I am not sure whether this is the correct fix, but it works in my case.
@alex-ber I have the same question; I get the following error: `--------------------------------------------------------------------------- ProgrammingError Traceback (most recent call last)`
Has this issue been fixed? @Cherishsword @alex-ber
As far as I know, no.
This issue is still not fixed
CompileError: The 'presto' dialect with current database version settings does not support in-place multirow inserts.
I've found 2 workarounds:
Note: you should compute statistics for the table after such a manipulation.
Note: if you have more than, say, 10 rows, inserting row by row will take practically forever, so that is not really an option.
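Instead of issuing one `INSERT` per row, a batch can be packed into a single multi-row `VALUES` statement. A minimal sketch of that idea follows; it uses an in-memory SQLite database purely as a stand-in for Hive (the `multirow_insert` helper and its table/row arguments are illustrative, and with pyhive you would use Hive's `pyformat` placeholders rather than SQLite's `?`):

```python
import sqlite3

def multirow_insert(cursor, table, rows):
    """Build one INSERT ... VALUES (...), (...) statement for a batch of rows.

    Sketch only: the table name is assumed to be trusted; the values go
    through parameter placeholders.
    """
    width = len(rows[0])
    placeholders = ", ".join("(" + ", ".join("?" * width) + ")" for _ in rows)
    params = [value for row in rows for value in row]
    cursor.execute("INSERT INTO %s VALUES %s" % (table, placeholders), params)

conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive here
cur = conn.cursor()
cur.execute("CREATE TABLE test (id INTEGER, name TEXT)")
multirow_insert(cur, "test", [(1, "a"), (2, "b"), (3, "c")])
print(cur.execute("SELECT COUNT(*) FROM test").fetchone()[0])  # 3
```

One round trip per batch is what makes this usable for more than a handful of rows.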
@nicholasbern I've enabled multi-row inserts for Presto; it is on master and will be available in the next release. As for the Hive issues, we are open to contributions.
It works fine. Thank you @bkyryliuk
Any fix / workaround for Hive for inserting a batch of data, other than uploading to HDFS or an S3 bucket, or monkey-patching `pyhive.hive.Cursor`'s `_fetch_more()` method?
Here's a workaround - chunking CSV files using pandas' DataFrame.
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-chunking https://medium.com/towards-artificial-intelligence/efficient-pandas-using-chunksize-for-large-data-sets-c66bf3037f93 https://github.com/dropbox/PyHive/issues/55
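The chunking approach from those links can be sketched as follows: read the CSV in fixed-size chunks, each of which is a small DataFrame that could then be handed to `to_sql()` (or any other insert routine). The CSV content and chunk size here are made up for illustration:

```python
import io
import pandas as pd

# In-memory CSV standing in for a large file on disk.
csv_data = io.StringIO("id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n")

total = 0
# chunksize=2 yields DataFrames of at most 2 rows each.
for chunk in pd.read_csv(csv_data, chunksize=2):
    total += len(chunk)  # here you would insert the chunk instead

print(total)  # 5
```

Keeping each chunk small bounds both memory use and the size of any generated multi-row `INSERT`.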
Hi! I'm using pyhive + pandas, and I have the same problem when I run:

```python
data.to_sql("test", con=engine, if_exists='append', index=False)
```

Since the problem happens when inserting multiple rows, I tried adding `method='multi'`, which passes multiple values in a single INSERT clause:

```python
data.to_sql("test", con=engine, if_exists='append', index=False, method='multi')
```

And it worked for me!
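A self-contained version of that workaround can be sketched as follows; an in-memory SQLite engine stands in for the Hive connection string used in this thread, and the table and column names are made up:

```python
import pandas as pd
from sqlalchemy import create_engine

# SQLite stands in for the Hive engine here, for illustration only.
engine = create_engine("sqlite:///:memory:")
data = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# method='multi' makes to_sql pack all rows into a single multi-row
# INSERT statement instead of issuing one INSERT per row.
data.to_sql("test", con=engine, if_exists="append", index=False, method="multi")

print(pd.read_sql("SELECT COUNT(*) AS n FROM test", engine)["n"][0])
```

With a single statement on the wire, the code path that triggers `_fetch_more()`'s "No result set" check for each row is avoided, which matches the commenter's observation.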
I use the latest PyHive 0.1.8, thrift 0.11.0, and thrift-sasl 0.3.0, with the latest SQLAlchemy 1.2.12 and pandas 0.16.0 (the problem is not in pandas).
I am creating a DataFrame with 3 rows. I want to create the table if it doesn't exist and put these rows into it.
The DB schema can be defined as follows:
The following code is in Python 2.7. Below is a simplified version of what I'm trying to achieve:
Note: I have to use the Hive user in the connection URL because of a bug
Note: I enable logging for SQLAlchemy just to get a better understanding of what is going on; you can remove it.
I get following error:
As you can see, the problem is in hive.py, in the `Cursor._fetch_more()` method.
This is its code:
The reason is the line

```python
if not self._operationHandle.hasResultSet:
```

This condition holds, and `ProgrammingError` is thrown. The state of `self._operationHandle` is the following:

```
TOperationHandle(hasResultSet=False, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xac\x9a\x0f\xbf\x83\x87@\x86\xb8\x9e@np\xf8\xf6g', guid='\x81\x17=v.\x83G\xc1\x86U*+\xb4\xca\xa6\xdd'))
```
P.S. Insertion of 1 row works fine. That is, if I have a DataFrame with 1 row, it works. This has to do with the SQLAlchemy code: there is a check whether multiparams has only 1 value, and if so, another execution path is taken, one that doesn't involve a call to the cursor's `_fetch_more()` method.