BradRuderman / pyhs2

MIT License

pip install pyhs2 is not installing the latest master branch? #61

Open mkmoisen opened 8 years ago

mkmoisen commented 8 years ago

If I call cur.fetchmany(i), it fails with:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/apps/Python/lib/python2.7/site-packages/pyhs2/cursor.py", line 152, in fetchmany
    if size < 0 or size > MAX_BLOCK_SIZE:
NameError: global name 'MAX_BLOCK_SIZE' is not defined

I can see in this repo that the code has been changed to if size < 0 or size > self.MAX_BLOCK_SIZE:

However, the version installed by pip doesn't have the self and it is throwing this error.
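For context, here is a minimal sketch of why the bare name fails while the `self.` form works. It assumes `MAX_BLOCK_SIZE` is a class-level attribute; `DemoCursor`, its methods, and the value 10000 are hypothetical stand-ins, not pyhs2's actual code.

```python
class DemoCursor(object):
    # Hypothetical stand-in for the real cursor class; the value is made up.
    MAX_BLOCK_SIZE = 10000

    def fetchmany_broken(self, size):
        # A bare name inside a method is looked up in local, module-global and
        # builtin scope only. The class body is not searched, so this raises
        # NameError as soon as the right-hand comparison is evaluated.
        if size < 0 or size > MAX_BLOCK_SIZE:
            raise ValueError("size out of range")
        return size

    def fetchmany_fixed(self, size):
        # Going through self finds the attribute on the class.
        if size < 0 or size > self.MAX_BLOCK_SIZE:
            raise ValueError("size out of range")
        return size


cur = DemoCursor()
broken_error = None
try:
    cur.fetchmany_broken(5)
except NameError as exc:
    broken_error = exc

fixed_result = cur.fetchmany_fixed(5)
```

The broken variant fails on every call with a non-negative size, which matches the traceback above.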

Is there a mismatch between this repo and pip?

pip search pyhs2
pyhs2                     - Python Hive Server 2 Client Driver
  INSTALLED: 0.6.0 (latest)

anshanno commented 8 years ago

Have you had any luck with this? I've been experiencing the same issue and haven't been able to find anything on the subject aside from this thread.

mkmoisen commented 8 years ago

Are you referring to the cur.fetchmany() or the pip issue?

anshanno commented 8 years ago

@mkmoisen well, they are sort of intertwined since pip is installing a slightly different version than the repo. I am getting the same error when I try to use fetchmany().

mkmoisen commented 8 years ago

@anshanno Got it. I ended up uninstalling it from pip, cloning the latest git repo and then installing it using the setup.py file.

Best regards,

Matthew Moisen

anshanno commented 8 years ago

@mkmoisen Alright, thanks. What is the best practice for using .fetchmany()?

mkmoisen commented 8 years ago

@anshanno I actually gave up on fetchmany and use fetchall instead. fetchmany appears to have an annoying bug that I've described in my pull request that you can take a look at.
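On the best-practice question, the usual DB-API pattern is to call `fetchmany` in a loop until it returns an empty batch. The sketch below shows that pattern against an in-memory `sqlite3` database so it runs without a Hive server. It has not been checked against pyhs2, whose `fetchmany` may behave differently given the bug mentioned above; `iter_rows` and the demo table are hypothetical names.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows one at a time, fetching them from the cursor in batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            # An empty batch is how a DB-API cursor signals that no rows remain.
            break
        for row in batch:
            yield row


# Stand-in database so the sketch is runnable on its own.
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE t (n INTEGER)")
demo_cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
demo_cur.execute("SELECT n FROM t ORDER BY n")

# Five rows fetched in batches of two: 2 + 2 + 1, then an empty batch.
demo_rows = list(iter_rows(demo_cur, batch_size=2))
demo_conn.close()
```

Compared with `fetchall`, this keeps only one batch in memory at a time, which matters mostly for large result sets.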

This library also returns byte strings (str) instead of unicode, which caused me some errors with Flask/Jinja templates. I've raised another pull request for that.

My application is a simple read-only app, and my general flow is the following. I obtain the database connection in a thread because I noticed that when HS2 is down, the connection attempt hangs for a long time. If the thread exceeds a time limit (5 seconds in the code below), I raise an exception so the app fails gracefully.

import threading

import pyhs2

def _thread_get_connection(database, ret, e):
    '''
    If HS2 is down, a connection hangs a long time before raising an exception
    Run the connection in a thread class so that it can be cut off in 5 seconds
    :param ret: a dict to hold the conn so that it can be returned
    :param e: a threading.Event()
    '''
    conn = pyhs2.connect(host=HIVE2_HOST,
                         port=HIVE2_PORT,
                         authMechanism="PLAIN",
                         user=HIVE2_USER,
                         password=HIVE2_PASSWORD,
                         database=database)

    # If get_connection() has already timed out and given up, close the late
    # connection instead of handing it back.
    if e.isSet():
        close_connection(conn)
        return

    ret['conn'] = conn

def get_connection(database):
    '''
    Helper function to get a connection in a thread
    '''
    ret = {}
    e = threading.Event()
    t = threading.Thread(target=_thread_get_connection, args=(database, ret, e))
    t.start()
    t.join(5)
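    if t.is_alive():
        e.set()
        # No exception is active here, so log with error() rather than exception()
        app.logger.error("Cannot connect to HS2, it must be down")
        return None

    # The thread may have finished by raising, in which case 'conn' was never set
    return ret.get('conn')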

# ... The following is wrapped in try/except `pyhs2.Pyhs2Exception`
conn = get_connection(database)
if conn is None:
    raise ServerException("HS2 is down, cannot connect!")

with conn.cursor() as cur:
    cur.execute(select)
    # Get field names out of schema. Remove "table_name." if a select * was executed
    schema = cur.getSchema()
    columns = [field['columnName'].split('.', 1)[-1] for field in schema]
    rows = cur.fetchall()
    # Convert string to unicode
    rows = [[value.decode('utf-8') if isinstance(value, str)
             else u'' if value is None
             else value
             for value in row]
            for row in rows]

kpweiler commented 8 years ago

This is still an issue with whatever source is being used on PyPI. The git master branch works fine, but not the source on PyPI.
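A quick way to tell which variant is actually installed is to search the installed module's source for the fixed line. This is a minimal sketch: `installed_source_contains` is a hypothetical helper, and the module name `pyhs2.cursor` is inferred from the `pyhs2/cursor.py` path in the traceback above. The demo call uses the standard-library `json` module so it runs without pyhs2.

```python
import importlib
import inspect


def installed_source_contains(module_name, needle):
    """Return True if the installed module's source text contains `needle`."""
    module = importlib.import_module(module_name)
    return needle in inspect.getsource(module)


# Runnable demo against a standard-library module.
json_has_loads = installed_source_contains("json", "def loads")

# With pyhs2 installed, the equivalent check would presumably be:
#   installed_source_contains("pyhs2.cursor", "self.MAX_BLOCK_SIZE")
```

If the pyhs2 check returns False, the installed copy most likely still has the bare `MAX_BLOCK_SIZE` reference and needs to be reinstalled from the repository.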

mkmoisen commented 8 years ago

@kpweiler Do you mean to say that this is a known issue with all repos, not just this one?