dicarlolab / mturkutils

High-throughput web-based human psychophysics

Weird clipping in data #16

Open ardila opened 9 years ago

ardila commented 9 years ago

@hahong @yamins81 When I try to retrieve the data for one of my HITs, the returned data is clipped.

This is not the case with any other HIT. Have you guys ever seen something like this?

The snippet below should reproduce the weirdly clipped JSON string (it cuts off just after the test section).

import os

from boto.pyami.config import Config
from boto.mturk.connection import MTurkConnection

# The HIT whose data comes back clipped.
hitid = '3IV1AEQ4DRN3E2RJZMR59W421BXJ87'

BOTO_CRED_FILE = os.path.expanduser('~/.boto')
section_name = 'MTurkCredentials'
MTURK_PAGE_SIZE_LIMIT = 100
# Stand-in for self.max_assignments in the original class context.
max_assignments = MTURK_PAGE_SIZE_LIMIT


def parse_credentials_file(path=None, section_name='Credentials'):
    """Read AWS credentials from a boto-style config file."""
    if path is None:
        path = BOTO_CRED_FILE
    config = Config(path)
    assert config.has_section(section_name), \
        'Section ' + section_name + \
        ' not found in credentials file located at ' + path
    return (config.get(section_name, 'aws_access_key_id'),
            config.get(section_name, 'aws_secret_access_key'))


access_key_id, secretkey = parse_credentials_file(section_name=section_name)

conn = MTurkConnection(aws_access_key_id=access_key_id,
                       aws_secret_access_key=secretkey)

# Fetch the submitted assignments and the HIT metadata.
assignments = conn.get_assignments(
    hit_id=hitid,
    page_size=min(max_assignments, MTURK_PAGE_SIZE_LIMIT))
HITdata = conn.get_hit(hit_id=hitid)

# This is the answer string that comes back clipped.
print(assignments[0].answers[0][0].fields[0])
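
One quick way to confirm that the string is actually truncated (rather than merely odd-looking) is to check its length and try to parse it. This is only a sketch that reuses the variables from the snippet above and assumes the answer field is supposed to be a single JSON document:

import json

ans = assignments[0].answers[0][0].fields[0]
print('answer length: %d' % len(ans))  # compare against the other HITs
try:
    json.loads(ans)
    print('answer parses as valid JSON')
except ValueError:
    print('answer is clipped / not valid JSON')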

These are all the HITs I ran in this group; none of the others have the same problem (the clipped one is the commented-out ID below). A quick check over the whole group is sketched after the list.

group_hitids = [
    u'3TZ0XG8CBUUE8QEADA0SK2QZ56L891',
    u'3G9UA71JVV4ZEOM0PHZZVW87QRRJ7K',
    # u'3IV1AEQ4DRN3E2RJZMR59W421BXJ87',  # the HIT with the clipped data
    u'3KVQ0UJWPXV6X48G8N3HM2OJ1H5W5D',
    u'32K26U12DNYOMSN4XJG4YCTX7P0VD9',
    u'3A520CCNWNA9MAY6IJ0S87X5MR1EAL',
    u'3R6RZGK0XFMRK3IVTF3IBX3YPD7YVL']
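
To see that the clipping is specific to the one HIT, one can pull the first assignment of every HIT in the group and compare the raw answer lengths. This is only a sketch: it reuses conn and MTURK_PAGE_SIZE_LIMIT from the snippet above and assumes every HIT already has at least one submitted assignment.

bad_hitid = '3IV1AEQ4DRN3E2RJZMR59W421BXJ87'
for hid in group_hitids + [bad_hitid]:
    asgs = conn.get_assignments(hit_id=hid,
                                page_size=MTURK_PAGE_SIZE_LIMIT)
    ans = asgs[0].answers[0][0].fields[0]
    # The clipped HIT should stand out by its answer length.
    print('%s %d' % (hid, len(ans)))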
ardila commented 9 years ago

This does not happen consistently per HIT, but when the clipping does happen, the string comes out to the same length every time:

524288

That is 2^19 (512 * 1024), which points to a hard 512 KiB limit somewhere. Switching the response field to just an index (rather than the full response string) should put me under this limit. I am not sure how to deal with the issue more generally yet; I will look into it a bit.
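
The index idea would look roughly like this: instead of submitting the full response string for each trial, submit its position in the known list of possible responses, which shrinks each entry to a few characters. A rough sketch with made-up example data (possible_responses and trial_responses are placeholders, not mturkutils fields):

import json

# Hypothetical example data: per-trial response strings and the fixed
# set of possible responses shown to the subject.
possible_responses = ['face', 'car', 'chair', 'dog']
trial_responses = ['car', 'car', 'dog', 'face']

# Full-string encoding (the kind of payload that gets clipped at 512 KiB).
full = json.dumps(trial_responses)

# Index encoding: store positions into possible_responses instead.
indexed = json.dumps([possible_responses.index(r) for r in trial_responses])

print('%d -> %d characters' % (len(full), len(indexed)))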