cloudyr / pyMTurkR

A Client for the MTurk Requester API

question with crowd-form element #2

Closed claravdw closed 5 years ago

claravdw commented 5 years ago

Hi,

Thanks a million for your work on this package! A real lifesaver now that the old API is officially gone.

I was wondering whether CreateHIT's question argument can deal with crowd-form elements, like the ones used in MTurk's templates for sentiment analysis, audio naturalness and more. This was working for me in MTurkR, but in pyMTurkR when I run:

q <- GenerateHTMLQuestion(file = question_file)
hit <- CreateHIT(title = "Get qualified to label radio fragments about climate change",
                 description = "Classify a political radio fragment, and get qualified for more",
                 reward = ".20",
                 annotation = "Clim Coding Step 1b",
                 assignments = 100,
                 duration = MTurkR::seconds(hours=1),
                 expiration = MTurkR::seconds(days = 7),
                 question = q$string
                )

I get the following error:

Error in parse(text = request) : <text>:11:23: unexpected symbol
10:     <crowd-classifier 
11:         categories="['Skeptical
                          ^
Warning message:
In CreateHIT(title = "Get qualified to label radio fragments about climate change",  :
  Invalid Request

To be fair, this didn't even work on the MTurkR GUI until recently: questions like this, even unmodified templates, couldn't be given layout IDs because of parsing errors... On the plus side, Amazon fixed this, so now generating a layout ID through the GUI is a viable alternative for me.

I'm attaching question_file, an .xml file.

Thanks,

Clara

question_step1b.txt

claravdw commented 5 years ago

Actually, I'm not sure that the layout ID from the GUI is an alternative after all. I need to be able to input a list of URLs and substitute each one for the ${audio_url} placeholder... I believe that can't be done if the question layout is passed as a layout ID rather than as an XML-format string. Or am I wrong about that?

Thanks!

Clara
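
For readers following along, here is a minimal sketch of the substitution described above, in base R only. The template string and the URLs are made-up stand-ins for the attached question file and the real audio list, and `fill_template` is a hypothetical helper, not a pyMTurkR function.

```r
# Hypothetical template standing in for the contents of the question file
template <- '<crowd-form><audio src="${audio_url}"></audio></crowd-form>'

# Hypothetical list of audio URLs, one per HIT
audio_urls <- c("https://example.com/a.mp3", "https://example.com/b.mp3")

# Replace the literal ${audio_url} placeholder with one URL.
# fixed = TRUE treats both the pattern and the replacement as plain text,
# so "$", "{" and "}" are not interpreted as regex syntax.
fill_template <- function(template, url) {
  gsub("${audio_url}", url, template, fixed = TRUE)
}

# One filled-in question string per URL
questions <- vapply(
  audio_urls,
  function(u) fill_template(template, u),
  character(1),
  USE.NAMES = FALSE
)
```

Each element of `questions` could then presumably be turned into a HIT the same way as in the original post, one CreateHIT call per string.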

tylerburleigh commented 5 years ago

Can you try again using version >= 0.3.8?

I think the issue was that single and double quotes weren't being escaped properly in GenerateHTMLQuestion(). I've fixed this behavior, and when I tried it just now with your question file it seemed to work.
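
As an illustration of this class of error (not pyMTurkR's actual internals or its fix), the sketch below shows how text containing quotes breaks parse(text = ...) when pasted into an R expression unescaped, and how base R's deparse() produces a safely escaped string literal. The HTML fragment and variable names are assumptions made up for the example.

```r
# Hypothetical fragment resembling the crowd-classifier line in the error
html <- "<crowd-classifier categories=\"['Skeptical', 'Neutral']\">"

# Naive approach: wrap the text in single quotes without escaping.
# The first ' inside the HTML ends the string early, so parsing fails.
naive <- paste0("list(question = '", html, "')")
naive_ok <- tryCatch({
  parse(text = naive)
  TRUE
}, error = function(e) FALSE)

# deparse() emits a valid R string literal with inner quotes escaped,
# so the pasted expression parses and evaluates back to the original text.
safe <- paste0("list(question = ", deparse(html), ")")
round_trip <- eval(parse(text = safe))$question
```

Here `naive_ok` records whether the unescaped version parsed, and `round_trip` can be compared against `html` with identical().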

claravdw commented 5 years ago

Hi Tyler,

Thanks so much for looking into this. Yes, updating to 0.4.1 fixed the issue!

Clara

claravdw commented 5 years ago

Following up here since I think I have a related issue:

When I retrieve assignments for a question like this one using GetAssignments, something seems to go wrong in parsing the answer. The Answer column of the data frame returned by GetAssignments contains values like this:

<?xml version="1.0" encoding="ASCII"?><QuestionFormAnswers xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd"><Answer><QuestionIdentifier>coding.label</QuestionIdentifier><FreeText>Pro-gun</FreeText></Answer></QuestionFormAnswers>

where "Pro-gun" is one of the categories of the crowd-classifier crowd-form element.

So the answer is there, of course, but I suppose it would be helpful if the other stuff could be stripped?

Best,

Clara
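
As a stopgap, the raw answer string can be reduced to its identifier and value by hand. The sketch below is a quick base-R workaround using regular expressions, assuming the flat structure shown above. `extract_tag` is a hypothetical helper, and a real XML parser (for example the XML or xml2 packages) would be more robust against nested or escaped content.

```r
# Answer string copied from the comment above
answer_xml <- paste0(
  '<?xml version="1.0" encoding="ASCII"?>',
  '<QuestionFormAnswers xmlns="http://mechanicalturk.amazonaws.com/',
  'AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd">',
  '<Answer><QuestionIdentifier>coding.label</QuestionIdentifier>',
  '<FreeText>Pro-gun</FreeText></Answer></QuestionFormAnswers>'
)

# Hypothetical helper: return the text inside every <tag>...</tag> pair
extract_tag <- function(xml, tag) {
  pattern <- sprintf("<%s>(.*?)</%s>", tag, tag)
  matches <- regmatches(xml, gregexpr(pattern, xml, perl = TRUE))[[1]]
  sub(pattern, "\\1", matches, perl = TRUE)
}

# One row per <Answer> element, assuming each has exactly one of each tag
answers <- data.frame(
  QuestionIdentifier = extract_tag(answer_xml, "QuestionIdentifier"),
  FreeText = extract_tag(answer_xml, "FreeText"),
  stringsAsFactors = FALSE
)
```

Applied to each value of the Answer column, this would give a small data frame of question identifiers and responses.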

tylerburleigh commented 5 years ago

Hi @claravdw Try version >= 0.4.6

The default behavior of GetAssignment() now is to do XML parsing of the answers, converting them into a list object that can be read more naturally in R. Optionally, answers.as.separate.df can be set to TRUE and the answers will be returned as a separate data frame called Answers alongside the Assignments data frame. Try both ways and let me know how it works for you.

claravdw commented 5 years ago

Hi Tyler,

Thank you! With answers.as.separate.df set to TRUE, it works just as you described. With it set to FALSE, I see the assignments are retrieved successfully, but then:

Error in UseMethod("xmlSApply") : no applicable method for 'xmlSApply' applied to an object of class "list"

Best,

Clara

tylerburleigh commented 5 years ago

Thanks @claravdw. Based on your feedback and some of my own testing, I've decided the best way to return answers is as a separate data frame -- or not at all.

Please note that in version >= 0.5.2 I've renamed this parameter from answers.as.separate.df to get.answers (if TRUE, answers are returned as a separate data frame; otherwise answers are not returned).

claravdw commented 5 years ago

Makes sense, thanks!