Scifabric / pybossa

PYBOSSA is the ultimate crowdsourcing framework (aka microtasking) to analyze or enrich data that can't be processed by machines alone.
http://pybossa.com
GNU Affero General Public License v3.0

Improve task creation and submission #60

Closed nigini closed 12 years ago

nigini commented 12 years ago

Since we already have a /api/app//newtask call, we should create a /api/task//submit call! That way, the flow could go like:

rufuspollock commented 12 years ago

But isn't submission just a POST to /api/task/{task-id}? We're running a RESTful API, right? :-)

nigini commented 12 years ago

Interesting point, @rgrp. But do you agree that what we are submitting is not actually a "Task", but pieces of information that will complete a TaskRun?

What I'm arguing is that the client should not be responsible for managing all the "TaskRun" data. It should submit only the data that the user's view manages: the answer.

I would really like to discuss this and restart my work on it, mainly because I'm proposing this as a platform client: I'm building two PyBossa applications!

rufuspollock commented 12 years ago

Sorry, I should have been clearer: that's a POST to /api/taskrun/, not Task. I now understand better what you are getting at, so thank you for clarifying.

I guess my question is: what does this make easier compared to posting to /api/taskrun? When posting to /api/task/{task-id}/submit, the only difference I can imagine is that you don't put the task-id in what you post, because it is already in the URL. However, I may be missing something obvious here.
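
A minimal sketch of the two request shapes being compared, to make the difference concrete. The payload field names (`task_id`, `info`) and both helper functions are assumptions for illustration, not the confirmed PyBossa schema:

```python
import json


def taskrun_post(task_id, answer):
    """Option A: POST to the TaskRun collection; the task id travels in the body."""
    return {
        "method": "POST",
        "url": "/api/taskrun",
        "body": json.dumps({"task_id": task_id, "info": answer}),
    }


def task_submit_post(task_id, answer):
    """Option B: POST to a per-task submit URL; the task id travels in the URL."""
    return {
        "method": "POST",
        "url": f"/api/task/{task_id}/submit",
        "body": json.dumps({"info": answer}),
    }
```

Under these assumptions the two requests carry the same information; only the location of the task id changes.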

BTW: can you post a link to your app repos (and could we have them as exemplars in the PyBossa organization space)? This would be really useful!

nigini commented 12 years ago

Hi @rgrp. I think you are correct that we don't need a new URL for submitting TaskRuns. But I have three questions (please bear in mind that I'm still researching them, but maybe you already know the answers):

  1. Considering that the TaskRun was already created when "newtask" was called, the correct method to submit results should be PUT and not POST, right?
  2. To use PUT/POST, we need the TaskRun "id", but what the client has is the Task related to the TaskRun. Do we break any REST practice (given that what the client called was "newtask") if we send the client more data than the Task itself? (I don't think so.)
  3. Finally, when submitting the TaskRun, the client is not sending all the TaskRun data, but basically just the user's answer. I can still consider that I'm not breaking any REST pattern here, right?

About the apps we're developing: I will talk with the "stakeholders", but I don't think it will be a problem.

rufuspollock commented 12 years ago

  1. I don't know whether TaskRun is created by newtask (I'm not sure why it should create a TaskRun ...). You are right re PUT (though we may also allow POST)
  2. This would not be an issue if you were creating a TaskRun rather than updating one ... (see prev point)
  3. You would be sending the TaskRun data which includes the user's answer ...

teleyinex commented 12 years ago

Hi everyone,

Sorry for the looong absence, but I've been very focused on developing pybossa.js (it has its own repo :+1:) and also on the geo demo :)

/newtask only grabs a question and the data to load into the HTML skeleton. It does not create any TaskRun object at all. The TaskRun object is created when you save the answer with a POST to the taskrun URL. It gets the TaskID from the presenter and saves just the answer, as @nigini has said. Thanks to this, the Task is updated on the PyBossa end, not by JS or anything similar, since PyBossa checks the quorum:

UserA -> creates TaskRunA for Task1 for AppID = 1
UserB -> creates TaskRunB for Task1 for AppID = 1

PyBossa will check whether the TaskRunA and TaskRunB results are identical. If they are, then Task1 meets the quorum and can be validated. This means that Task1 will not be sent again when someone requests the URL /app/ID/newtask for AppID = 1.
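
The quorum check described above can be sketched roughly as follows. This is an illustrative approximation, not PyBossa's actual scheduler code: the function names, the default quorum of two (taken from the two-user example), and the plain dict of answers per task are all assumptions.

```python
def meets_quorum(answers, quorum=2):
    """Return True when at least `quorum` answers exist and all of them are identical."""
    if len(answers) < quorum:
        return False
    first = answers[0]
    return all(answer == first for answer in answers)


def next_task(task_ids, answers_by_task, quorum=2):
    """Return the first task id that has not met the quorum yet, or None if all have."""
    for task_id in task_ids:
        if not meets_quorum(answers_by_task.get(task_id, []), quorum):
            return task_id
    return None
```

With this sketch, a task whose two TaskRuns agree is skipped by `next_task`, which mirrors the idea that a validated task is no longer handed out by /newtask.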

We should only allow PUT actions on TaskRuns when the user is authenticated, but for the moment POST is fine for all users :).

@rgrp and @nigini, I'll upload my geo demo app using pybossa.js next Monday, so if you can wait, I'll explain in more detail what I have in mind, ok?

nigini commented 12 years ago

Hi @teleyinex.

I think I've understood your explanation of "/newtask", but the thing is that in this issue I'm proposing changes to the way things work. In my mind, we shouldn't expose TaskRuns, because they store sensitive information about the user's interaction with the system (e.g. start and finish times). What I'm proposing is that the server, not the client, should be responsible for collecting this.

Accordingly, my implementation creates a TaskRun for each "/newtask" call and waits for a PUT to the created TaskRun, which only UPDATEs the already created entity with the answer.
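
A minimal in-memory sketch of this proposed flow, for illustration only. The class, method, and field names (`TaskRunStore`, `new_task`, `put_answer`, `info`) are hypothetical and do not come from the PyBossa code base; the injectable `clock` is an assumption added to make the example testable.

```python
import itertools
import time


class TaskRunStore:
    """In-memory stand-in for the server side of the proposed flow."""

    def __init__(self, clock=time.time):
        self._clock = clock              # injectable clock (assumption, for testing)
        self._ids = itertools.count(1)
        self.runs = {}

    def new_task(self, task_id):
        """Handle a /newtask call: create the TaskRun and record the start time server-side."""
        run_id = next(self._ids)
        self.runs[run_id] = {
            "task_id": task_id,
            "created": self._clock(),
            "finish_time": None,
            "info": None,
        }
        return run_id

    def put_answer(self, run_id, answer):
        """Handle the PUT: update the already created TaskRun with the answer only."""
        run = self.runs[run_id]          # raises KeyError if the TaskRun was never created
        run["info"] = answer
        run["finish_time"] = self._clock()
        return run
```

The client only ever sends the answer; both timestamps are set by the server.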

What do you think about this!?

teleyinex commented 12 years ago

@nigini You are completely right! This is really nice, as it will simplify the workflow and pybossa.js a lot. This should be done right now :) I'll try to submit the code today.

teleyinex commented 12 years ago

@nigini @rgrp I've found two problems with the proposal.

Nigini's proposal requires every user to submit the answer via a PUT call. While this works for authenticated users, it does not work for anonymous users, because PUT on TaskRuns is there so that authenticated users can fix an error in a TaskRun (imagine that you have a typo in your TaskRun and you want to fix it). As this is (I guess) a required restriction, anonymous users cannot update any TaskRun, so an anonymous user cannot PUT an answer into a TaskRun created by the server.

Additionally, if the server creates a new TaskRun object with every call to /newtask, we can end up with lots of uncompleted TaskRuns on the server unless we implement a control method. If we rely only on POST calls to create a TaskRun, this problem is solved.

Thus, I'm proposing the following: users (anonymous and authenticated) only have to POST the answer (which is what we already have), but not the sensitive data. That information should be handled by the server during the POST call (I guess this is already done on the server side by the model; however, in my first implementations of the demo applications I set the timing for TaskRun.created and TaskRun.finish_time by hand). What do you think?

PS: Bear in mind that the POST will create the TaskRun with the same timestamp for the created and finish_time fields. The latter will change if the user updates (PUT) the TaskRun in the future.
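
A rough sketch of this alternative, assuming a plain list as the TaskRun store. The function names, the `info` field, and the injectable `clock` are hypothetical and used only for illustration; only `created` and `finish_time` come from the discussion above.

```python
import time


def post_taskrun(store, task_id, answer, clock=time.time):
    """POST: create the TaskRun; the server sets both timestamps to the same value."""
    now = clock()
    store.append({"task_id": task_id, "info": answer,
                  "created": now, "finish_time": now})
    return len(store) - 1                # index in the list serves as the TaskRun id


def put_taskrun(store, run_id, answer, clock=time.time):
    """PUT: update the answer of an existing TaskRun and refresh finish_time only."""
    run = store[run_id]
    run["info"] = answer
    run["finish_time"] = clock()
    return run
```

Because a TaskRun only exists once an answer has been POSTed, no empty TaskRuns accumulate on the server in this variant.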

rufuspollock commented 12 years ago

I'm happy for us to have this partial update method, and it reduces logic in the presenters, which is good (and I note presenters can still set the created time if they want ...).

So the final thing is a brief bullet-point proposal of the exact methods and changes to be made.

teleyinex commented 12 years ago

Hi everyone. As I released the new Task Scheduler yesterday, I think we can close this issue.