Scifabric / pybossa

PYBOSSA is the ultimate crowdsourcing framework (aka microtasking) to analyze or enrich data that can't be processed by machines alone.
http://pybossa.com
GNU Affero General Public License v3.0

Bulk create / import tasks system and UI #121

Closed rufuspollock closed 11 years ago

rufuspollock commented 11 years ago

Should work through user stories but key thing is an easier way to create / import tasks (e.g. via import from Google Docs Spreadsheet or online CSV file).

As the owner of an app, I want to create tasks for my app in an easy way so that there are tasks to do.

At the moment you need to write a script and upload via the API. We should provide an interface for importing tasks from the Administrator section. Obvious suggestion would be to allow people to point to either a Google Spreadsheet or a CSV file (you can get a CSV file from Google Docs so doing the latter might be the starting point).

teleyinex commented 11 years ago

Would you mind giving a bit more information on this issue?

rufuspollock commented 11 years ago

@teleyinex I've updated this issue with much more detail.

teleyinex commented 11 years ago

@rgrp The idea of the CSV is awesome!!! Thus, I think what we can do is the following:

What do you think?

About including IDs:

I think the work flow could be the following:

This will remove the problem of using IDs for updating tasks for the moment via the CSV file. I guess that as a creator you will want to update the parameters of one task at a time. If you need a bulk action you should use the API.

Re-importing will be a mess unless PyBossa, after creating the tasks for the first time from a CSV file, returns the same CSV file with the Task.id added to each row (I'm assuming that each row is a task and each column a field). In any case, the previous approach will also work well for updating a single task locally.
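A rough sketch of that round trip, assuming each row is a task and each column a field. The function names `export_with_ids` and `split_reimport` and the `id` column name are hypothetical, not part of PyBossa:

```python
import csv
import io


def export_with_ids(rows, task_ids):
    """Return CSV text for the imported rows with a leading 'id' column.

    `rows` is a list of dicts (one per task) and `task_ids` holds the id
    assigned to each row when it was first created (hypothetical inputs).
    """
    if not rows:
        return ""
    fieldnames = ["id"] + list(rows[0].keys())
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for task_id, row in zip(task_ids, rows):
        writer.writerow({"id": task_id, **row})
    return out.getvalue()


def split_reimport(csv_text):
    """Split re-imported rows into updates (id present) and new tasks (id empty)."""
    updates, creates = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row = dict(row)
        task_id = row.pop("id", "")
        if task_id:
            updates.append((int(task_id), row))
        else:
            creates.append(row)
    return updates, creates
```

With a file like this, rows that still carry an id could be treated as updates and rows with an empty id as new tasks, which is one way to avoid duplicating tasks on re-import.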

What do you think?

tfmorris commented 11 years ago

Would this include some type of default task presenter which gets used or is the user still required to do the HTML/JS coding for a presenter? I think it would be useful to have simple tasks entirely data driven with no coding required.

teleyinex commented 11 years ago

@tfmorris good one! I think it would be awesome if we could make that kind of abstraction. However, it will be really difficult, because every project will have a different way of presenting the data and, more importantly, of getting the data. In any case, I'll think about it, ok?

gka commented 11 years ago

+1 for Google spreadsheet import and CSV upload.

I think Google spreadsheet import is possibly easier to use for journalists.

gka commented 11 years ago

And, along with the bulk creation there should be a way to bulk delete tasks, too. Maybe this could be integrated with the bulk creation, e.g. that existing tasks can optionally be removed on bulk upload.

gka commented 11 years ago

Might come in handy:

Google spreadsheet parser of Miso.Dataset (line 70 - 122)

rufuspollock commented 11 years ago

@gka we already have one in Recline :-) https://github.com/okfn/recline/blob/master/src/backend.gdocs.js

gka commented 11 years ago

Couldn't wait for this feature, so I implemented it (as a separate app, sorry for that).

http://pybossa-spreadsheet-importer.vis4.net/

The app is small b/c it's built upon pybossa-client.

I think the spreadsheet parser might come in handy whenever this feature goes into PyBossa.

rufuspollock commented 11 years ago

@teleyinex would you be happy for @nigelbabu to start looking at this issue?

teleyinex commented 11 years ago

Sure!!! Please go ahead!!!! :-) If you need any help or comments, please contact me (I'm at a hackfest this week, so I may not answer as fast as I should, sorry)

nigelbabu commented 11 years ago

I have a first cut for this here. Review/feedback appreciated!

teleyinex commented 11 years ago

@nigelbabu looks good! Two comments:

What do you think? @gka how did you implement this?

nigelbabu commented 11 years ago

I won't check it in without unit tests :-)

Well, what I'm doing is, if the fields in the CSV match field names in our DB, I put it into the right db field. If not, they all go into the info field. That seemed future-proof.
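A minimal sketch of that mapping, not the code that was actually committed. The column names in `TASK_COLUMNS` and the function name `rows_to_tasks` are assumptions for illustration; the real Task model may have different fields:

```python
import csv
import io

# Hypothetical set of Task table columns; the real model may differ.
TASK_COLUMNS = {"state", "quorum", "calibration", "priority_0"}


def rows_to_tasks(csv_text, task_columns=TASK_COLUMNS):
    """Turn each CSV row into a task dict, one row per task.

    Columns whose header matches a known DB field are set directly on the
    task; every other column is stored under the free-form 'info' field.
    """
    tasks = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        task = {"info": {}}
        for name, value in row.items():
            if name is None:
                # Extra cells beyond the header row have no column name; skip them.
                continue
            if name in task_columns:
                task[name] = value
            else:
                task["info"][name] = value
        tasks.append(task)
    return tasks
```

Because unknown columns fall through to `info`, a spreadsheet can gain new columns without any change to the importer.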

teleyinex commented 11 years ago

:+1:

nigelbabu commented 11 years ago

I've landed a bulk importer that imports from CSV (https://github.com/PyBossa/pybossa/commit/4875780ecb3e2e479d2eb5542e7228d1c11d4ecc). I'm not sure if that alone is enough to consider this issue as fixed.

teleyinex commented 11 years ago

To me yes :-)

gka commented 11 years ago

I'd say no, since something like a bulk import from a Google Spreadsheet would be more convenient for journalists.

Think about a rapid crowdsourcing scenario within a group of 10 journalists sitting in the same newsroom: collecting the data in a spreadsheet, importing it into PyBossa with a single click, and putting together a simple HTML form to present the tasks. The entire process could be done in 5 minutes.

Exporting to CSV is just one more step along that way, and given the messiness of that standard (no metadata about encoding, separation chars etc) it could become a somewhat complicated step, too.
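To illustrate the kind of guessing a CSV importer has to do, here is a hedged sketch using Python's standard `csv.Sniffer`. The function name `read_rows`, the list of candidate encodings, and the candidate delimiters are assumptions, and the guesses can still be wrong on unusual files:

```python
import csv
import io


def read_rows(raw_bytes, encodings=("utf-8", "latin-1")):
    """Decode CSV bytes and parse them, guessing encoding and delimiter."""
    text = None
    for enc in encodings:
        try:
            text = raw_bytes.decode(enc)
            break
        except UnicodeDecodeError:
            continue
    if text is None:
        raise ValueError("could not decode input with any candidate encoding")

    try:
        # Sniff the delimiter from the first part of the file.
        dialect = csv.Sniffer().sniff(text[:1024], delimiters=",;\t")
    except csv.Error:
        # Fall back to plain comma-separated values.
        dialect = csv.excel
    return list(csv.reader(io.StringIO(text), dialect))
```

None of this guessing is needed when the data is read straight from a spreadsheet service, which is the convenience argument made above.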

Also something to think about: importing from a Google spreadsheet would be even simpler if we provided a simple template. Timeline.js is a great example to consider.

Finally, I really think that PyBossa is a valuable tool for journalists. That said, the UI should be as simple as possible, removing every unnecessary step. Then it can be effectively used in ddj trainings. Journos will love to see crowdsourcing in 5 minutes, and will use PyBossa right away.

On 11.10.2012 at 14:49, Daniel Lombraña González notifications@github.com wrote:

To me yes :-)

teleyinex commented 11 years ago

@gka thanks a lot for your feedback!

The truth is that the code from @nigelbabu works directly with Google Spreadsheets. You only have to make the spreadsheet public and select the CSV option that Google offers, so I think this is more or less what you have described, but without the template approach. @nigelbabu do you think it would be possible to support the Google Spreadsheet template? That would be awesome, and as @gka has said, it would simplify things. Maybe we should open another issue for this specific item.
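As a sketch of that workflow, assuming the spreadsheet has been made public and `csv_url` is whatever CSV link Google gives you for it (the URL is not constructed here, and UTF-8 encoding is an assumption). The function names are hypothetical and this is not the importer's actual code:

```python
import csv
import io
from urllib.request import urlopen


def parse_csv_text(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def fetch_published_csv(csv_url, timeout=10):
    """Download a publicly readable CSV file and parse it.

    `csv_url` must be supplied by the caller; assumes the response is UTF-8.
    """
    with urlopen(csv_url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return parse_csv_text(text)
```

Each returned dict corresponds to one spreadsheet row, which is the shape a bulk importer would turn into one task.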

In any case, I agree that we should provide the easiest approach for everyone :-D

nigelbabu commented 11 years ago

The current bulk importer does import from a spreadsheet, as Daniel says. I need to add help text on how to do that when you have a Google spreadsheet. I've been testing the whole thing with a Google spreadsheet, so it does import from one.

I think I'll sit down and write documentation, maybe, that'll help me explain things better :)

nigelbabu commented 11 years ago

I added documentation for the bulk importer. @gka can you take a look? Let's have a chat about how I could make the bulk importer more helpful.

teleyinex commented 11 years ago

@nigelbabu perfect! Thanks for adding the docs, that will keep the project updated while new features come in :-) :beers:

nigelbabu commented 11 years ago

Closing this issue for now. Please open new issues for added features for the bulk importer!