I was going to write exactly the same proposal today :-) I have an overall idea of how it will work, how to code it, and how to merge it. Will we have a meeting, or could we discuss this here?
I'm going to work on this specific item today, and before submitting any code, here is my design:
1.- All Tasks will be exported to CSV using the keys of the Task object's info field. If an item has nested data, the JSON will be dumped as-is in order to keep things as simple as possible (please check this; see the sketch after this list).
2.- The same will be done for TaskRuns: only the info field, treated as a flat object. If there are any nested objects this becomes problematic (e.g. in ForestWatchers we save the data for tasks and task runs as GeoJSON, so exporting it to CSV is really complicated and not useful for GIS applications, where you would expect GeoJSON, GeoRSS, and/or KML).
3.- Include a third option where you can do the same but export everything in JSON format (we should also add an option to import data as JSON, like for CSV).
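For illustration, here is a minimal sketch of the flattening rule in points 1 and 2 (flatten_info, export_tasks_csv and the sample data are made-up names, not PyBossa code): each top-level key of info becomes a CSV column, and nested values are dumped as JSON strings.

import csv
import json

def flatten_info(info):
    # One column per top-level key of the info field; nested values are
    # dumped as JSON strings so the CSV stays flat (points 1 and 2 above).
    return {key: json.dumps(value) if isinstance(value, (dict, list)) else value
            for key, value in info.items()}

def export_tasks_csv(tasks, path):
    rows = [flatten_info(task['info']) for task in tasks]
    # Collect every key seen across all rows so the header covers them all;
    # tasks missing a key simply get an empty cell.
    fieldnames = sorted({key for row in rows for key in row})
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# Made-up sample data for illustration
sample_tasks = [
    {'info': {'url': 'http://example.com/img1.jpg', 'bbox': [0, 0, 10, 10]}},
    {'info': {'url': 'http://example.com/img2.jpg', 'question': 'Deforested?'}},
]
export_tasks_csv(sample_tasks, 'tasks.csv')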
New comments:
It is not possible to export Tasks + TaskRuns in the same CSV file because CSV does not support multiple sheets, so there should be one option to export all Tasks and another one for TaskRuns.
With JSON it is possible (all_data = {'tasks': app.tasks, 'task_runs': app.task_runs}), but in order to keep everything with the same structure there should also be two options: (i) one for Tasks and (ii) another one for TaskRuns.
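For illustration, a minimal sketch of the two JSON options (it assumes app.tasks and app.task_runs are already plain lists of dicts; the real PyBossa model objects would need serialising first):

import json

def export_all_json(app, path):
    # Single JSON document holding both collections, as in the snippet above.
    all_data = {'tasks': app.tasks, 'task_runs': app.task_runs}
    with open(path, 'w') as f:
        json.dump(all_data, f)

def export_task_runs_json(app, path):
    # Separate per-collection export, mirroring the two CSV options.
    with open(path, 'w') as f:
        json.dump(app.task_runs, f)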
I think that, as a first pass, exporting Tasks to one CSV and TaskRuns to another is OK. We can think later about whether we can merge them.
Agree about taking the task run info and expanding it (so each field becomes a column); we would also obviously add task_id and possibly pybossa_id (see the sketch below).
Worth linking to an exemplar Google Docs spreadsheet with examples of what the sheets will look like ...
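To make the column layout concrete, here is a sketch of what a flattened task run row could look like with those two id columns (the field names are assumptions based on this discussion, not the final schema):

import json

def task_run_row(task_run):
    # Expand the info field into columns, dumping nested values as JSON
    # strings, then add the two id columns mentioned above.
    row = {key: json.dumps(value) if isinstance(value, (dict, list)) else value
           for key, value in task_run['info'].items()}
    row['task_id'] = task_run['task_id']
    row['pybossa_id'] = task_run['id']  # id of the task run in PyBossa (clarified below)
    return row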
We cannot merge them, basically because the info fields could be very different, so if we want to give users the option to download everything at once, the best approach is to create a zip file like GitHub does with repos (a sketch of building such a zip in memory follows below).
I'll add the task_id and I think that you mean pybossa_id=app.id, right?
Regarding the examples: I'll do them; right now I'm exploring the problems :-D
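For the zip option, a minimal sketch of building the archive in memory with both CSV files and serving it from Flask (the route and the build_tasks_csv / build_task_runs_csv helpers are made-up placeholders, not PyBossa code):

import io
import zipfile
from flask import Flask, Response

app = Flask(__name__)

def build_tasks_csv(app_id):
    # Placeholder: the real code would build the Tasks CSV for this app.
    return "task_id,url\n"

def build_task_runs_csv(app_id):
    # Placeholder: the real code would build the TaskRuns CSV for this app.
    return "pybossa_id,task_id\n"

@app.route('/app/<int:app_id>/export.zip')
def export_zip(app_id):
    # Build the zip in memory so nothing has to be written to disk.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
        zf.writestr('tasks.csv', build_tasks_csv(app_id))
        zf.writestr('task_runs.csv', build_task_runs_csv(app_id))
    buf.seek(0)
    return Response(buf.getvalue(),
                    mimetype='application/zip',
                    headers={'Content-Disposition': 'attachment; filename=export.zip'})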
I've found an example from the Flask mailing list. I include it here for documentation purposes:
from flask import Flask, Response
import os
from werkzeug.datastructures import Headers

app = Flask(__name__)

@app.route('/')
def hello_world():
    def download():
        # Stream the file in 4 KB chunks instead of loading it all into memory.
        # Note it must be the same file that Content-Length is computed from.
        with open("test.zip", "rb") as fich:
            while True:
                data = fich.read(4096)
                if not data:
                    break
                yield data

    header = Headers()
    header.add("Content-Type", "application/x-download")
    header.add("Content-Length", str(os.path.getsize("test.zip")))
    header.add("Content-Disposition", "attachment; filename=test.zip")
    return Response(download(), headers=header, direct_passthrough=True)

if __name__ == "__main__":
    app.debug = True
    app.run()
I'd strongly prefer a non-zip approach, so just having 2 separate files (just click 2 buttons). It would also be nice to put this in the API (e.g. /api/1/export/task/{app_id}?format=csv; see the sketch further down).
pybossa_id = the id of the task_run in pybossa
I really think doing a mock-up of the export in a Google Docs spreadsheet, linked from here, would be really useful to clarify what we expect ...
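For the API idea above, a rough sketch of what such an endpoint could look like (the route and the build_tasks_csv placeholder are assumptions for illustration, not existing PyBossa API code):

from flask import Flask, Response, request, abort

app = Flask(__name__)

def build_tasks_csv(app_id):
    # Placeholder for the real CSV builder discussed above.
    return "task_id,pybossa_id,url\n"

@app.route('/api/1/export/task/<int:app_id>')
def export_task(app_id):
    # Dispatch on the ?format= query parameter, defaulting to CSV.
    fmt = request.args.get('format', 'csv')
    if fmt == 'csv':
        return Response(build_tasks_csv(app_id),
                        mimetype='text/csv',
                        headers={'Content-Disposition':
                                 'attachment; filename=task_%d.csv' % app_id})
    abort(415)  # other formats (e.g. JSON) could be added here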
Added to sprint 2 but as a suggestion - for discussion :-)