akvo / akvo-flow

A data collection and monitoring tool that works anywhere.
http://akvo.org/products/akvoflow/
GNU Affero General Public License v3.0

Bulk upload code doesn't correctly handle the sharded structure of the surveyal folder #430

Closed: iperdomo closed this issue 8 years ago

iperdomo commented 10 years ago

The bulk-upload process tries to process all the zip files in a folder (and subfolders) searching for duplicates and generating only one wfpGeneratedxxxx.zip.

The current process can't handle the structure of the typical surveyal folder and instead generates one wfpGeneratedxxx.zip per zip file found.
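A minimal sketch of the intended pre-processing: walk the whole folder tree, not just the top level, and keep one copy of each archive. It assumes duplicates are byte-identical copies and detects them by SHA-256 hash; if duplicates are identified some other way (for example per surveyInstance), only the key would change. `collect_unique_zips` and `sha256_of` are hypothetical names, not the actual bulk-upload code.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_unique_zips(root: Path) -> list[Path]:
    """Find every .zip under root at any depth, keeping one path per content hash.

    Assumption (hypothetical): duplicates are byte-identical copies of one archive.
    """
    seen: dict[str, Path] = {}
    # rglob descends into every shard subfolder; glob("*.zip") would only
    # see the top level, which is the failure described in this issue.
    for path in sorted(root.rglob("*.zip")):
        if path.is_file():
            # setdefault keeps the first path seen for each hash.
            seen.setdefault(sha256_of(path), path)
    return list(seen.values())
```

Sorting the paths makes the choice of which duplicate survives deterministic across runs.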

Possible solutions:

1. Keep the current behavior, but instead of generating a wfpGenerated zip, just upload each zip file and leave duplicate detection to the backend.
2. Fix the pre-processing part so that it searches recursively across all subfolders.

mtwestra commented 10 years ago (email reply to Iván Perdomo's post of Nov 7, 2013, 08:22)

A typical surveyal folder will contain many duplicates, possibly dozens for every surveyInstance. I therefore think deduplication should happen in the upload process, to avoid unnecessarily stressing the backend. The main reason, I guess, is that the backend might spawn separate processes to handle a large number of zip files, which could reintroduce problems.
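Deduplicating on the client means the backend receives one archive instead of dozens. A sketch of that final step, assuming it is given a list of already-deduplicated zip paths: each source archive is stored unmodified as an entry of a single bundle. The nested-zip layout, the index prefix on entry names, and `bundle_for_upload` itself are hypothetical; the thread does not describe the actual wfpGenerated format.

```python
import zipfile
from pathlib import Path


def bundle_for_upload(unique_zips: list[Path], out_path: Path) -> Path:
    """Pack already-deduplicated archives into one archive for a single upload.

    Hypothetical layout: each source zip becomes a nested entry whose name is
    prefixed with its index, so identical file names from different shard
    subfolders cannot collide inside the bundle.
    """
    # ZIP_STORED: the inputs are already compressed, so recompressing gains little.
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_STORED) as bundle:
        for index, src in enumerate(unique_zips):
            bundle.write(src, arcname=f"{index:05d}_{src.name}")
    return out_path
```

The index prefix matters because shards typically reuse the same file names, so writing entries by bare name would produce duplicate names in the bundle.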

iperdomo commented 10 years ago

@mtwestra OK, agreed. The 2nd option is the designed and intended one.

ichinaski commented 10 years ago

Regarding duplicates, we should also consider avoiding unnecessary exports, as described in https://github.com/akvo/akvo-flow-mobile/issues/39

janagombitova commented 8 years ago

Recreated this issue in Flow services: https://github.com/akvo/akvo-flow-services/issues/146