dssg / matching-tool

Integrating HMIS and criminal-justice data

Improve memory usage of validator #268

Closed thcrock closed 6 years ago

thcrock commented 6 years ago

Two things are making the validator blow up on the super big files:

  1. The duplicate check. Right now it is a goodtables check, but it should be reimplemented in SQL, by loading the file into the database during the validation phase.
  2. Large error lists are used as the asynchronous job result, which may be too big for Redis. These should probably be stored on S3 by the webapp worker instead, with the S3 path being passed as the job result.
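
A rough sketch of both ideas, not the project's actual code. It assumes a file-backed SQLite database as a stand-in for the real database, and a local JSON-lines file as a stand-in for the S3 upload. The function names, the `staging` table, and the column names in the usage note are all hypothetical.

```python
import csv
import io
import json
import sqlite3
from pathlib import Path


def find_duplicate_keys(csv_text, key_columns, db_path):
    """Stream CSV rows into a SQL table, then let the database find duplicate keys.

    sqlite3 is only a stand-in here; the real validator would use its own database.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    conn = sqlite3.connect(db_path)
    try:
        column_defs = ", ".join(f'"{c}" TEXT' for c in key_columns)
        conn.execute(f"CREATE TABLE staging ({column_defs})")
        placeholders = ", ".join("?" for _ in key_columns)
        # The generator feeds rows one at a time, so the whole file is never
        # held in a Python list.
        conn.executemany(
            f"INSERT INTO staging VALUES ({placeholders})",
            ([row[c] for c in key_columns] for row in reader),
        )
        key_list = ", ".join(f'"{c}"' for c in key_columns)
        query = (
            f"SELECT {key_list}, COUNT(*) FROM staging "
            f"GROUP BY {key_list} HAVING COUNT(*) > 1 ORDER BY {key_list}"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def store_errors(errors, directory):
    """Write errors to a file and return a small job result that points to it.

    The local file stands in for an S3 object; only the path and a count
    would go back through Redis as the job result.
    """
    path = Path(directory) / "validation_errors.jsonl"
    count = 0
    with path.open("w") as handle:
        for error in errors:
            handle.write(json.dumps(error) + "\n")
            count += 1
    return {"errors_path": str(path), "error_count": count}
```

For example, calling `find_duplicate_keys(text, ["internal_person_id", "event_id"], "staging.db")` returns one tuple per duplicated key, with the occurrence count as the last element. The job result from `store_errors` stays a fixed size no matter how many errors the file produces.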