freedomofpress / dangerzone

Take potentially dangerous PDFs, office documents, or images and convert them to safe PDFs
https://dangerzone.rocks/
GNU Affero General Public License v3.0

Comparing Dangerzone releases on bulk document conversion: performance, regression testing #352

Open deeplow opened 1 year ago

deeplow commented 1 year ago

We have a work-in-progress bulk document test for this. We should compare how the results differ between 0.4.0 and 0.4.1.
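As a rough illustration of the kind of comparison meant here, the sketch below diffs per-document results from two runs. It is not Dangerzone's actual test harness: the `Result` record, the `compare_runs` helper, and the idea that each run yields a mapping from document name to outcome are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Outcome of converting one document (hypothetical record format)."""
    success: bool
    seconds: float


def compare_runs(old: dict[str, Result], new: dict[str, Result]) -> dict:
    """Compare per-document results from two releases.

    Only documents present in both runs are considered.
    """
    common = sorted(old.keys() & new.keys())

    # Converted fine in the old release, fails in the new one.
    regressions = [d for d in common if old[d].success and not new[d].success]
    # Failed in the old release, converts fine in the new one.
    fixes = [d for d in common if not old[d].success and new[d].success]

    # Timing is only comparable where both releases succeeded.
    both_ok = [d for d in common if old[d].success and new[d].success]
    old_time = sum(old[d].seconds for d in both_ok)
    new_time = sum(new[d].seconds for d in both_ok)
    speedup = old_time / new_time if new_time > 0 else None

    return {
        "regressions": regressions,
        "fixes": fixes,
        "compared": len(both_ok),
        "speedup": speedup,
    }
```

Restricting the timing comparison to documents that both releases converted successfully keeps early failures from looking like speedups.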

Update: We have merged the PR to make this work. We should:

  1. Run the large test set, now that the PR is merged, on both supported architectures (ideally on macOS).
  2. Commit the test results and the report to the large test set repo as the 0.4.2 results.

deeplow commented 1 year ago

I'm now in the process of running these tests. We'll have to wait for the final results to be sure, but so far it looks like replacing PDFtk with pdftoppm (https://github.com/freedomofpress/dangerzone/pull/338) massively sped up the conversions.

sssoleileraaa commented 1 year ago

Just capturing that we have some benchmarking results from 0.4.1 testing: https://docs.google.com/spreadsheets/d/1OpKvZxWY2tn9CK2GSaJaqAZarxWDt-AMkB5X9l13ZBM/edit#gid=0

apyrgio commented 1 year ago

Moved to the 0.5.0 milestone, since the related PR will be merged there.

deeplow commented 8 months ago

I have now started the final 0.5.0 large test