esmero / ami

Archipelago Multi Importer. A module of mass ingest made for the masses
GNU Affero General Public License v3.0

Remote File fetcher and 401/404 HTML body responses #193

DiegoPino closed this issue 4 months ago

DiegoPino commented 5 months ago

What?

Bug. If you fetch remote files via AMI and those URLs are, e.g., protected, the first run saves the wrong content. Then you figure it out, open up the remotes, and run AMI again, but even with the re-download option enabled during processing we end up re-using the already downloaded files (HTML, all wrong). Why? Because the file persister actually fetches/downloads the bodies, BUT what gets delivered is an HTML page (the 401 controller response). Those files end up in the composter, but until the composter runs we keep re-using the already saved copies.

That, ladies and gentlemen, is the definition of a 🐞

@alliomeria will fix so we never have to deal with this again. Pull coming tomorrow AM.

DiegoPino commented 5 months ago

The problem happens here: https://github.com/esmero/ami/blob/a9a167c37e24d81ff7c9f7fe9bb180b9b3b1da55/src/AmiUtilityService.php#L498. The combined action of a sink (which means the response body gets downloaded to a file) with a 401/404 response that carries an HTML body, the way Drupal provides them, leads to a File being saved. The solution could be:

1. Download anyway (so keep the sink), but delete the file after checking the status code.
2. Better, do a HEAD request first. If the HEAD response has no size or is a 4XX, don't even attempt the download.
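To make the two options concrete, here is a minimal sketch in Python using only the standard library. It is not the code from AMI (which is PHP) and not what the fix commit does; `head_looks_downloadable` and `fetch_remote_file` are hypothetical names. It assumes "no size" means a missing or zero `Content-Length`, and it streams the body to a temporary file so an error response never ends up at the final destination.

```python
import os
import shutil
import tempfile
import urllib.error
import urllib.request
from typing import Optional


def head_looks_downloadable(url: str, timeout: float = 10.0) -> bool:
    """Option 2: HEAD first. True only for a 2xx answer that reports a size."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            length = response.headers.get("Content-Length")
            # Assumption: a missing or zero Content-Length counts as "no size".
            has_size = length is not None and length.isdigit() and int(length) > 0
            return 200 <= response.status < 300 and has_size
    except urllib.error.URLError:
        # urllib raises HTTPError (a URLError subclass) for 4xx/5xx answers.
        return False


def fetch_remote_file(url: str, destination: str, timeout: float = 10.0) -> Optional[str]:
    """Return the destination path on success, or None if nothing was saved."""
    if not head_looks_downloadable(url, timeout):
        return None  # don't even attempt the download

    # Option 1 as a safety net: write to a temporary file next to the
    # destination, and only move it into place if the GET succeeded.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(destination) or ".")
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url, timeout=timeout) as response:
            shutil.copyfileobj(response, tmp)
        os.replace(tmp_path, destination)
        return destination
    except OSError:
        # Covers HTTPError/URLError and local I/O errors: drop the partial file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return None
```

With this shape, a protected URL that answers 401 with an HTML login page returns `None` and leaves no file behind, so a later run has nothing stale to re-use. The HEAD check is only a cheap pre-filter; the GET can still fail afterwards, which is why the cleanup branch stays.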

DiegoPino commented 4 months ago

Resolved via https://github.com/esmero/ami/commit/88899a7584db1457b730bb6d09f52ef2af2bd50e