inghamn opened this issue 4 years ago
We can probably use file_managed.filesize to identify duplicates.
```sql
-- Cover images whose underlying file has one particular size (595572 bytes).
select i.entity_id, i.field_image_width, i.field_image_height,
       f.uuid, f.filename, f.uri, f.filemime, f.filesize,
       f.created, f.changed
from media__field_image i
join file_managed f on f.fid = i.field_image_target_id
where i.bundle = 'cover_image'
  and f.filesize = 595572;
```
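To check other sizes without editing the literal each time, the same lookup can take the size as a bound parameter. Below is a minimal sketch using Python's built-in `sqlite3` module against a throwaway in-memory copy of the two tables. The cut-down schema (only the columns the query reads), the sample rows, and the helper names `covers_with_size` and `demo_db` are all hypothetical. Against the real site database the idea is the same, though other drivers may use a different placeholder style than `?`.

```python
import sqlite3

# Hypothetical, cut-down copies of the two Drupal tables: only the columns
# the query reads. The real tables have more columns.
SCHEMA = """
create table file_managed (
    fid integer primary key, uuid text, filename text, uri text,
    filemime text, filesize integer, created integer, changed integer
);
create table media__field_image (
    entity_id integer, bundle text, field_image_target_id integer,
    field_image_width integer, field_image_height integer
);
"""

# Same lookup as the cover-image query, but the size is a bound parameter (?)
# instead of a literal, so any size can be checked without editing the SQL.
COVERS_BY_SIZE = """
select i.entity_id, i.field_image_width, i.field_image_height,
       f.uuid, f.filename, f.uri, f.filemime, f.filesize,
       f.created, f.changed
from media__field_image i
join file_managed f on f.fid = i.field_image_target_id
where i.bundle = 'cover_image'
  and f.filesize = ?
order by i.entity_id
"""


def covers_with_size(conn, filesize):
    """Return all cover_image rows whose file is exactly `filesize` bytes."""
    return conn.execute(COVERS_BY_SIZE, (filesize,)).fetchall()


def demo_db():
    """In-memory database with made-up sample rows for trying the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "insert into file_managed values (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "u1", "cover.jpg", "public://cover.jpg", "image/jpeg", 595572, 0, 0),
            (2, "u2", "cover_0.jpg", "public://cover_0.jpg", "image/jpeg", 595572, 0, 0),
            (3, "u3", "thumb.jpg", "public://thumb.jpg", "image/jpeg", 1234, 0, 0),
        ],
    )
    conn.executemany(
        "insert into media__field_image values (?, ?, ?, ?, ?)",
        [(10, "cover_image", 1, 800, 600),
         (11, "cover_image", 2, 800, 600),
         (12, "cover_image", 3, 100, 100)],
    )
    return conn
```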
We should probably do some deduplication during the export. If two files have the same name and the same filesize, they are most likely the same file, so we could build up a lookup hash keyed on name and filesize as we go and skip any file that has already been exported.
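The lookup hash could be sketched in Python like this. It assumes the export loop sees one row per `file_managed` record, and `export_file` stands in for whatever actually copies a file out; `FileRow`, `export_unique`, and `export_file` are hypothetical names, not part of Drupal. A plain dict keyed on `(filename, filesize)` makes each "already exported?" check constant time, so the whole pass stays linear in the number of files.

```python
from typing import Callable, Dict, Iterable, NamedTuple, Tuple


class FileRow(NamedTuple):
    """One row of file_managed; only the fields this sketch needs."""
    fid: int
    filename: str
    uri: str
    filesize: int


def export_unique(rows: Iterable[FileRow],
                  export_file: Callable[[FileRow], None]) -> Dict[Tuple[str, int], int]:
    """Export each distinct (filename, filesize) pair once.

    Returns the lookup table mapping (filename, filesize) to the fid that
    was actually exported, so skipped duplicates can later be pointed at
    the copy that did go out.
    """
    exported: Dict[Tuple[str, int], int] = {}  # the lookup hash
    for row in rows:
        key = (row.filename, row.filesize)
        if key in exported:
            continue  # same name and size: treat as already exported
        export_file(row)
        exported[key] = row.fid
    return exported
```

Files with the same name but a different size, or the same size but a different name, are both still exported; only an exact match on the pair is skipped.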
I reviewed the roughly 1,000 files that would be considered duplicates based on filesize. It looks like it's safe to use filesize to determine which files are unique, so I don't think we need to hash the files themselves.
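```sql
-- Sizes shared by two or more files that have no media__field_image
-- reference, then every file (referenced or not) with one of those sizes.
select x.filesize, f.filename, f.uri
from (select fm.filesize, count(*) as c
      from file_managed fm
      left join media__field_image mi on fm.fid = mi.field_image_target_id
      where mi.field_image_target_id is null  -- not used as a media image
      group by fm.filesize
      having c > 1) x
join file_managed f on x.filesize = f.filesize
order by x.filesize, f.filename;
```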
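To see what the duplicate-size query returns, here it is run through Python's built-in `sqlite3` module against a tiny in-memory dataset. The cut-down schema, the sample rows, and the helpers `duplicate_size_report` and `demo_db` are hypothetical. The sketch repeats `count(*)` in the HAVING clause instead of relying on the `c` alias, since not every database accepts aliases there.

```python
import sqlite3

# Hypothetical, cut-down tables: only the columns the query reads.
SCHEMA = """
create table file_managed (fid integer primary key, filename text,
                           uri text, filesize integer);
create table media__field_image (entity_id integer,
                                 field_image_target_id integer);
"""

# Same duplicate-size query, with count(*) spelled out in HAVING.
DUPLICATE_SIZES = """
select x.filesize, f.filename, f.uri
from (select fm.filesize, count(*) as c
      from file_managed fm
      left join media__field_image mi on fm.fid = mi.field_image_target_id
      where mi.field_image_target_id is null
      group by fm.filesize
      having count(*) > 1) x
join file_managed f on x.filesize = f.filesize
order by x.filesize, f.filename
"""


def duplicate_size_report(conn):
    """Return (filesize, filename, uri) rows for the shared sizes."""
    return conn.execute(DUPLICATE_SIZES).fetchall()


def demo_db():
    """In-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "insert into file_managed values (?, ?, ?, ?)",
        [
            (1, "report.pdf", "public://report.pdf", 5000),
            (2, "report_0.pdf", "public://report_0.pdf", 5000),
            (3, "logo.png", "public://logo.png", 700),
            (4, "cover.jpg", "public://cover.jpg", 5000),
        ],
    )
    # fid 4 is used as a media image, so the inner count skips it.
    conn.execute("insert into media__field_image values (10, 4)")
    return conn
```

Only the inner count ignores files referenced by `media__field_image`; the outer join does not. So in this sample the referenced `cover.jpg` is listed alongside the two unreferenced 5000-byte files, while `logo.png` (a size nobody else has) does not appear.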
We are installing the document management system OnBase. Instead of having staff upload files directly to Drupal, we want to host the files in OnBase and only link to them from Drupal.
We need a way to migrate all the current media out of Drupal and ingest it into OnBase.
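One possible first step for the migration is a manifest listing every file to move, which the OnBase side can then ingest. Below is a minimal sketch in Python under several assumptions: the files use the `public://` scheme and live under Drupal's default public directory `sites/default/files` (a site can change this with `$settings['file_public_path']` in settings.php), and a plain CSV is an acceptable hand-off format. The column set and the helpers `uri_to_path` and `write_manifest` are hypothetical, `private://` files would need their own mapping, and the actual OnBase ingestion step is not shown.

```python
import csv
import io
from typing import Iterable, Tuple

# Assumption: Drupal's default public files directory. Check the site's
# real $settings['file_public_path'] before relying on it.
PUBLIC_DIR = "sites/default/files"


def uri_to_path(uri: str, public_dir: str = PUBLIC_DIR) -> str:
    """Map a public:// URI to a path relative to the Drupal docroot."""
    scheme = "public://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a public:// URI: {uri}")
    return f"{public_dir}/{uri[len(scheme):]}"


def write_manifest(rows: Iterable[Tuple[int, str, str, str, int]]) -> str:
    """Return CSV text with one line per file to hand over for ingestion.

    Each row is (fid, filename, uri, filemime, filesize), in the order
    selected from file_managed.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["fid", "filename", "filemime", "filesize", "path"])
    for fid, filename, uri, filemime, filesize in rows:
        writer.writerow([fid, filename, filemime, filesize, uri_to_path(uri)])
    return out.getvalue()
```

Keeping `fid` in the manifest leaves a way to map each OnBase document back to the Drupal file it replaced when the links are rewritten.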