Closed Chrissitopher closed 8 years ago
I'll look into it. I'm currently running a migration of a large site again. 4.7 to 6.2. After migration I will see if I can reproduce this. I am not sure if I have more than one file storage there.
I have now looked into this further: I don't think this is a bug in the extension, but the extension still has to deal with it.
I got these three entries per file by normally using TYPO3. In my situation, with only one single file storage, I consider the fact that one file has multiple sys_file
entries part of a TYPO3 bug. However, after you have migrated the references (which only makes them point to the same file again, see above), the extension allows you to actually delete the files from the /_migrated
folder - although they are still used (by the other entry in the sys_file
table).
In this case, using your extension basically screws up the whole installation.
Wow. Do you still have this situation? Can you provide a database dump or a couple of table dumps?
The database is kind of huge and I am not sure, if part of the dump alone helps you.
But I can describe the anomalies, which I have found:
Nearly all files from the _migrated
folder are in the sys_file
table twice: Once in storage 0 with identifier
/fileadmin/_migrated...
and once in storage
1 with identifier
/_migrated...
. The file is the same one and the sha1
both times is identical. The extension fails here by changing the sys_file_reference
from the one to the other sys_file
entry.
Apart from that, these files often have a third entry in the sys_file
table. This entry is the one to which the extension should actually change the sys_file_reference
entries. Currently the extension does not use this third entry, although it should.
(Apart from that, I found a number of other anomalies, which might be irrelevant here. Anyway, for completeness: There often are two entries for one file: One in fileadmin/user_upload
and the other in user_upload
. The sha1
both times is identical. Also there are entries, which point to files, although the files no longer are present at that location. But they are not marked as missing
in the sys_file
table either. Finally there were some bogus entries, e.g. references to files from inside the typo3/
folder.)
So I see two things, closely related to each other, which should be improved:
undouble:migratedfiles
changes a reference, it really changes it to a file, which is not inside the migrated folder.
To fix this, you will need to adjust your SQL queries: Currently they only check the identifier
for LIKE "/_migrated/%"
(file to be migrated) and NOT LIKE "/_migrated/%"
(alternate file). However, in cases where the identifier starts with fileadmin/_migrated/
, these queries will incorrectly recognize the file as an alternate file although it in fact is a migratable one.undouble:removemigratedfiles
should not delete a file, only because one of its sys_file
records is not referenced anywhere. Instead, check all of its sys_file
entries and only delete the file, if really none of these entries shows any usage.You storage 0 is strange. I have never seen this before. How did this come about? To you actually have two storage records in the backend?
I completely re-wrote the ext this weekend. It should detect duplicates better, but it still works with one storage only.
I'm conflicted about adding support for multiple storages. What should be moved where? I mean, duplicates found in the _migrated folder . . . should the references be moved to duplicates in storage 0 or 1 or 2 or 3 or whatever other storage duplicates are found in?
I have seen entries in storage 0 in several installations. These installs had exactly one storage in the backend and that one has ID 1. They never had another one. Additionally, having a storage with sys_file_storage.uid=0 is technically impossible as this uid is an integer with auto_increment, where the first value to be inserted is 1 and not 0. I think the fact that entries in storage 0 are there, is a bug.
I think you should not add support for multiple storages. You would need some logic to determine (or to make the user provide) the storages, to which each reference should be changed. This cannot be done automatically, without user interaction. And: Such a functionality goes far beyond just removing duplicates. For now, I would not add that.
Instead, I propose to add a check to your extension, which runs the following SQL query:
SELECT count(*) FROM `sys_file` WHERE `storage` = 0 AND `identifier` LIKE '/fileadmin/_migrated/%';
If this query returns exactly 0 rows, then you are good to go.
If however this query does return rows, then give out a clear warning or even make the extension abort execution so that it does not break things. (This case will happen for installations, which have actively been used with the broken TYPO3 versions. Read as: The bug affects many installations.)
The wording for the warning message could be:
Invalid entries found
The sys_file table in your database contains entries, which point to files inside the /fileadmin/_migrated folder using wrong information: They use storage ID 0 and an identifier starting with "/fileadmin/_migrated". Storage ID 0 however does not exist.
These entries can cause the extension to work incorrectly.
You are strongly advised to remove these wrong entries before using this extension.
In the installations I checked that with, these entries were not referenced anywhere, so for me, the solution was to run:
DELETE FROM `sys_file` WHERE `storage` = 0 AND `identifier` LIKE '/fileadmin/_migrated/%';
Afterwards, the extension worked correctly.
That's a good idea!
Thanks for integrating this fix, Michiel!
From reading, it produces the error message in the right cases. Technically, this solves the issue.
There is one small glitch that I noticed: In a few of the buggy installations, I did not only have /fileadmin/_migrated...
entries in storage 0, but also some other entries. Although having these other entries is buggy as well, these other entries did not cause problems when running the extension. For that reason you maybe want to rename exitWhenEntriesFoundInStorageZero()
to e.g. exitWhenProblematicEntriesFoundInStorageZero()
. Same for hasRecordsAttachedToStorageZero()
. :-)
I think I'll leave it like this. Because there should not be any entries attached to storage 0.
The command
undouble:migratedfiles
does not correctly identify, if a file is in the_migrated/
folder or not.I have the following situation hundreds of times: There are three rows in the
sys_file
table, which have the same sha1, for example:Note that these three entries belong to only two files! The installation has exactly one
sys_file_storage
, which has the uid 1 and which points to thefileadmin/
directory. A file storage with uid 0 does not exist. I have hundreds of files, which each have that kind of three entries. Also notice how the paths inside the storages are different: For storage 0, the folderfileadmin/
is part of the identifier while for uids 669 and 1386 it is not.With that kind of data,
migratedFilesCommand()
checks and updates theuid_local
field of thesys_file_reference
table from file 3 (which is in the_migrated/
folder) to file 669.So afterwards the
sys_file_reference
entry is still referencing a file (actually: the same file) in the _migrated/ folder.It would be great, if you could look into that! Have I used the extension incorrectly? Or are the entries in my sys_file table messed up?