MaxServ / t3ext-fal_migration_undoubler

Finds documents outside of the _migrated folder, updates references to point to those files and removes the redundant file from the _migrated folder.
GNU General Public License v2.0
2 stars 1 forks source link

Migration does not migrate files properly #1

Closed Chrissitopher closed 8 years ago

Chrissitopher commented 8 years ago

The command undouble:migratedfiles does not correctly identify, if a file is in the _migrated/ folder or not.

I have the following situation hundreds of times: There are three rows in the sys_file table, which have the same sha1, for example:

uid    storage    identifier
3      0          /fileadmin/_migrated/pics/PC186281_01.JPG
669    1          /_migrated/pics/PC186281_01.JPG
1386   1          /user_upload/Weihnachtsfeier/PC186281.JPG

Note that these three entries belong to only two files! The installation has exactly one sys_file_storage, which has the uid 1 and which points to the fileadmin/directory. A file storage with uid 0 does not exist. I have hundreds of files, which each have that kind of three entries. Also notice how the paths inside the storages are different: For storage 0, the folder fileadmin/ is part of the identifier while for uids 669 and 1386 it is not.

With that kind of data, migratedFilesCommand() checks and updates the uid_local field of the sys_file_reference table from file 3 (which is in the _migrated/ folder) to file 669.

So afterwards the sys_file_reference entry is still referencing a file (actually: the same file) in the _migrated/ folder.

It would be great, if you could look into that! Have I used the extension incorrectly? Or are the entries in my sys_file table messed up?

Tuurlijk commented 8 years ago

I'll look into it. I'm currently running a migration of a large site again. 4.7 to 6.2. After migration I will see if I can reproduce this. I am not sure if I have more than one file storage there.

Chrissitopher commented 8 years ago

I have now looked into this further: I don't think this is a bug in the extension, but the extension still has to deal with it.

I got these three entries per file by normally using TYPO3. In my situation, with only one single file storage, I consider the fact that one file has multiple sys_file entries part of a TYPO3 bug. However, after you have migrated the references (which only makes them point to the same file again, see above), the extension allows you to actually delete the files from the /_migrated folder - although they are still used (by the other entry in the sys_file table).

In this case, using your extension basically screws up the whole installation.

Tuurlijk commented 8 years ago

Wow. Do you still have this situation? Can you provide a database dump or a couple of table dumps?

Chrissitopher commented 8 years ago

The database is kind of huge and I am not sure, if part of the dump alone helps you.

But I can describe the anomalies, which I have found:

Nearly all files from the _migrated folder are in the sys_file table twice: Once in storage 0 with identifier /fileadmin/_migrated... and once in storage 1 with identifier /_migrated.... The file is the same one and the sha1 both times is identical. The extension fails here by changing the sys_file_reference from the one to the other sys_file entry. Apart from that, these files often have a third entry in the sys_file table. This entry is the one to which the extension should actually change the sys_file_reference entries. Currently the extension does not use this third entry, although it should.

(Apart from that, I found a number of other anomalies, which might be irrelevant here. Anyway, for completeness: There often are two entries for one file: One in fileadmin/user_upload and the other in user_upload. The sha1 both times is identical. Also there are entries, which point to files, although the files no longer are present at that location. But they are not marked as missing in the sys_file table either. Finally there were some bogus entries, e.g. references to files from inside the typo3/ folder.)


So I see two things, closely related to each other, which should be improved:

Tuurlijk commented 8 years ago

You storage 0 is strange. I have never seen this before. How did this come about? To you actually have two storage records in the backend?

I completely re-wrote the ext this weekend. It should detect duplicates better, but it still works with one storage only.

I'm conflicted about adding support for multiple storages. What should be moved where? I mean, duplicates found in the _migrated folder . . . should the references be moved to duplicates in storage 0 or 1 or 2 or 3 or whatever other storage duplicates are found in?

Chrissitopher commented 8 years ago

I have seen entries in storage 0 in several installations. These installs had exactly one storage in the backend and that one has ID 1. They never had another one. Additionally, having a storage with sys_file_storage.uid=0 is technically impossible as this uid is an integer with auto_increment, where the first value to be inserted is 1 and not 0. I think the fact that entries in storage 0 are there, is a bug.

I think you should not add support for multiple storages. You would need some logic to determine (or to make the user provide) the storages, to which each reference should be changed. This cannot be done automatically, without user interaction. And: Such a functionality goes far beyond just removing duplicates. For now, I would not add that.

Instead, I propose to add a check to your extension, which runs the following SQL query:

SELECT count(*) FROM `sys_file` WHERE `storage` = 0 AND `identifier` LIKE '/fileadmin/_migrated/%';

If this query returns exactly 0 rows, then you are good to go.

If however this query does return rows, then give out a clear warning or even make the extension abort execution so that it does not break things. (This case will happen for installations, which have actively been used with the broken TYPO3 versions. Read as: The bug affects many installations.)

The wording for the warning message could be:

Invalid entries found

The sys_file table in your database contains entries, which point to files inside the /fileadmin/_migrated folder using wrong information: They use storage ID 0 and an identifier starting with "/fileadmin/_migrated". Storage ID 0 however does not exist.

These entries can cause the extension to work incorrectly.

You are strongly advised to remove these wrong entries before using this extension.

In the installations I checked that with, these entries were not referenced anywhere, so for me, the solution was to run:

DELETE FROM `sys_file` WHERE `storage` = 0 AND `identifier` LIKE '/fileadmin/_migrated/%';

Afterwards, the extension worked correctly.

Tuurlijk commented 8 years ago

That's a good idea!

Tuurlijk commented 8 years ago

Fixed in https://github.com/MaxServ/t3ext-fal_migration_undoubler/commit/cdac1e8a23855c420386475f8ffec13e8fe82377

Chrissitopher commented 8 years ago

Thanks for integrating this fix, Michiel!

From reading, it produces the error message in the right cases. Technically, this solves the issue.

There is one small glitch that I noticed: In a few of the buggy installations, I did not only have /fileadmin/_migrated... entries in storage 0, but also some other entries. Although having these other entries is buggy as well, these other entries did not cause problems when running the extension. For that reason you maybe want to rename exitWhenEntriesFoundInStorageZero() to e.g. exitWhenProblematicEntriesFoundInStorageZero(). Same for hasRecordsAttachedToStorageZero(). :-)

Tuurlijk commented 8 years ago

I think I'll leave it like this. Because there should not be any entries attached to storage 0.