CommandPost / FCPCafe

FCP Cafe Website
https://fcp.cafe
MIT License

Library damaged when working on shared Storage #349

Open inakisanz opened 2 months ago

inakisanz commented 2 months ago

Apple Feedback Assistant ID: MISSING

DESCRIBE THE BUG: A message appears while working saying the Library is damaged. We have been experiencing FCP library corruption when libraries are stored on our SMB shares. It happens randomly on different Macs on our network connected to our NAS, both Intel and Apple Silicon machines running Ventura or Sonoma.

All these Libraries work fine locally, but they can get damaged at any moment when working on shared storage. However, after the warning you can copy the Library to a local drive and it works. After that you can copy it from the local drive back to the NAS and it works again! But you may hit the issue again later with the same Library or another one.

Media, cache and backups are external.


TO REPRODUCE: You only need to open an FCP Library and work; it will happen at some point during the day.


EXPECTED BEHAVIOUR: When the warning message appears you accept it and keep working. Sometimes the Library is really damaged (you will probably lose the open project) and sometimes it's not. There's no clear pattern.


SPECS: Attached report.


ADDITIONAL COMMENTS:

Final Cut Pro_2024-04-02-112539_Inaki-IT.hang.zip

joema4 commented 2 months ago

See my previous comments here: https://github.com/CommandPost/FCPCafe/issues/250

LumaForge (who are experts at FCP/NAS integration) says: "On most NAS based storage, FCPX Libraries and media management commands will not work because NAS storage is typically shared through AFP or SMB Protocols, which are not designed to work with the FCPX Library Architecture." https://www.lumaforge.com/fcpx-shared-storage

Apple says: "If you’re using a shared storage system that uses the SMB protocol, the SMB volume must originate from a macOS server or a Linux server running Samba 4.3.4 or later configured with modules for Apple compatibility." https://support.apple.com/en-us/101919
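
For reference, the "modules for Apple compatibility" Apple mentions are Samba's vfs_fruit stack. A commonly cited server-side smb.conf share fragment looks like the sketch below; the share name and path are hypothetical, and the exact tunables vary by Samba version and NAS vendor, so check your vendor's documentation before applying any of this:

```ini
; Hypothetical share definition -- adjust the share name and path for your server.
; "fruit" (vfs_fruit) together with catia and streams_xattr provides the
; Apple-compatibility behaviour (resource forks, Finder metadata as streams)
; that Apple's support note refers to.
[fcp-libraries]
   path = /srv/fcp-libraries
   read only = no
   vfs objects = catia fruit streams_xattr
   fruit:metadata = stream
   fruit:model = MacSamba
   fruit:posix_rename = yes
   fruit:veto_appledouble = no
   fruit:delete_empty_adfiles = yes
```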

See this archived post by Thomas Berglund on fcp.co about an investigation of NAS tuning parameters possibly needed to make FCP libraries work reliably on an SMB: https://web.archive.org/web/20200918143902/https://fcp.co/forum/4-final-cut-pro-x-fcpx/31510-media-cache-on-shared-storage-network?start=40#109131

latenitefilms commented 2 months ago

@joema4 - We should really consolidate all this incredibly useful information into a Shared Storage page on FCP Cafe! Feel free to make a Pull Request if this is something you're interested in tackling!

inakisanz commented 2 months ago

Hi,

Thanks for your feedback.

It happens that our NAS is from LumaForge; it was designed to work with FCPX from the very beginning. It has been working properly with FCP and SMB for 3 years. All this time, Libraries, caches, backups and footage have been stored on the NAS with very good performance.

About a year ago this issue started happening. We have been working (and still are) with LumaForge to figure it out. We've also tried to reach Apple through LumaForge but are still waiting for feedback.

Meanwhile I was looking for more info here. I'll update as soon as I have news.

joema4 commented 2 months ago

There are apparently few (maybe no) network-specific calls in the FCP source code. The Flexo framework class FFStorageManager handles the setup and identification of storage locations. That class has a method URLIsSMBVolume, which takes a URL as input, retrieves the file system information using the statfs function, and checks if the file system type is "smbfs" to determine if it is an SMB volume. The method returns 1 if it is an SMB volume and 0 otherwise.
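
To illustrate, the check described above is essentially a statfs(2) call followed by a comparison of the filesystem type name against "smbfs". Here is a rough userspace stand-in in Python that parses mount(8) output instead of calling statfs directly; the mount-line formats it matches are assumptions about typical macOS and Linux output, and the sample line is invented:

```python
import re
import subprocess

# Matches both the macOS form "dev on /mnt (smbfs, ...)" and the
# Linux form "dev on /mnt type cifs (rw,...)".
_MOUNT_LINE = re.compile(r" on (?P<mnt>.+?) (?:type (?P<t1>\S+) \(|\((?P<t2>[^,)\s]+))")

def smb_mount_points(mount_output):
    """Return the mount points whose filesystem type is SMB-based."""
    points = []
    for line in mount_output.splitlines():
        m = _MOUNT_LINE.search(line)
        if m and (m.group("t1") or m.group("t2")) in ("smbfs", "cifs"):
            points.append(m.group("mnt"))
    return points

def is_smb_volume(path, mount_output=None):
    """Rough analogue of the URLIsSMBVolume check described above:
    True if `path` lives under an SMB mount."""
    if mount_output is None:
        mount_output = subprocess.run(["mount"], capture_output=True,
                                      text=True).stdout
    return any(path == mnt or path.startswith(mnt.rstrip("/") + "/")
               for mnt in smb_mount_points(mount_output))

# Invented example of a macOS mount line:
sample = "//editor@nas/projects on /Volumes/projects (smbfs, nodev, nosuid)"
print(is_smb_volume("/Volumes/projects/Show.fcpbundle", sample))  # True
```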

I ran FCP under Xcode, put a library on a Mac network server, and set a breakpoint on the +[FFStorageManager URLIsSMBVolume:] method, but it was never hit. I verified they were using SMB via the terminal 'mount' command on the client and 'sudo nettop -m' on the server, so I don't know why the breakpoint never triggered.

In general FCP uses normal file I/O calls and relies on the network layer and the NAS to execute them safely. For example, FCP on a client machine issues the 'fsync' call to flush any open buffers to disk. For I/O redirected to a NAS, the server must interpret that and issue the equivalent call on its local filesystem. If any slight mixup occurs, the remote file buffers may not get flushed, leaving the library susceptible to damage.
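
The write-then-flush pattern in question is ordinary POSIX I/O. A minimal Python sketch (the file name is purely illustrative, not an actual FCP file):

```python
import os
import tempfile

# Minimal sketch of the flush-to-stable-storage pattern described above.
# On a local disk, fsync() forces buffered data down to the device; on an
# SMB mount the same call must be translated by the client and server into
# an equivalent flush on the NAS's local filesystem -- the step where a
# mismatch can leave buffers unflushed.
path = os.path.join(tempfile.mkdtemp(), "CurrentVersion.plist")  # illustrative name
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
try:
    os.write(fd, b"library metadata goes here")
    os.fsync(fd)  # ask the OS to push the data to stable storage
finally:
    os.close(fd)
print(open(path, "rb").read())
```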

Most Unix-heritage filesystems, including HFS+ and APFS, don't have "mandatory" file locking. Unlike Windows, you normally don't get a "file in use" error. However, if another process is doing I/O to an FCP library, it is very easy to corrupt it. This is especially likely in a NAS situation where multiple people are working on a set of libraries.
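
The advisory (non-mandatory) nature of Unix file locks is easy to demonstrate. In this sketch, one file descriptor takes an exclusive flock() on a file, yet a second, lock-unaware writer still succeeds, which is how a second process can scribble over a library file another process has open (the file name is invented for illustration):

```python
import fcntl
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "library.db")  # illustrative name
with open(path, "wb") as f:
    f.write(b"original")

holder = open(path, "rb")
fcntl.flock(holder, fcntl.LOCK_EX)  # exclusive, but only ADVISORY, lock

# A second writer that never checks the lock is NOT blocked:
# the kernel only enforces flock() between cooperating callers.
with open(path, "wb") as intruder:
    intruder.write(b"clobbered")

fcntl.flock(holder, fcntl.LOCK_UN)
holder.close()
print(open(path, "rb").read())  # the lock did not protect the data
```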

One way to troubleshoot a problem like this is synchronized concurrent tracing on both the Mac client and the NAS server, e.g. fs_usage on the client and strace on the server, then analyzing the results with trace visualization tools such as Trace Compass: https://eclipse.dev/tracecompass/

Another step would be to open the damaged FCP library using a tool like SQLPro for SQLite, open each CurrentVersion.fcpevent file (each of which is a SQLite database), and on each one run the command 'PRAGMA integrity_check'. That checks for lower-level damage in the SQLite database, such as a damaged index. It does not check for logical-level inconsistencies in FCP's data. Those can be checked by selecting the project and, while holding the OPT key, doing Clip > Verify and Repair Project. Make sure the library is backed up before doing that.

If the SQLite database has damage which shows up using PRAGMA integrity_check, that likely has nothing to do with FCP.
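
The same PRAGMA can be run without a GUI tool from any SQLite client. A minimal Python sketch, where the .fcpevent path in the comment is hypothetical, and you should of course run this only against a copy of the library:

```python
import sqlite3

def integrity_check(db_path):
    """Run SQLite's built-in low-level consistency check and return the
    result rows; a healthy database returns exactly ["ok"]."""
    conn = sqlite3.connect(db_path)
    try:
        rows = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()
    return rows

# Hypothetical path inside a copied library bundle:
# results = integrity_check("MyLibrary.fcpbundle/Event/CurrentVersion.fcpevent")
print(integrity_check(":memory:"))  # an empty in-memory database reports ['ok']
```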

iFXProductions commented 1 week ago


I don't know if this will encourage you, but here we have four editors working collaboratively on a NAS. All media, assets, and libraries are on the NAS, and we don't copy libraries locally. We've been doing this for seven years with different Macs without any hiccups. Recently, we bought four Mac Studios and ran into some issues, but now everything is running as smoothly as before. A well-known guy named Bob Z. configured our NAS, and we couldn't be happier. Specs: Apple M1 Ultra, 128GB, Sonoma 14.3.1.

For the workflow part: We duplicate the library on the NAS for each editor who needs to work on the TV episode. We work simultaneously on different parts of the episode and have access to the same media and assets. When everyone is finished, we consolidate everything into the same library and share the final video. We haven't encountered library corruption yet, and I was not aware that it could occur with this setup. We will keep that in mind if it happens someday.

inakisanz commented 1 week ago

Hi,

Thanks for your comments and your workflow tips. In fact, our workflow is very similar to yours, but we have more editors (around 20-22) working simultaneously. We had some issues when we got the first Mac Studio (with M1), but I think the problem could be a combination of factors, including big FCP Libraries, the number of Macs connected to the network, and Apple Silicon and Intel Macs working with the same Libraries.

Lately we have had fewer editors (between 10 and 15) and less complex FCP Libraries, and we have seen fewer damaged Libraries. So it looks like it's related to connection issues; it could be our 10Gb switch or our NAS (Jellyfish).

We will continue investigating. Thanks!