cxl-micron-reskit / famfs

This is the user space repo for famfs, the fabric-attached memory file system
Apache License 2.0

how to ensure the data written to the shared memory by software? #59

Closed guoanwu closed 3 months ago

guoanwu commented 3 months ago

Software-managed cache coherency between hosts relies on a flush operation to push data out to the shared memory, but I think "cache flush + fence" can't ensure the data has actually been written to the shared memory, so the other host might not read the correct data. We need a mechanism to ensure the data has reached the device (it might still be in the io.buf or in transit on the CXL link). This is an open question for discussion.

jagalactic commented 3 months ago

Sorry to reply slowly, and this won't be a "full" reply. I'm thinking we need a documentation page that covers this topic thoroughly in one place. But a few observations:

Famfs manages cache coherency for its superblock and log. Looking at the log append and logplay code should shed some light. Superblock and log headers are checksummed, and log entries are sequence numbered and checksummed. The strategy is basically to flush the cache (CLFLUSH) after writing anything to the superblock or log, and, when reading, to CLFLUSH and retry if a checksum mismatch is detected.
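To make the flush-after-write, flush-and-retry-on-read pattern concrete, here is a minimal sketch. It is not the famfs code: `struct record`, `record_publish`, `record_read`, and `flush_range` are hypothetical names, the 64-byte cache line size is an assumption, and FNV-1a is swapped in as a stand-in checksum. On non-x86_64 builds the fallback only issues a fence and does not flush. It also does not settle the original question of whether flushed data has reached the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>          /* _mm_clflush, _mm_mfence */
#endif

#define CACHELINE 64            /* assumed cache line size */

/* Hypothetical record; NOT the famfs superblock or log format. */
struct record {
    uint64_t seq;               /* sequence number */
    uint8_t  payload[40];
    uint64_t csum;              /* checksum of all bytes before this field */
};

/* FNV-1a, 64-bit: a stand-in checksum chosen for brevity. */
static uint64_t checksum(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Flush every cache line overlapping [addr, addr + len), then fence. */
static void flush_range(const void *addr, size_t len)
{
#if defined(__x86_64__)
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    for (; p < end; p += CACHELINE)
        _mm_clflush((const void *)p);
    _mm_mfence();
#else
    (void)addr; (void)len;
    __sync_synchronize();       /* fence only: no flush on this fallback path */
#endif
}

/* Writer side: fill the record, checksum it, then flush it out. */
static void record_publish(struct record *dst, uint64_t seq,
                           const void *data, size_t len)
{
    assert(len <= sizeof dst->payload);
    memset(dst->payload, 0, sizeof dst->payload);
    memcpy(dst->payload, data, len);
    dst->seq = seq;
    dst->csum = checksum(dst, offsetof(struct record, csum));
    flush_range(dst, sizeof *dst);
}

/* Reader side: flush (to drop any stale cached copy), read, verify.
 * Returns 0 on a valid read, -1 if the checksum never matched. */
static int record_read(const struct record *src, struct record *out,
                       int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        flush_range(src, sizeof *src);
        memcpy(out, src, sizeof *out);
        if (out->csum == checksum(out, offsetof(struct record, csum)))
            return 0;
    }
    return -1;
}
```

A writer would call `record_publish()` on a record in the shared mapping. A reader on another host would call `record_read()` and treat -1 as "not yet visible or torn", retrying later.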

The famfs pcq test program also attempts to manage cache coherency for an in-memory producer/consumer queue - with basically the same techniques.

It would be super-nice if CPU vendors gave us finer control than the CLFLUSH variants. Specifically, in these use cases we know whether we need to trigger a write-back or an invalidate, but we have to rely on the CPU doing the right thing in response to CLFLUSH (which says "do whichever is appropriate").

In CXL 3.1, there is explicit support for sharing, which should make things safer on the CPU side. In addition, CXL 3.1 supports read-only mappings of DCD allocations, which will prevent inappropriate write-back if the CPU ever attempts it.