bluesky / tiled

API to structured data
https://blueskyproject.io/tiled
BSD 3-Clause "New" or "Revised" License
61 stars 52 forks source link

Refuse to connect to SQLite database on NFS #460

Open danielballan opened 1 year ago

danielballan commented 1 year ago

SQLite support concurrent use by multiple processes on filesystems that implement file-locking correctly. Therefore it is safe to use on typical local file systems (e.g. ext4) but it can be corrupted on networked filesystems that have broken file-locking, notably including NFS. Docs: https://www.sqlite.org/faq.html#q5

When Tiled connects to a database (which is what happens when the engine is created)

https://github.com/bluesky/tiled/blob/af692573e9001f5cc8f01355356240fd10ea5138/tiled/catalog/adapter.py#L768

we should check whether engine.dialect == "sqlite" and, if so, try to determine what time of filesystem the path to the sqlite file resides on. Proceed if we can tell that the filesystem is something solid like ext4. Raise an error if we can tell that it is NFS (or any other known-broken filesystems). Perhaps issue a warning if we cannot tell either way whether the filesystem is solid or not.

The error message should provide a brief explanation, a link to somewhere in the SQLite docs for more information, and an environment variable that can be used as an override to suppress the error. It should be scary and annoyingly verbose, maybe TILED_SQLITE_IGNORE_UNSAFE_FILESYSTEM, to be sure they really mean it.

danielballan commented 1 year ago

It would be best to implement this in a stand-alone function, perhaps in tiled.utils, and then call it at the location in tiled.catalog.adapter linked in the comment above.

We can then use the same logic for the authentication database (a separate database, that may also be SQLite):

https://github.com/bluesky/tiled/blob/af692573e9001f5cc8f01355356240fd10ea5138/tiled/database/connection_pool.py#L18-L20

and perhaps in the cache, though the cache code is a moving target, so this is something to worry about in a separate PR. #16

danielballan commented 1 year ago

This, lifted from #16, may be a useful reference. https://stackoverflow.com/questions/460047/how-do-i-determine-if-a-directory-is-a-mounted-nfs-mount-point-in-shellscript/460061#460061

nmaytan commented 1 year ago

Would using psutils be a better choice for grabbing filesystem details? Granted, it's a little less direct (needs some path-checking logic which accounts for nested mounts), but avoids subproc and parsing stdout.

nmaytan commented 1 year ago
from psutil import disk_partitions as dpart

plist = dpart(all=True)

plist = [p for p in plist if p.fstype == 'nfs']

for p in plist:
    print(f"""'{p.mountpoint}' is type '{p.fstype}'""")

"""
Example output:
'/nsls2/conda' is type 'nfs'
'/nsls2/software' is type 'nfs'
'/nsls2/users/dallan' is type 'nfs'
'/nsls2/users/nmaytan' is type 'nfs'
"""
danielballan commented 1 year ago

Oh, nice, that's very clean.

We already have a psutil dependency on the server side, but we do not yet have one on the client side. I try to be really stingy about adding dependencies to the tiled client. I am mindful that it works best when it is a detail deep inside someone else's science code. If it drags in a bunch of dependencies, users will reasonably question whether it is worth the extra weight.

That said, this seems worth strong consideration.

nmaytan commented 1 year ago

Certainly I agree about being conservative with dependencies. Let's see how the options compare.

Also, just brainstorming a little: we could be a bit more specific and check NFS version too (even if not now, perhaps in the future). NFSv4 supports file locking natively, while NFSv3 does not. See the "Using file locks with NFS" section of the manpage. Also see: [ref1, ref2, ref3]

Version information is accessible in similar ways to fstype, and psutil does seem to report it in the opts string e..g

opts='[...],vers=3,[...],mountvers=3,[...]'

Also of interest, nfs lock tests

danielballan commented 1 year ago

Reflecting on this sometime later, I think adding psutil as a client-side dependency is worthwhile. It's light, pretty widely used, and may come in handy in other ways.

danielballan commented 9 months ago

Picking this back up:

nmaytan commented 8 months ago

Very relatedly, for at least the catalog adapter, we are using WAL. But sqlite docs advise specifically that WAL should not be used over a network file system. See point 2 of the disadvantages.

Should use of WAL also be contingent on FS check?

edit: also see "recommendations" section here, where this is even more explicit

danielballan commented 8 months ago

Good note.

Quoting the end of the linked recommendations:

Choose the technology that is right for you and your customers. If your data lives on a different machine from your application, then you should consider a client/server database. SQLite is designed for situations where the data and application coexist on the same machine. SQLite can still be made to work in many remote database situations, but a client/server solution will usually work better in that scenario.

I think we are on solid ground saying:

If someone comes forward a compelling use case for SQLite on network, then it may be worth supporting a code path for "if networked do not use WAL" but I would hesitate to proactively sign up to maintain that use case. Sensible?