Netatalk / netatalk

Netatalk is a Free and Open Source AFP fileserver. A *NIX or BSD system running Netatalk is capable of serving many Macintosh clients simultaneously as an AppleShare file server.
https://netatalk.io
GNU General Public License v2.0

Deprecate legacy CNID backends #508

Closed. rdmark closed this issue 11 months ago.

rdmark commented 11 months ago

Presently, netatalk3 has code for the following CNID backends:

  • cdb
  • dbd
  • last
  • mysql
  • tdb

The tdb backend is called out as deprecated in the docs. The last backend isn't recommended for general use; sharing a read-only file system such as a CD-ROM seems to be its narrow use case. The mysql backend is poorly documented, and I haven't really tested it yet. Is it fully functional and reliable? The cdb backend seems to have been the historical default before v2.1.

Are there specific use cases for the four backends other than dbd that warrant keeping any of them (considering the maintenance overhead, attack vectors, etc.)?

slowfranklin commented 11 months ago

If you want to put an axe on this, here are my thoughts:

Hth! -slow

ghost commented 11 months ago

Now that would simplify things considerably. If dbd works with sharing RO filesystems, I would go for one backend only. (@slowfranklin, @rdmark, do you know if this is the case?) If not, I'd be happy to axe cdb and tdb and continue with just two backends (dbd and last), relying on Berkeley DB, especially as it is still being maintained by Oracle. Do you know why Debian and other Linux distros seem to stick to version 5 in their package managers?** Surely we can simplify the BDB macro to remove support for pre-5 versions? Happy to work on this once consensus is reached...

**EDIT: It's due to a licensing change after version 5.

rdmark commented 11 months ago

Now that would simplify things considerably. If dbd works with sharing RO filesystems I would go for one backend only.

I ran a simple test on Linux: mount an ISO image via a loop device.

dmark@macuntu:~$ sudo mount -o loop ./PrinceCDColl.iso /mnt/cdrom/
mount: /mnt/cdrom: WARNING: source write-protected, mounted read-only.

Configure a shared volume with /mnt/cdrom ... using the dbd backend, connect with AFP, and interact with the shared volume. It seems to work as expected without any explicit configuration in afp.conf: the file system can be read but not written to.
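For reference, a minimal afp.conf along the lines of that test might look like the sketch below. The volume name `[CD-ROM]` is a hypothetical choice. `cnid scheme = dbd` is already the default in netatalk 3 and is written out only to make the backend explicit; no read-only option is set, matching the "no explicit configuration" result.

```ini
; Hypothetical afp.conf sketch for sharing a loop-mounted ISO image
[Global]
; no global options are needed for this test

; hypothetical volume name
[CD-ROM]
; mount point from the mount command above
path = /mnt/cdrom
; dbd is the default CNID backend; stated here only for clarity
cnid scheme = dbd
```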

So... at a glance we don't really need the last backend...?

cdevers-es commented 11 months ago

Hello again.

As noted in the discussion for #493, my employer until recently supported using Netatalk/AFP for access to our shared storage product.

In our case, we offer a distributed solution where clients could connect to any of several servers to access the storage pool. We found that this wasn't always reliable for AFP clients: users “Alice” & “Bob” might be working on the same volume but reaching it via separate storage nodes, which led to AFP/CNID problems because neither of them could see the locks created by the other user.

Switching to MySQL mitigated this problem, because then there was a single source of truth for all of the nodes in the group to synchronize with, and therefore the AFP clients had far fewer problems with invalid CNID information.
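To illustrate that setup, each Netatalk node in such a cluster could point its volumes at the same MySQL server with an afp.conf along these lines. This is a sketch only: the host name, user, password, database name, volume name, and path are hypothetical placeholders, and the `cnid mysql ...` option names are assumed to match the netatalk 3.1 afp.conf manual page, so verify them against the documentation for your version.

```ini
; Hypothetical afp.conf sketch: every storage node shares one CNID database
[Global]
; placeholder address of the shared MySQL server (same on every node)
cnid mysql host = cnid-db.example.com
; placeholder credentials and database name
cnid mysql user = netatalk
cnid mysql pw = changeme
cnid mysql db = netatalk_cnid

; hypothetical volume name
[Shared Storage]
; placeholder path to the storage pool, identical on every node
path = /srv/storage
; use the MySQL backend instead of the default dbd
cnid scheme = mysql
```

Because every node talks to the same database, a CNID assigned through one node is visible to clients connected through any other, which is the single source of truth described above.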

Eventually, as noted in #493, we solved our AFP problems by dropping AFP support and removing Netatalk (v3), so at this point this is of historical interest, at least for us. But if you have anyone else providing AFP access to distributed storage, they too might be using a dedicated MySQL host to manage this.

(That said, if the MySQL support was an experimental proof-of-concept, then that’s an argument against keeping it. We were certainly using it in production, and it seemed to be fine at the time for our needs, but ¯\_(ツ)_/¯.)

rdmark commented 11 months ago

@cdevers-es Thanks again for sharing your insights and (historical) use cases! It's very valuable to learn that the MySQL backend has been used in a production setting. I assume that you didn't run into any critical bugs that you can think of, since you haven't mentioned catastrophic data loss yet? ;)

MySQL being a much more mainstream piece of technology is another reason for potentially keeping that backend. In fact, I found out the hard way that Alpine Linux deprecated their Berkeley DB (v5) package with v3.13 in 2021, leaving you to build BDB from scratch on that OS. The more forward-looking OSes may follow suit in the future. Going one step further and making the MySQL backend the default would arguably future-proof Netatalk.

In this alternative scenario I propose we make MySQL the default, and keep dbd as the legacy fallback, disabled by default in order to remove the hard dependency on Berkeley DB. Deprecate the other 3 backends.

Thoughts?

rdmark commented 11 months ago

As you said yourself elsewhere, Oracle encumbered Berkeley DB with an unreasonable licensing scheme from v6 onward.

Just a side note, FreeBSD does distribute v18, and 3rd party Solaris repos do as well.

In fact, one may argue that Oracle is silently abandoning Berkeley DB. The present versioning scheme is based on the last two digits of the release year, so the latest major version, v18, is now 5 years old (although it saw a minor revision in 2020).

The open source fork of v5 isn't actively developed either AFAICT so continuing to depend on BDB is arguably risky...

As discussed in the ticket, once I learned that the MySQL PoC backend has actually been used successfully in a production environment I've started to think that we should keep it or even make it the default in a future major version...


slowfranklin commented 11 months ago

To be honest, I don't reeeeaaaaally care, but... at least dbd doesn't require any specific configuration on the user's part, and forcing users to set up MySQL seems bleah! :)

I haven't followed what mess Oracle made of BDB. If distros are indeed removing it, then I guess in the long term we'll need something like a tdb backend with transaction support.

-slow

cdevers-es commented 11 months ago

@cdevers-es Thanks again for sharing your insights and (historical) use cases! It's very valuable to learn that the MySQL backend has been used in a production setting. I assume that you didn't run into any critical bugs that you can think of, since you haven't mentioned catastrophic data loss yet? ;)

There’s always bugs with something somewhere, but we muddle through. Such is life. :-)

Over the years, we’ve supported a number of protocols for accessing our storage: AFP, SMB, NFS, FTP, etc. They all have pros & cons.

For a while there, AFP was a promising option for Mac users, mainly because it seemed to support better throughput than SMB, its primary mainstream alternative. But AFP always seemed to be a little …glitchy.

But then Apple started removing support for AFP, and their SMB support got much better. And after a couple of years, we found that the majority of our Mac customers weren’t using AFP anymore, so by the time we decided to remove support for it, we had very little pushback about the change.

MySQL being a much more mainstream piece of technology is another reason for potentially keeping that backend. […]

In this alternative scenario I propose we make MySQL the default, and keep dbd as the legacy fallback, disabled by default in order to remove the hard dependency on Berkeley DB. Deprecate the other 3 backends.

Thoughts?

Another thing to consider is that MySQL is considerably more complex than Berkeley DB.

With the latter, it's just a monolithic file, and the software handles all interaction with it. For someone running a simple turnkey home or office file server, this is pretty painless to install, set up, and operate. You can go years without even realizing that Netatalk uses such database files. (I certainly did just that.)

Moving to MySQL means setting up a proper Relational Database Management Server (tm), possibly on a separate host, which brings in the complexity of networking, user access, security, and general database administration. It's not necessarily rocket science to do all this, but it's a much steeper learning curve than something like a BDB or SQLite file: it won't be possible to stand up a new Netatalk instance from scratch without forcing new admins to contend with at least some of this complexity.

Alternatively, since SQLite is widely used and in the public domain, it might provide a compelling alternative to BDB that could not be harmed in the future by being acquired by an uninterested parent company; as public domain software, nobody can “own” it like that. At the same time, SQLite is about as architecturally simple as Berkeley DB: they’re both just a “flat file”. And since SQLite supports a SQL dialect, it might make Netatalk’s CNID database management code easier to maintain, being naturally closer (or even identical) to the SQL used for MySQL/MariaDB, rather than the non-SQL interface needed for interaction with BDB.

Michael-Wohlstadter commented 11 months ago

The complexity of the MySQL setup and configuration is an important point. What is the primary demographic of our user base? Is it mostly businesses that have the in house expertise or access to such expertise? Or is it hobbyists and home users?

As to a file based backend, I second the consideration of SQLite. My primary database server is PostgreSQL for the spatial data work that I do. But I have found SQLite to be a remarkably functional substitute for use case environments that don't support a database server.

rdmark commented 11 months ago

I withdraw my suggestion to make MySQL the default backend. It's critical to keep the default configuration seamless and self-contained. In my mind the primary userbase for netatalk is the latter category: hobbyists and home users. We know from community feedback that some people still run enterprise deployments of netatalk but I expect that share to decrease as older Macs get decommissioned.

FWIW the now-defunct netatalk-classic fork had a partially working sqlite backend last year. The fork has been purged from the internet unfortunately so we can't study the code.

rdmark commented 11 months ago

FWIW the integration tests use the last backend for testing. Cf. /test/afpd/test.sh

ghost commented 11 months ago

How about we make a start on the backends by removing the already deprecated CDB code?

ghost commented 11 months ago

Where I'm at with the backends at the moment is to remove CDB, TDB and last, and have DBD as the default with MySQL as an alternative. I searched the Internet Archive for a copy of the last netatalk-classic release but was only able to get a copy of the 2020 pre-sqlite code. I'll contact Mr Kobayashi to see if he's happy to share his old code (if he still has it!)

Michael-Wohlstadter commented 11 months ago

@dgsga Is the code here of use? https://codeberg.org/cryu/netatalk-classic/src/branch/netatalk-classic/libatalk/cnid/sqlite

rdmark commented 11 months ago

@dgsga I think we should keep last for now since it's low maintenance, and used by the integration test suite.

Otherwise I agree with your plan!

ghost commented 11 months ago

OK, leave it with me. We'll have dbd (default), last, and MySQL.

ghost commented 11 months ago

@dgsga Is the code here of use? https://codeberg.org/cryu/netatalk-classic/src/branch/netatalk-classic/libatalk/cnid/sqlite

Great find, we will import the sqlite code to a branch here so we can work on it. Any code contributions are very welcome!!