christianb93 / ctOS

ctOS is a 32 bit Unix-like operating system that I developed as a toy project
http://www.leftasexercise.com
MIT License

rm -rf fails after porting coreutils #2

Closed christianb93 closed 6 years ago

christianb93 commented 6 years ago

When porting coreutils, rm compiles and works well in some simple cases, but fails if a recursive deletion is attempted, for instance

rm -rf /tests

That issue is a combination of several problems.

When rm is called with the option -r, it uses the generic mechanism provided by the functions in lib/fts.c, which is part of gnulib. The first function which is called is fts_build. This function is supposed to scan the directory /tests and build a list of the files that it contains, which rm will then walk and delete before attempting to delete the directory itself. This step already fails, and consequently the files are not deleted. When rm then tries to remove the directory itself, that (correctly) fails because the directory is not empty.
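To make the traversal order concrete, here is a minimal sketch of a post-order tree removal built on the fts interface from <fts.h>. It is not the code from rm or lib/fts.c; the function name remove_tree is made up for illustration, and the sketch assumes a libc that already ships fts_open, fts_read and fts_close (glibc or a BSD libc, not ctOS at the time of this issue).

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <unistd.h>

/* Hypothetical helper: remove `root` and everything below it.
 * Returns 0 on success, -1 if any entry could not be removed. */
int remove_tree(const char *root)
{
    char *paths[] = { (char *)root, NULL };

    /* FTS_PHYSICAL: do not follow symlinks.
     * FTS_NOCHDIR: leave the working directory alone. */
    FTS *fts = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
    if (fts == NULL)
        return -1;

    int rc = 0;
    FTSENT *ent;
    while ((ent = fts_read(fts)) != NULL) {
        switch (ent->fts_info) {
        case FTS_D:     /* directory seen in pre-order: children come first */
            break;
        case FTS_DP:    /* directory seen in post-order: should be empty now */
            if (rmdir(ent->fts_accpath) != 0)
                rc = -1;
            break;
        case FTS_DNR:   /* directory could not be read */
        case FTS_ERR:   /* generic error */
        case FTS_NS:    /* stat failed */
            rc = -1;
            break;
        default:        /* regular files, symlinks and so on */
            if (unlink(ent->fts_accpath) != 0)
                rc = -1;
            break;
        }
    }
    fts_close(fts);
    return rc;
}
```

If the directory scan fails, none of the children are ever visited, and the final rmdir on the top-level directory fails because it still has entries. That matches the behaviour seen with rm -rf /tests.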

But why does fts_build not work? fts_build uses fts_opendir which is a macro and translates into a call to opendirat. This first gets a file descriptor pointing to /tests and then calls fdopendir to open that (in order to work around the fact that ctOS does not have opendirat).

Now fdopendir assumes that the file descriptor has previously been registered with the bookkeeping mechanism in fchdir.c, i.e. that _gl_register_fd has been called for this file descriptor. However, this did not happen, and therefore fdopendir fails.
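The bookkeeping idea can be illustrated with a small table that maps file descriptors to directory names. This is only a simplified model, not the code in fchdir.c: the names register_fd, unregister_fd and directory_name and the fixed table size are assumptions made for this sketch.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_FDS 64  /* arbitrary limit chosen for this sketch */

/* fd -> directory name; NULL means "never registered". */
static char *dir_names[MAX_FDS];

/* Remember that `fd` refers to the directory `name`.
 * Returns fd on success, -1 on failure. */
int register_fd(int fd, const char *name)
{
    if (fd < 0 || fd >= MAX_FDS)
        return -1;
    char *copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, name);
    free(dir_names[fd]);        /* drop a stale entry, if any */
    dir_names[fd] = copy;
    return fd;
}

/* Forget `fd`, as a close() wrapper would do. */
void unregister_fd(int fd)
{
    if (fd >= 0 && fd < MAX_FDS) {
        free(dir_names[fd]);
        dir_names[fd] = NULL;
    }
}

/* Look up the directory name for `fd`; NULL if it is unknown.
 * An emulated fdopendir or fchdir can only work if this succeeds. */
const char *directory_name(int fd)
{
    if (fd < 0 || fd >= MAX_FDS)
        return NULL;
    return dir_names[fd];
}
```

In this model, an emulated fdopendir(fd) would call directory_name(fd) and then open the directory by name. If the descriptor was opened without being registered, the lookup yields NULL and the emulation has nothing to work with.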

In fact, the gnulib open replacement rpl_open defined in lib/open.c does not call _gl_register_fd because REPLACE_OPEN_DIRECTORY is defined. This variable in turn is defined because we are cross compiling: in this case the configure script has to guess, and its guess is "no". So we get the combination

REPLACE_OPEN_DIRECTORY=1 REPLACE_FCHDIR=1

which I think does not work.

As a quick workaround, I edited lib/config.h and manually set REPLACE_OPEN_DIRECTORY to 0. We also have to make sure that REPLACE_FSTAT is not defined by patching the configure script itself.

Now there is some progress - we do now actually walk the various entries in the directory, but the actual unlink still fails. Let us see why.

For each directory, we call excise once (in remove.c), which then calls unlinkat. unlinkat uses the general mechanism for the at-family of functions in lib/at-func.c. It first tries to read from the proc file system. As this is not there, it saves the current working directory. It then calls fchdir on the provided file descriptor.

However, it seems that at some earlier point the file descriptor 10 has already been closed. Therefore _gl_directory_name returns 0. So at some point the mechanism loses track of this file descriptor.
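The save / fchdir / restore fallback described above can be sketched as follows. This is a simplified illustration, not the code from lib/at-func.c: the name emulated_unlinkat is made up, the proc file system attempt is left out, and the sketch assumes a libc with a working fchdir and an open that accepts directories.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical fallback: unlink `name` relative to the directory `dirfd`
 * by temporarily changing the working directory.
 * Returns 0 on success, -1 on failure with errno set. */
int emulated_unlinkat(int dirfd, const char *name)
{
    /* Save the current working directory as a descriptor. */
    int saved = open(".", O_RDONLY);
    if (saved < 0)
        return -1;

    /* Switch to the target directory; fails if dirfd is not valid. */
    if (fchdir(dirfd) != 0) {
        int err = errno;
        close(saved);
        errno = err;
        return -1;
    }

    int rc = unlink(name);
    int err = errno;

    /* Always go back to where we started. */
    if (fchdir(saved) != 0 && rc == 0) {
        rc = -1;
        err = errno;
    }
    close(saved);
    errno = err;
    return rc;
}
```

The whole scheme depends on the fchdir(dirfd) step. If the descriptor is closed or unknown to the emulation layer, that call fails and the file is never unlinked.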

I believe that the only way to avoid this mess is to implement the family of functions that the GNU lib tries to replace, i.e.

openat, fdopendir, fchdir

We might then still have a problem with the fact that the configure script does not correctly set REPLACE_OPEN_DIRECTORY, but this might be fixed by a simple patch of configure.

Update: in fact, tests are now passing after adding the three functions above and rebuilding with the updated libc.a. Closing this issue.