babarot / gomi

🗑️ Replacement for UNIX rm command!
https://babarot.me/gomi
MIT License
316 stars 13 forks source link

Fixed a bug that added empty elements & changed to serial processing #36

Closed umlx5h closed 9 months ago

umlx5h commented 1 year ago

Thank you for creating the useful software.

I fixed the bug where garbage data was being written to the inventory file when I deleted a non-existent file or a file not allowed by permissions. https://github.com/b4b4r07/gomi/commit/cebc3726f899abb4b44d574db9e38bf81ed6737d

The cause is that it is initializing the files local variable in the Remove() method with the number of args.

Cause

files := make([]File, len(args)) # Allocate number array of args

defer c.Inventory.Save(files)    # writing in the number of args at the time of error
return eg.Wait()                 # immediately return if some files do not exist
How to reproduce ``` $ touch a1 # Error because a2 does not exist $ gomi a2 a1 a2: no such file or directory # garbage element is added at the end. $ jq < ~/.gomi/inventory.json | jq '.files[-2,-1]' { "name": "", "id": "", "group_id": "", "from": "", "to": "", "timestamp": "0001-01-01T00:00:00Z" } { "name": "a1", "id": "chfpm5kdpli3njdmn7kg", "group_id": "chfpm5kdpli3njdmn7k0", "from": "/home/wsl/projects/fork/gomi/dummy/a1", "to": "/home/wsl/.gomi/2023/05/13/chfpm5kdpli3njdmn7k0/a1.chfpm5kdpli3njdmn7kg", "timestamp": "2023-05-13T23:13:42.169092757+09:00" } ``` By fixing this, the following process would no longer be needed. (Although turning it off would affect existing users.) https://github.com/b4b4r07/gomi/blob/e63731adf335293729414d683e68a80f9f8fbd45/main.go#L396-L399

To fix this, I changed code to serial processing because i felt that parallelization made the code more complex and made it difficult to be consistent with error handling and inventory files. (Considering the characteristics of the program, i thought it would be better to focus on consistency and integrity.)

Error handling in Remove() has also been modified a bit, as it has been changed to a serial process.

I believe the following problems/conflicts are currently possible due to parallel processing.

In addition, benchmarking with the sample code at hand also confirmed that the parallelization is rather slow in some cases. (I confirmed this on linux, but it may be different in other environments.) (It seems that there is no particular benefit to parallelization since the operation is almost on-memory, just rewriting directory files on the page cache.)

In line with this modification, RestoreGroup() is also serialized to unify its internal behavior. https://github.com/b4b4r07/gomi/commit/754c4b2d6db08759f032b6ab449b3cbc69faf673 https://github.com/b4b4r07/gomi/pull/36/commits/03e42877bdbfaa194b756e56a32da611f028f68f

Since I fixed the bug of entering JSON with an empty structure, I thought it would be ok to remove the process of filtering out invalid logs when the id is empty in // Filter out invalid logs , but I haven't changed it because it might affect existing users.

Currently, the -f option ignores all errors, but this was likely to cause problems when permission errors or other errors occur, as they would not be noticed because they would not be displayed. I checked POSIX rm and confirmed that it only ignores non-existent files and does not set the exit code to 1, but in the case of other errors such as permission errors, the error is displayed and the exit code is set to 1.

https://man7.org/linux/man-pages/man1/rm.1.html

How to confirm ``` $ mkdir d1 $ touch d1/a $ sudo chown root:root d1 # without -f, it get an error $ gomi d1/a1; echo $? rename /home/wsl/projects/fork/gomi/dummy/d1/a1 /home/wsl/.gomi/2023/05/13/chfpk4sdpli3droidrbg/a1.chfpk4sdpli3droidrc0: permission denied 1 # with -f, do not get an error $ gomi -f d1/a1; echo $? 0 # in case of rm, it get an error $ rm -rf d1/a1; echo $? rm: cannot remove 'd1/a1': Permission denied 1 ```

It was not clear that only the -f option was used, so the help message was modified slightly. https://github.com/b4b4r07/gomi/commit/ba0fd1987b58b5f243c6a2ac007506ac740d52c7

If you have any minor error handling or concerns, I would appreciate it if you could merge them with a cherry pick.

benchmark with many small files

before (parallel processing)

# create 10000 dummy files
$ time for x in {1..10000}; do
  dd if=/dev/zero ibs=1 count=1 of=gomi-$x.txt 2>/dev/null
done
$ echo 3 | sudo tee /proc/sys/vm/drop_caches

# move to gomi
$ time gomi gomi-*.txt
gomi gomi-*.txt
0.25s user
1.36s system
292% cpu
0.552 total

$ echo 3 | sudo tee /proc/sys/vm/drop_caches

# very slow, consume cpu considerably, it seems json encoding cost is very high
$ time gomi -B
gomi -B
61.47s user
12.47s system
112% cpu
1:05.96 total

after (serial processing, write as batch)

# removing is also faster, less cpu
$ time ../gomi-fix gomi-*.txt
../gomi-fix gomi-*.txt
0.08s user
0.16s system
98% cpu
0.242 total

# restore as batch faster
$ time ../gomi-fix -B
../gomi-fix -B
0.24s user
0.13s system
93% cpu
0.391 total