GothenburgBitFactory / taskwarrior

Taskwarrior - Command line Task Management
https://taskwarrior.org
MIT License

[TW-1711] pending.data corruption involving concurrent processes #1735

Closed: taskwarrior closed this issue 4 months ago

taskwarrior commented 6 years ago

Daniel Shahaf on 2015-10-01T18:07:30Z says:

As reported on IRC some time ago:
Sep 19 00:14:37 <danielsh>  Ran into db corruption with 2.4.4.  Just after rebooting, spawned taskswamp instance.  git logs of ~/.task show four commits.  The first commit emptied pending.data.  The second and third appear normal.  The fourth prepended 154018 NUL bytes to line 4 of pending.data.
Sep 19 00:15:12 <danielsh>  I've recovered from this by backdating the ~/.task/ git worktree, but I kept a copy of the corruption.
Sep 19 00:18:10 <danielsh>  Those reports would have created a new instance of a recurring task.  Each of the four commits touches both the parent task and a new child thereof.  The various commits all add lines related to the recurring task to *.data, but each commit adds a different number of entries.
Sep 19 00:27:12 <danielsh>  Running debian stable (jessie) if it matters.

The problem occurred again now, in identical circumstances: while ~/.config/autostart/ was being run, immediately after a reboot. This time, there are 85316 NULs prepended to line 8 of pending.data.

taskwarrior commented 6 years ago

Migrated metadata:

Created: 2015-10-01T18:07:30Z
Modified: 2017-01-16T18:07:09Z
taskwarrior commented 6 years ago

Daniel Shahaf on 2015-10-01T18:14:15Z says:

The differences between the corrupted pending.data and the one obtained by git-backdating to before the reboot and running a single instance of task [to generate the recurring task's instance] are:

  1. The .modified timestamp on the parent status:recurring task.
  2. Only the corrupted file has the NULs line, followed by a status:pending instance of the recurring task.
  3. Both files have a newly-added instance of the recurring task as the last task in the file.

(Consequently, if someone runs into this issue, deleting the line with NULs in it might suffice to recover from the corruption.)
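
A minimal recovery sketch along those lines, assuming GNU sed (for the \x00 escape) and keeping a backup first; this is not an official procedure:

cd ~/.task
cp pending.data pending.data.bak      # keep a copy before touching anything
sed -i '/\x00/d' pending.data         # delete any line containing NUL bytes
task diagnostics                      # sanity check afterwards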

taskwarrior commented 6 years ago

Paul Beckingham on 2015-11-18T15:50:13Z says:

What is ~/.config/autostart doing? I need to be able to reproduce this, otherwise I don't know what to do.

taskwarrior commented 6 years ago

Daniel Shahaf on 2015-11-18T21:06:55Z says:

Whoops, sorry, I forgot to state that. It spawns a tmux session which runs 9 task reports:

#!/usr/bin/env zsh

sessionname=task
args=(
  set-option -g set-remain-on-exit on \;

  new-session -s $sessionname -n 'foo'
    'task report1' \;

  new-window -d -n 'report2'
    'task report2' \;

  ⋮

  new-window -d -n 'report9'
    'task report9' \;
)
tmux $args

I don't think zsh and tmux are essential to reproducing the problem; reducing the above script to for i in 1 2 3 4 5 6 7 8 9 ; do task ls& done will probably suffice.

I believe the problem is that multiple task instances are simultaneously trying to instantiate a new instance of a recurring task.
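
A sketch of the reduced reproduction suggested above (unverified; it assumes at least one recurring task is due to be instantiated, so that every concurrent process tries to add the new child task at once):

#!/usr/bin/env zsh
# Spawn several task processes at once against the same ~/.task data directory.
for i in {1..9}; do
  task ls >/dev/null 2>&1 &
done
wait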

taskwarrior commented 6 years ago

Daniel Shahaf on 2015-11-19T21:39:00Z says:

I have the following hooks:

  1. An on-add hook that modifies the description field under some conditions, which my recurring tasks don't meet.
  2. An on-add hook that doesn't modify the task. (It just prints the UUID to /dev/tty.)
  3. on-launch and on-exit hooks that run git diff --quiet HEAD; [ $? -eq 1 ] && git commit -amm.

I don't immediately see a reason why any of those should be significant. I do see a Hook Error: Expected feedback from a failing hook script. error from one of the 9 reports, which does not recur when checking out the corrupted state and running a report later. I am not sure whether that is part of the problem or a separate problem. It could be a separate problem if both git commit processes are spawned by two concurrently-run on-exit hooks, after the corruption has already happened.
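
For reference, a sketch of what the on-exit hook in item 3 might look like; the file path (~/.task/hooks/on-exit.git-commit) and the commit message are illustrative, not taken from the report:

#!/bin/sh
# Assumes ~/.task is a git repository; commit whenever tracked data files changed.
cd "$HOME/.task" || exit 1
git diff --quiet HEAD
[ $? -eq 1 ] && git commit -am 'auto-commit from on-exit hook' >/dev/null
exit 0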

taskwarrior commented 6 years ago

Daniel Shahaf on 2015-11-21T18:53:53Z says:

I've now serialized my dotfiles using flock(1), which ought to work around this.

taskwarrior commented 6 years ago

Adam O on 2016-01-26T18:52:55Z says:

I am experiencing a similar issue (TW-1761). I can confirm that removing the NUL characters is not sufficient to undo the corruption: there are a number of other mutilated lines in my pending.data file. I'm not certain whether all the corruption happened at the same time.

taskwarrior commented 6 years ago

Daniel Shahaf on 2016-02-09T04:12:38Z says:

[~paul] I assume this issue and TW-1761 should block TW-94?

I'm concerned that switching to atomic renames without ensuring mutual exclusion might mean corruption would still be possible, just subtler: e.g., two concurrent processes might be scheduled such that ~/.task is left with the first process's pending.data and the second process's completed.data (according to my reading of TDB2::commit()).

arooni commented 5 years ago

I have the same issue on Mac:

Unrecognized Taskwarrior file format or blank line in data. in /Users/david/.task/completed.data at line 328

I think this is happening when I'm running task sync in an automated fashion via crontab (and making changes locally):

0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/bin/task sync >> /home/david/.task/synclog.txt 2>&1

pbeckingham commented 5 years ago

Absolutely this is caused by simultaneous writes. Don't sync via cron.

arooni commented 5 years ago

In that case, can I suggest a feature that would allow the user to pick a number of operations before Taskwarrior syncs automatically? I.e. perform 10 operations, then either sync automatically or prompt to sync. It could be a completely optional feature, but I know I'd use it, as it would ease accessing Taskwarrior across multiple devices.

I know that right now the software does say 'there are local changes' and in that way reminds you to sync, but it would be nice to be able to type one less command.

And if you're doing it in a prompted way, the user is not likely to be editing tasks during the sync process, which is how I shot myself in the foot with cron, apparently.

pawamoy commented 5 years ago

Just adding my 2 cents on the matter: I use flock to ensure only one task process runs at a time. It should be available by default on many distributions. For interactive shells, an alias: alias task='flock ~/.task task'. For cron or other places (conky for me): flock /home/me/.task task .... This way you make sure you won't corrupt your data because you ran task in your terminal at the same moment cron or conky did.
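
A minimal sketch of that setup; the cron schedule and paths are illustrative (borrowed from the crontab above), and flock(1) accepts a directory such as ~/.task as its lock target:

# In ~/.zshrc or ~/.bashrc: serialize interactive invocations behind a lock on the data directory.
alias task='flock ~/.task task'

# In crontab: take the same lock before the automated sync, so it never runs
# while an interactive task command is writing.
*/5 * * * * flock /home/david/.task /usr/local/bin/task sync >> /home/david/.task/synclog.txt 2>&1

Note that flock(1) comes from util-linux, so it is present on most Linux distributions but not in a default macOS install.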

arooni commented 5 years ago

Brilliant! Sort of like a mutex on the Taskwarrior process. Perhaps this should be included in the help documentation. Now I can have auto-sync again without any drawbacks.


pawamoy commented 5 years ago

The idea was from someone else in the issues :sweat_smile: but yeah, we should definitely add this to the documentation and man pages!

arooni commented 5 years ago

I have to say, I think this is happening even outside of sync issues. I tried the flock solution earlier and was still having issues, so I disabled all automatic syncs via cron.

[I] ~/.task  task done end: 391
No handlers could be found for logger "taskw.taskrc"
Completed task 792452d9 'finish installing apps to google pixel ; premium apps like moon+ reader etc; compare to s5 to see what needs to be installed'.
Completed 1 task.
Filter: ( ( id == 391 ) )
You have more urgent tasks.
[I] ~/.task  h
[task next ( +ACTIVE limit: or priority:H +PENDING and (+OVERDUE or +READY or +BLOCKED) )]
Unrecognized Taskwarrior file format or blank line in data. in /Users/david/.task/pending.data at line 394
[I] ✘  ~/.task
[I] ~  task --version
2.5.1

line 394: status:"pending" tags:"weeklygoal,finance" uuid:"dd3eee7d-fb08-43da-b4c9-f35d8bec9626"]

I only have this happen on macOS (installed via Homebrew); it never happens on Ubuntu Linux, for whatever reason.

UPDATE: I'm pretty sure the problem was my command

task X done end:

The end: with no date, I think, was what screwed things up. I was trying to create an alias to make it faster to fill in non-standard end dates. I'm also pretty sure that the sync was messing things up, but hopefully with flock it won't.

SECOND UPDATE: this is still happening when running commands like task delete 1 2 ... Again, it seems to only happen on Mac, never on Ubuntu.

djmitche commented 4 months ago

No longer an issue in 3.0.