Two chunk instances downloading the same file

danielfireman commented 1 year ago

Do we care about the same file being downloaded simultaneously by two chunk instances? Asking because I believe this can a pre-release feature.

Example:

$ ./chunk HTTP://a.b/c &
$ ./chunk HTTP://a.b/c &

The result of this sequence of operations is unknown.

We could deal with it after https://github.com/cuducos/chunk/issues/8, by augmenting the infrastructure in place.

cc/ @cuducos

cuducos commented 1 year ago

With the changes from #38 we started to have a more predictable scenario here.

This would still be a scenario we wouldn't be prepared for:

$ ./chunk HTTP://a.b/c &
$ ./chunk HTTP://a.b/c &

But this would be OK:

$ ./chunk HTTP://a.b/c &
$ cd another-directory && ./chunk HTTP://a.b/c &

What would happen here is that the file c would be downloaded twice since the local path would be different.

In my opinion, two processing downloading to the same local path should not be allowed (classic concurrency problem both on the downloaded file and on the progress file). I am just not sure about how to control and prevent that.

cuducos commented 1 year ago

If #38 is ok, what about we create a ~/.chunk/lock file to only allow one instance of chunk running at a time?

danielfireman commented 1 year ago

If #38 is ok, what about we create a ~/.chunk/lock file to only allow one instance of chunk running at a time?

It is possible, but I believe we don't need to make it too restrictive. Bellow, I try to do some brainstorming about a less restrictive option: consider each progress management file a lock file.

That has the same complexity as creating a single lock file but allows us to have as many chunk instances as needed, as long as they do not download the same file.

That said, there is one faulty case missing that we should treat either way: what happens when the chunk instance crashes?

There is a straightforward way to solve this problem:

keep and update the PID of the process in the progress management file
keep and update the timestamp of the last activity (chunk download)

With that, every time a download is started, chunk should do as follows:

verify if the progress manager file is there
1. if it is, check the owner PID (if it is the owner, proceed) and the last activity (if it is too long ago, become the owner and proceed)
2. otherwise, create a progress manager file as usual and proceed

Of course, we could introduce the --force-start flag in the future, which guarantees the current chunk instance rules them all.

cuducos / chunk

Two chunk instances downloading the same file #36