anatol / pacoloco

Caching proxy server for Arch Linux pacman
MIT License
216 stars 30 forks source link

Fix map concurrency #70

Closed cofinalsubnets closed 1 year ago

cofinalsubnets commented 1 year ago

Hi, thanks for this great project. The concurrent writes issue from #44 seems to be crashing my pacoloco installation every couple of days. I think this change will fix the problem without adding too much extra complexity: the top-level time.Time-valued maps are removed and the times are stored on each Repo instead; and the write to config.Repos is removed by storing *Repo values and giving each one its own mutex.

Please let me know what else i can do to help. Thank you! :slightly_smiling_face:

Focshole commented 1 year ago

Thank you for your work! The idea of moving mutexes to repos seems fine, it is a progress in the proper direction. The issues I see so far are:

Ideally, this would definitely reduce many race conditions but it won't defeat them, as all maps which can be accessed concurrently may show those issues again. I do not see an easy way out of this, until the major pacoloco refactoring gets fixed

BTW, I am saying those things hadn't tested your work so far, sorry

cofinalsubnets commented 1 year ago

Sure, i can add that test. Can you elaborate on how these errors could still happen? It seems like there are only two maps left (config.Repos and downloadingFiles), and of those, only downloadingFiles could be read/written concurrently, since config.Repos isn't written to after it's parsed from YAML. And downloadingFiles is already behind a mutex. So i don't think concurrent map writes are possible anymore, but i could be missing something.

anatol commented 1 year ago

Thank you for your worj @gw000. I like the idea of using finer-grained concurrency.

cofinalsubnets commented 1 year ago

@Focshole I updated the tests to only ignore the mutex, and check the timestamps. I also added a config test to make sure the new timestamp fields (which are exported to work with cmp.Equal) won't be unmarshalled from YAML, if the default keys should happen to be present for some reason.

IMO using RWMutexes would be premature at this point. There would only be one or two very quick reads that could be protected with read locks, and it would add complexity to a critical code path for not much performance gain.

@anatol Thank you for creating this project!

Focshole commented 1 year ago

Sure, i can add that test.

I mean, it is enough to add the same checks you see for the whole configs also to the individual structures that should have been checked before. Something like "if want.repos!=nil..." and then you check the "got" "want" with the structures. Sorry if I have explained myself poorly, I wrote some comments about it

Can you elaborate on how these errors could still happen? It seems like there are only two maps left (config.Repos and downloadingFiles), and of those, only downloadingFiles could be read/written concurrently, since config.Repos isn't written to after it's parsed from YAML.

Well, not exactly. The config repos can be updated if it is specified through mirrorlist (it has to read periodically the mirrorlist file if it has been modified, then periodically update the relevant repo urls). The race condition scenario I see is the following:

  1. 2+ concurrent packages gets downloaded from the same repo (very common scenario), and the repo is updated through mirrorlist
  2. The first package triggers an update to the mirrorlist urls
  3. If the mirrorlist had been updated, the mutex locks for write to edit the repo urls
  4. In the meanwhile, 1+ packages are being downloaded from the same repo
  5. Concurrent access to the repo urls, as they are being both updated and read at the same time

With a RWMutex, this scenario can be avoided, as read locks allow concurrent reads but not writing when reads are happening :+1:

And downloadingFiles is already behind a mutex. So i don't think concurrent map writes are possible anymore, but i could be missing something.

True

cofinalsubnets commented 1 year ago

Maybe i'm confused but i don't think the race condition you're describing could arise. If the mirrorlist has been updated, the thread that discovers that will lock the repo, preventing any other thread from reading the URLs/starting to download until the mirrorlist has been parsed and the sources have been updated. If, hypothetically, the other thread won the race, then it would be the one to find the mirrorlist out of date, and the other thread would wait. Again, i think an RWMutex is very much overkill here.

Focshole commented 1 year ago

Maybe i'm confused but i don't think the race condition you're describing could arise. If the mirrorlist has been updated, the thread that discovers that will lock the repo, preventing any other thread from reading the URLs/starting to download until the mirrorlist has been parsed and the sources have been updated. If, hypothetically, the other thread won the race, then it would be the one to find the mirrorlist out of date, and the other thread would wait. Again, i think an RWMutex is very much overkill here.

The logic behind avoiding race conditions involves defining proper access control to the relevant structures that are being accessed concurrently both RW. The structures that I see that are being accessed concurrently are:

Repo.LastMirrorlistCheck
Repo.LastModificationTime
Repo.URLs

Thanks to your work, they are much smaller structures than before. I see that both

Repo.LastMirrorlistCheck
Repo.LastModificationTime

Have been protected with Repo.mutex. (BTW, shouldn't it be Repo.Mutex in order to be accessed publicly? Probably I'm wrong here). However Repo.URLs access seems unprotected.

In this scenario, This instruction:

https://github.com/gw000/pacoloco/blob/fix-map-concurrency/mirrorlist.go#L60

And this instruction:

https://github.com/gw000/pacoloco/blob/fix-map-concurrency/pacoloco.go#L311

Can be executed at the same time.

What about yet another mutex :smile: ? Something like Repo.urlsMutex. Going with a normal mutex would be nonsensical, as most access are just reads that can be done concurrently. Golang won't make writes starve, it is in its design (can't find the source, sorry).

repo.urlsMutex.RLock() before the URLs are being used repo.urlsMutex.RUnlock() after the URLs are being used repo.urlsMutex.Lock() only when they are actively being updated

What do you think?

EDIT: I just realized I am wrong

cofinalsubnets commented 1 year ago

Thanks for explaining, i see the error now. I'll add another mutex.

Focshole commented 1 year ago

Thanks for explaining, i see the error now. I'll add another mutex.

well, wait, the original scenario is wrong, but the conclusion that it may occasionally happen is still right.

With the blocking mutex on checkAndUpdateMirrorlistRepo, the only occasion it could happen is if pacoloco has a quite high volume of requests, in which:

  1. the first request does not update the mirrorlist
  2. the second request does update the mirrorlist and thus the URLs
  3. the first request breaks during that for cycle because of the second request
cofinalsubnets commented 1 year ago

I added read locks around the two repo.URLs loops in handleRequest and prefetchRequest, and a write lock around the mirrorlist update.

cofinalsubnets commented 1 year ago

Unless i'm very confused, the tests already do what you want. IgnoreFields is selective about the fields it ignores, that's why it takes their names as arguments. https://pkg.go.dev/github.com/google/go-cmp/cmp/cmpopts#IgnoreFields

cofinalsubnets commented 1 year ago

That seems to be a bug in master. Or is it even a bug? There's no comparison at all in that test, seems like it just makes sure parsing doesn't panic.

Focshole commented 1 year ago

My fault, sorry for bothering in those tests :facepalm: I tested them and they do what they should. Sorry again

If mutexes should not be capitalized (i am not sure about what we should do about it), this is ok to be merged IMO!

cofinalsubnets commented 1 year ago

I don't mind. I left them lowercase just to make sure YAML would ignore them, and because unlike the time fields, they don't need to be accessible to cmp.Equal. They could also be uppercase with a yaml:"-" annotation. Let me know which you prefer :slightly_smiling_face:

Focshole commented 1 year ago

It is indifferent to me, if it works and it is maintainable it is fine to me! Well done! :+1: