NVlabs / FPSci

Aim Training Experiments

Performance degradation when user status has a large number of users #355

Closed by jspjutNV 2 years ago

jspjutNV commented 2 years ago

We've had a report that FPSci appears to slow down significantly when more than 50 users are listed in the user status (and config) file. Someone will need to investigate this problem.

As a workaround, you can remove participants who aren't needed from the file.

bboudaoud-nv commented 2 years ago

This does make sense, especially when an experiment has a large number of sessions (N) and many users (M) to serialize. A user status file in which every user has completed every session would contain 2 x N x M session ID entries to serialize, which may be inefficient at scale.
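To put rough numbers on the 2 x N x M estimate, here is a small Python sketch (not FPSci code). The function names are hypothetical. The second function assumes the whole file is rewritten once after every completed session, and that half of the entries (the session ordering) are always present while the other half (completed sessions) accumulate over time. Both are assumptions made for illustration.

```python
def status_entries(num_sessions: int, num_users: int) -> int:
    """Session ID entries in a fully complete user status file (2 x N x M)."""
    return 2 * num_sessions * num_users


def total_entries_written(num_sessions: int, num_users: int) -> int:
    """Entries serialized over a whole experiment.

    Assumption: the entire file is rewritten once after every completed session.
    """
    ordering = num_sessions * num_users     # ordering entries, always present
    completions = num_sessions * num_users  # one rewrite per completed session
    # The k-th rewrite serializes the ordering plus k completed entries.
    return sum(ordering + k for k in range(1, completions + 1))


print(status_entries(10, 50))         # 1000
print(total_entries_written(10, 50))  # 375250
```

Under these assumptions the final file size grows linearly in N x M, but the cumulative serialization work grows quadratically.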

We should think about ways to:

  1. Lower the overhead of writing a single update to a user status file (potentially move away from the Any file specification for this?)
  2. Reduce the frequency of writing status to disk when running an experiment in FPSci...
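As one possible reading of item 2, the following Python sketch batches updates in memory and only rewrites the file every few completions. It is not FPSci code: `BatchedStatusWriter`, `flush_every`, and the use of JSON in place of the Any format are all assumptions made for illustration.

```python
import json
from pathlib import Path


class BatchedStatusWriter:
    """Hypothetical helper: rewrites the status file every `flush_every` updates."""

    def __init__(self, path, flush_every=5):
        self.path = Path(path)
        self.flush_every = flush_every
        self.completed = {}  # user ID -> list of completed session IDs
        self.pending = 0     # updates held in memory, not yet on disk
        self.writes = 0      # number of full-file writes performed

    def mark_complete(self, user_id, session_id):
        """Record a completed session; write to disk only when the batch is full."""
        self.completed.setdefault(user_id, []).append(session_id)
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        """Write the full status to disk if anything is pending."""
        if self.pending == 0:
            return
        self.path.write_text(json.dumps(self.completed))
        self.pending = 0
        self.writes += 1
```

The trade-off is durability: a crash can lose up to `flush_every - 1` completions, so a final `flush()` on exit would still be needed.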

A threaded user status logging model might actually do more harm than good here, since short sessions could leave us simply blocked on a mutex waiting to write to the user status file...

bboudaoud-nv commented 2 years ago

One potential solution here would be to split the user status into 2 parts:

  1. Session ordering/configuration (a designed, read-only configuration file)
  2. Session completion status (an iteratively written log that includes user-session pairs)

This approach would separate the configuration part of user status (per-user and default session ordering) from the "pure status" of completed sessions. It would almost entirely eliminate the need for large write-backs in experiments with high user/session counts. The session ordering for all users would be read once at the start of FPSci, and the new "status" file would be appended with a single line of user ID : session ID each time a user completes a session. By parsing this status file line by line at the start of each FPSci run, we could recover the combined user status we have historically relied on in FPSci.

This approach also fits our historical model in FPSci, where read-only configuration is kept separate from written-back "results" as much as possible.
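A minimal Python sketch of the split described above, offered as an illustration rather than the FPSci implementation. The function names, the in-memory `ordering` dictionary standing in for the read-only configuration, and the exact "user ID : session ID" line layout are assumptions. The sketch also assumes IDs never contain the " : " separator.

```python
from collections import defaultdict
from pathlib import Path


def append_completion(log_path, user_id, session_id):
    """Append one 'user ID : session ID' line; existing lines are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(f"{user_id} : {session_id}\n")


def load_completions(log_path):
    """Replay the log into {user ID: [completed session IDs, in order]}."""
    completed = defaultdict(list)
    path = Path(log_path)
    if not path.exists():
        return completed
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        user_id, _, session_id = line.partition(" : ")
        completed[user_id].append(session_id)
    return completed


def next_session(ordering, completed, user_id):
    """First session in the user's read-only ordering not yet completed, else None."""
    done = set(completed.get(user_id, []))
    for session_id in ordering.get(user_id, []):
        if session_id not in done:
            return session_id
    return None
```

With this layout, the cost of recording a completed session is one short appended line, independent of how many users or sessions the experiment contains. The full status is only reconstructed once, at startup.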