lgsmith opened this issue 1 year ago
There is no limit imposed by LOOS. VirtualTrajectory is rather naive and instantiates a Trajectory object for every input trajectory and stores them, so you will have open files hanging around for the life of the vtraj. This means you're bumping into the limit of open files for your system (I hit it at 253 trajectories on my mac). A more subtle issue here, as your debugging code points out, is that LOOS trajectories will almost always keep their contained files open, so you can bump into this limit in your own code even if you don't use VirtualTrajectory.
The simplest fix is going to be to raise the per-process limit on the machines where this is a problem.
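You can check the limit in question from Python on a POSIX system with the standard `resource` module:

```python
import resource

# Per-process limit on open file descriptors (soft and hard).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")
```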
I have inotify max watches set to a pretty high number for other reasons... Is there a different system setting or limit? Is there some issue with having a huge number of files open for reading if one has the RAM to hold them?
I'm assuming that if I simply did this with Python file objects I'd hit the same limit.
I've refactored my code to open the files I need in chunks, so I think that addresses my issue, but I wonder if it's possible for VirtualTrajectory to open the trajectory file it is going to get the next frame from lazily. This'd obviously add some computational overhead to iterating over vtrajs, but because it'd only ever have one traj open it would never run into this issue.
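For reference, the chunked refactor only takes a few lines; a sketch, assuming the usual `pyloos.Trajectory(filename, model)` constructor (`process_in_chunks` is a hypothetical helper, not part of pyloos):

```python
import loos
import loos.pyloos as pyloos

def process_in_chunks(model, filenames, chunk_size=100):
    # Only chunk_size trajectory files are open at any moment; the previous
    # chunk's Trajectory objects are destroyed (and their files closed) when
    # the list is rebound on the next iteration.
    for i in range(0, len(filenames), chunk_size):
        trajs = [pyloos.Trajectory(f, model) for f in filenames[i:i + chunk_size]]
        for traj in trajs:
            for frame in traj:
                pass  # analysis on each frame goes here
```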
That's a watch-specific setting, I believe. You need to adjust the max number of open file descriptors. I think you need to use something like `sysctl` to set it, but that may vary by distro, and you may actually need to twiddle multiple settings (i.e. `sysctl` and `ulimit`).
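If you'd rather not touch system-wide settings, you can also raise the soft limit from inside the process itself (POSIX only; the soft limit can be raised up to the hard limit without privileges):

```python
import resource

# Bump this process's soft fd limit up to the hard limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

The shell equivalent is `ulimit -n <new-limit>` before launching the script.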
Yes, you'll run into the same issue with native Python file objects, e.g.

```python
files = []
while True:
    try:
        f = open('aligned.pdb')
    except Exception as e:
        print(f"Read {len(files)} files, exception = {e}")
        break
    files.append(f)
```
Running it, I get:

```
Read 253 files, exception = [Errno 24] Too many open files: 'aligned.pdb'
```
I think generators are an interesting idea, but the problem is that trajectories in LOOS are random-access. You'd only be supporting forward iteration this way. The downside is the Python code looks less like the C++ code, but the upside is that it's more pythonic. So, something to consider.
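For what it's worth, a forward-only generator over a set of trajectories is only a few lines; a sketch, assuming the usual `pyloos.Trajectory(filename, model)` constructor (`frames` here is a hypothetical helper, not part of pyloos):

```python
import loos
import loos.pyloos as pyloos

def frames(model, filenames):
    # Forward-only iteration: exactly one trajectory file is open at a time,
    # at the cost of giving up VirtualTrajectory's random access.
    for fname in filenames:
        traj = pyloos.Trajectory(fname, model)
        for frame in traj:
            yield frame
        # traj is rebound on the next pass; the old object's destructor
        # closes its file when the reference is dropped
```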
Interestingly enough, I hadn't realized that in the C++ code, there is no `Trajectory.close()`. We rely on the destructor taking care of everything. We may also want to consider adding an explicit `close()` function...
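In the meantime, the Python-side idiom is just to drop the last reference and let the destructor close the file; a sketch (with hypothetical filenames):

```python
import loos
import loos.pyloos as pyloos

model = loos.createSystem('model.pdb')
traj = pyloos.Trajectory('sim.dcd', model)  # the file is opened here

# ... read frames ...

del traj  # no explicit close(); dropping the last reference runs the destructor
```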
Really, the "LOOS way" would be to use `subsetter` and pre-process your trajectories. Parsing PDBs is expensive. Is converting to a DCD or XTC not an option for you?
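The conversion itself can be done in a few lines of Python, opening only one PDB at a time; a sketch, assuming `loos.DCDWriter` with its `writeFrame(AtomicGroup)` method and a hypothetical `confs_dir` of PDBs:

```python
import glob
import loos

pdb_files = sorted(glob.glob('confs_dir/*.pdb'))  # hypothetical input directory
out = loos.DCDWriter('combined.dcd')
for fname in pdb_files:
    frame = loos.createSystem(fname)  # open, parse, and release each PDB in turn
    out.writeFrame(frame)
```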
BTW, fair point about the error message. That should be a "to do" item for us... go through the code, identify top-level failure points (mostly file opens), and try to throw a meaningful message...
I've left this open on the basis that it serves as a reminder to see if the error message can be improved. I agree that there's no fundamental issue with LOOS's code. LMK if you'd like any adjustments made to the issue, or if you want it closed.
I went and edited some of the more recent file format classes (MMCIF and MDTraj, as both a system file and a trajectory file) and added catches on failed opens. When there's time, I'll do more.
Commit 3286fc6f5f9d7782a3b07adc0d6d00f9bae4c2ac adds some more error checking to the main Trajectory file opening stage.
Describe the bug
When I splat a list of PDBs into `pyloos.VirtualTrajectory`'s init method, I get a `loos.loos.FileOpenError: Error while opening some_random_pdb.pdb` file open error on a particular structure, but I can read this file, use `createSystem` on the file, and can inspect the file in PyMOL. I've found that if I shuffle the splatted list I get this error on a different file, which is particular to that shuffle. I've also found that if I slice the list such that I am only reading the first 509 elements, regardless of how the list is shuffled, I am able to construct the vtraj successfully.
It seems like there's some sort of cap on how many trajectories can be assembled this way, and the error emerges as the list of trajectory objects is built, before that list gets fed into VirtualTrajectory. It is nowhere close to the memory limits of the machines this is being tested on.
To Reproduce
Steps to reproduce the behavior:
```python
import random
from pathlib import Path

import loos
import loos.pyloos as pyloos

length_cutoff = -1  # Play with this to test how many frames you can use before error
try_vtraj = False   # if true, will try to build vtraj. Otherwise, will try to build a list of trajs.

conf_paths = Path('path/to/confs_dir').rglob('*.pdb')
samples_conf_list = list(map(str, conf_paths))[:length_cutoff]
model = loos.createSystem(samples_conf_list[0])
random.shuffle(samples_conf_list)

if try_vtraj:
    # if the input list is too long, this line throws the error.
    vtraj = pyloos.VirtualTrajectory(*[pyloos.Trajectory(p, model) for p in samples_conf_list])
else:
    # Interestingly, the crash is actually coming from list creation as an
    # intermediate step. This line crashes with the same error.
    trajs = [pyloos.Trajectory(p, model) for p in samples_conf_list]
```