phyluce_probe_run_multiple_lastzs_sqlite error/bug

crcardenas commented 1 year ago

I am "slicing" UCEs from three data types, following the "Harvesting UCEs from genomes" tutorial: anchored hybrid enrichment (AHE) data, transcriptomes, and genomes. I subset the data by type and run the lastz_sqlite step on each subset. When I run the script, I get an interesting error that causes the search to fail. Of the 95 AHE samples, 10 are not processed because of this error:

2023-03-21 18:15:24,642 - phyluce_probe_run_multiple_lastzs_sqlite - INFO - Cleaning up the chunked files...
2023-03-21 18:15:24,662 - phyluce_probe_run_multiple_lastzs_sqlite - INFO - Cleaning the LASTZ output for SRR12339133
2023-03-21 18:15:24,669 - phyluce_probe_run_multiple_lastzs_sqlite - INFO - Creating the SRR12339133 table
2023-03-21 18:15:24,672 - phyluce_probe_run_multiple_lastzs_sqlite - INFO - Inserting data to the SRR12339133 table
2023-03-21 18:15:24,744 - phyluce_probe_run_multiple_lastzs_sqlite - INFO - Aligning against SRR12339134 scaffolds
2023-03-21 18:15:27,086 - phyluce_probe_run_multiple_lastzs_sqlite - INFO - Running against SRR12339134.2bit
2023-03-21 18:15:27,090 - phyluce_probe_run_multiple_lastzs_sqlite - INFO - Running with the --huge option.  Chunking files into 10000000 bp...
2023-03-21 18:15:31,160 - phyluce_probe_run_multiple_lastzs_sqlite - INFO - Running the targets against 4 queries...
Traceback (most recent call last):
  File "/home/cody/.conda/envs/phyluce-1.7.1/bin/phyluce_probe_run_multiple_lastzs_sqlite", line 297, in <module>
    main()
  File "/home/cody/.conda/envs/phyluce-1.7.1/bin/phyluce_probe_run_multiple_lastzs_sqlite", line 286, in main
    align_against_scaffolds(log, cur, args, path)
  File "/home/cody/.conda/envs/phyluce-1.7.1/bin/phyluce_probe_run_multiple_lastzs_sqlite", line 219, in align_against_scaffolds
    args.identity,
  File "/home/cody/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/phyluce/many_lastz.py", line 142, in multi_lastz_runner
    pool = multiprocessing.Pool(cores)
  File "/home/cody/.conda/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/home/cody/.conda/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 174, in __init__
    self._repopulate_pool()
  File "/home/cody/.conda/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 239, in _repopulate_pool
    w.start()
  File "/home/cody/.conda/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/cody/.conda/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/home/cody/.conda/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/cody/.conda/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/popen_fork.py", line 65, in _launch
    parent_r, child_w = os.pipe()
OSError: [Errno 24] Too many open files
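The traceback ends in os.pipe(), which multiprocessing.Pool calls while starting its worker processes; Errno 24 is EMFILE, meaning the process has hit its limit on open file descriptors. The sketch below reproduces the same OSError outside phyluce using only the standard library. It temporarily lowers the soft limit and opens pipes until one fails, then restores everything. The helper names count_open_fds and reproduce_emfile are hypothetical and not part of phyluce. The descriptor count is a simple os.fstat() probe and assumes no descriptors sit above the current soft limit.

```python
import errno
import os
import resource


def count_open_fds(probe_limit: int = 65536) -> int:
    """Count open descriptors by probing each fd number with os.fstat()."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY may show up as -1 (Linux) or as a huge number; cap the probe.
    upper = probe_limit if soft < 0 else min(soft, probe_limit)
    count = 0
    for fd in range(upper):
        try:
            os.fstat(fd)
        except OSError:
            continue  # EBADF: this descriptor number is not open
        count += 1
    return count


def reproduce_emfile(limit: int = 64) -> int:
    """Lower the soft limit, call os.pipe() until it fails, return the errno.

    The original (soft, hard) limits and all opened pipes are restored/closed.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)  # the soft limit may never exceed the hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    pipes = []
    try:
        while True:
            pipes.append(os.pipe())  # the same call that fails in the traceback
    except OSError as exc:
        return exc.errno
    finally:
        for read_end, write_end in pipes:
            os.close(read_end)
            os.close(write_end)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    print("open descriptors:", count_open_fds())
    print("soft/hard limit:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("EMFILE reproduced:", reproduce_emfile() == errno.EMFILE)
```

Running count_open_fds() periodically from inside a long-running Python job is one way to see whether the descriptor count keeps climbing toward the limit.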

I have since split the AHE data into smaller subsets, and those runs are going fine so far, but I thought you would want to know about this issue.

I'm not sure what other information you might need, but I am happy to provide more detail if you would like.

brantfaircloth commented 1 year ago

This is not really a bug in Phyluce - it's an operating system limitation. I'm not sure which operating system you are using, but generally you can adjust the open-file limit temporarily (for the current shell session) using ulimit. You can use ulimit -a to show the various OS limits, and you can adjust the number of open files allowed using ulimit -n followed by the number of file descriptors you want to allow.

For example, on my Mac:

ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8176
-c: core file size (blocks)         0
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       10666
-n: file descriptors                256

And if I run ulimit -n 512 followed by ulimit -a, the number of file descriptors allowed changes to 512.

-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8176
-c: core file size (blocks)         0
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       10666
-n: file descriptors                512
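The same limits can be read and raised from inside Python with the standard-library resource module: RLIMIT_NOFILE holds the soft limit (what ulimit -Sn reports) and the hard limit (what ulimit -Hn reports). The following is a minimal sketch, assuming you control the Python process that launches the work. It raises the soft limit toward a target, clamped at the hard limit, which an unprivileged process cannot exceed. raise_open_file_limit is a hypothetical helper; nothing in this thread indicates that phyluce does this itself. As with ulimit, the change applies only to the current process and to children it starts afterwards.

```python
import resource
from typing import Tuple


def raise_open_file_limit(target: int) -> Tuple[int, int]:
    """Raise the soft RLIMIT_NOFILE toward target, never past the hard limit.

    Never lowers the limit. Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process may move its soft limit anywhere up to the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # Some systems (e.g. macOS) may also cap the value at a kernel
            # maximum; keep the existing limit if the request is refused.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_open_file_limit(4096))
```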

crcardenas commented 1 year ago

Thanks for the quick response and your time. I am on a private Linux server (Ubuntu 16.04.4 LTS, GNU/Linux 4.4.0-119-generic x86_64).

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 2063412
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 2063412
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Checking the soft and hard limits shows that this is as high as I can set it:

$ ulimit -Sn
1024
$ ulimit -Hn
1024
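When ulimit -Sn and ulimit -Hn print the same number, as here, the soft limit is already at its ceiling. setrlimit rejects any soft value above the hard limit, even for root, and only a privileged process can raise the hard limit itself. The sketch below checks this from Python with the standard resource module. soft_limit_accepted is a hypothetical helper that tries a value and then restores the original limits.

```python
import resource


def soft_limit_accepted(value: int) -> bool:
    """Return True if the kernel accepts value as the soft RLIMIT_NOFILE.

    The original (soft, hard) pair is restored before returning.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (value, hard))
    except (ValueError, OSError):
        return False  # e.g. the value exceeds the hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return True


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft:", soft, "hard:", hard)
    if hard != resource.RLIM_INFINITY:
        # With soft == hard == 1024 this is the "as high as I can set it" case.
        print("hard + 1 accepted?", soft_limit_accepted(hard + 1))
```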

I expect it depends on the data, but is there an expected upper limit with 95 taxa? Or, if I need to do this again, should I subset the data when using phyluce_probe_run_multiple_lastzs_sqlite?

brantfaircloth commented 1 year ago

I've set mine to 4096 and rarely have an issue. I usually also process smaller batches of taxa at one time (e.g. 4 batches of 24 in the case of 96 taxa).

You can also raise the limit permanently; how to do that depends on the OS, so search for instructions for yours. Also see here for some discussion.
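Splitting taxa into fixed-size batches, as in the 4 batches of 24 for 96 taxa described in this comment, is easy to script. The following is a minimal sketch. The taxon names are placeholders, and how each batch is then handed to phyluce_probe_run_multiple_lastzs_sqlite (for example, one run per batch) is left to your own workflow. batch_taxa is a hypothetical helper.

```python
from typing import List, Sequence


def batch_taxa(taxa: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split taxa into consecutive batches of at most batch_size names."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(taxa[i:i + batch_size]) for i in range(0, len(taxa), batch_size)]


if __name__ == "__main__":
    # Placeholder names; substitute your own list of genome/2bit base names.
    taxa = ["taxon_{:03d}".format(n) for n in range(1, 96)]  # 95 taxa
    for number, batch in enumerate(batch_taxa(taxa, 24), start=1):
        print("batch", number, len(batch), "taxa:", batch[0], "...", batch[-1])
```

With 95 taxa and a batch size of 24, the first three batches hold 24 names each and the last holds the remaining 23.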

crcardenas commented 1 year ago

Thank you so much for your help!

brantfaircloth commented 1 year ago

you're welcome 👍. good luck w/ your research!

faircloth-lab/phyluce, issue #298