kloetzl closed this issue 7 years ago
Hi,
I'm trying to run andi on 1843 bacterial isolate genomes. I pass andi
the list of isolates for analysis from a file using a pipe and xargs:
cat filenames.txt | xargs andi -j -m JC -t 64 > andi_JCdist.mat
filenames.txt contains a space-delimited list of file paths. The total length of the string in filenames.txt is 164389 characters.
The analysis runs without errors. But the output matrix contains two phylip matrices. The first matrix is 1468x1468 and the second is 375x375.
I would expect that if the second matrix was a continuation of the first, they should both have 1843 columns. It appears that the analysis has been broken down into two blocks, so an all vs all comparison for the 1843 isolates has not been performed. Only an all vs all comparison within each of the blocks has been performed. That is, I was expecting 1843^2 cell values, but only got 1468^2+375^2 values.
Thanks,
Mark
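To put numbers on the gap described above, the expected and observed cell counts can be compared with plain shell arithmetic. The figures are the ones from the post; the variable names are only illustrative.

```shell
# Sizes reported in the post: all isolates, and the two blocks found in the output.
total=1843
block1=1468
block2=375

expected=$((total * total))                      # one full all-vs-all matrix
observed=$((block1 * block1 + block2 * block2))  # two separate block matrices
missing=$((expected - observed))                 # cross-block comparisons never made

echo "blocks sum to:  $((block1 + block2))"
echo "expected cells: $expected"
echo "observed cells: $observed"
echo "missing cells:  $missing"
```

The missing cells equal 2 x 1468 x 375, which is exactly the set of comparisons between the two blocks.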
Hi Mark,
Thanks for your interest in andi. The problem you are facing is of a different nature, though. From the xargs man page:
The command line for command is built up until it reaches a system-defined limit (…). The specified command will be invoked as many times as necessary to use up the list of input items.
So the command you are trying to build either has too many files, or the path lengths are filling the command buffer that xargs uses. You can check the limits for your system via xargs --show-limits. For my system, the output is:
Your environment variables take up 1359 bytes
POSIX upper limit on argument length (this system): 2093745
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2092386
Size of command buffer we are actually using: 131072
Maximum parallelism (--max-procs must be no greater): 2147483647
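The splitting behaviour quoted from the man page can be reproduced without andi. The following is a minimal sketch, assuming an xargs that accepts the POSIX -s option. make_paths and count_batches are hypothetical helper names, and the dummy paths are 88 characters long only because 164389 / 1843 is roughly 89 bytes per path including the separator. Each output line is the number of arguments that one invocation received.

```shell
# Print n dummy paths of exactly `len` characters each (made-up names, no real files).
make_paths() {
    awk -v n="$1" -v len="$2" 'BEGIN {
        fmt = "/x/%0" (len - 3) "d\n"   # "/x/" plus a zero-padded counter
        for (i = 1; i <= n; i++) printf fmt, i
    }'
}

# Read paths on stdin and print how many arguments each xargs invocation gets.
# $1 is the command-line size handed to xargs -s.
count_batches() {
    xargs -s "$1" sh -c 'echo "$#"' _
}

# 131072 mirrors the "Size of command buffer we are actually using" value above.
make_paths 1843 88 | count_batches 131072
```

With a 131072-byte buffer this should print more than one line, and the counts should add up to 1843; a first batch of roughly 1470 arguments is consistent with the 1468-row matrix in the report. Passing a larger size, for example count_batches 200000, should give a single batch as long as the value stays below the system maximum.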
You might want to try increasing your system's limits. If that doesn't work, I could supply you with a custom version of andi that can read filenames from a list.
Hope this helps, Fabian
ps. For future reference: Please open a new Issue to start a new thread of discussion.
Hi Fabian,
Thanks for the reply. Apologies for posting in an old issue.
My xargs limits are the same as yours. In the user manual it states that andi accepts filenames from stdin, but I could not get it to do that, which is why I used xargs. Yes, please send me the version that accepts a file of file names. Will you also publicly release this version? That would be useful to a lot of people I’m sure.
Cheers,
Mark
The next problem you will face is that by default most processes are limited to 1024 open files; see ulimit -a.
Does andi open all the input files at once?
Or one at a time and close them as it goes?
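Whether that limit would matter can be checked up front. This is a rough sketch under the pessimistic assumption that every listed file is open at the same time. count_paths and fits_open_file_limit are hypothetical helper names, and filenames.txt is the whitespace-delimited list from the first post.

```shell
# Count whitespace-separated paths in a list file (assumes no spaces inside paths).
count_paths() {
    wc -w < "$1" | tr -d ' '
}

# Succeed if `needed` descriptors fit under the soft open-file limit.
# Three extra descriptors are reserved for stdin, stdout and stderr.
fits_open_file_limit() {
    needed=$1
    limit=$(ulimit -n)
    [ "$limit" = "unlimited" ] && return 0
    [ $((needed + 3)) -le "$limit" ]
}

if [ -r filenames.txt ]; then
    n=$(count_paths filenames.txt)
    if fits_open_file_limit "$n"; then
        echo "$n files fit under the limit of $(ulimit -n)"
    else
        echo "$n files exceed the limit of $(ulimit -n)"
    fi
fi
```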
> Yes, please send me the version that accepts a file of file names. Will you also publicly release this version?
I am already working on it and it will get into the next official release.
> Or one at a time and close them as it goes?
One after the other.
I pushed some commits that fix the problems in this issue. A preliminary version supporting the new --file-of-filenames parameter can be downloaded from here: https://kloetzl.info/downloads/andi-0.11-beta.tar.gz You will need to follow the instructions to install from “source package” in the manual. The “fof” file should contain exactly one path per line. Also, the last path needs to be followed by a line break. Otherwise, you will receive weird error messages. I will work on making the code more robust after lunch. :smiley:
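Since the original list was space-delimited, it has to be converted before it can serve as a “fof” file. A minimal sketch, assuming no path contains whitespace; to_fof and ends_with_newline are hypothetical helper names, and fof.txt is an arbitrary output name.

```shell
# Turn a whitespace-delimited list into one path per line.
# awk drops empty lines and always terminates the last line with a line break.
to_fof() {
    tr -s ' \t' '\n\n' < "$1" | awk 'NF'
}

# Succeed if the file's final byte is a newline, as the beta version requires.
ends_with_newline() {
    [ $(tail -c 1 "$1" | wc -l) -eq 1 ]
}

if [ -r filenames.txt ]; then
    to_fof filenames.txt > fof.txt
    ends_with_newline fof.txt && echo "fof.txt: $(wc -l < fof.txt) paths"
fi
```

The resulting file can then be handed to andi through the --file-of-filenames parameter mentioned above.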
Tip: If you run andi with the option --verbose, it will output the number of sequences it compares early on. That way you can abort if the numbers don't match.
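A complementary check after the run is to look at the matrix sizes in the output itself. This sketch assumes the output follows the usual PHYLIP layout, where each matrix begins with a line holding only the number of sequences. matrix_sizes is a hypothetical helper name, and andi_JCdist.mat is the output file from the first post.

```shell
# Print the size line of every PHYLIP matrix found in a file.
# A size line has a single, purely numeric field; data rows hold a name plus distances.
matrix_sizes() {
    awk 'NF == 1 && $1 ~ /^[0-9]+$/ { print $1 }' "$1"
}

if [ -r andi_JCdist.mat ]; then
    matrix_sizes andi_JCdist.mat
fi
```

More than one line of output suggests the input was split across several andi invocations; for the run reported in this thread it would presumably list 1468 and 375.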
Thank you for your help @kloetzl
Let me know if andi successfully completes the big run. Then I can close this issue.
I am closing this issue. Both problems should be fixed in the current master and thus in the upcoming release.
The PHYLIP format for distance matrices only allows identifiers of up to 9 (or 10?) characters. Unfortunately, this means that names are sometimes cut off, making them indistinguishable. A --name-length option might help.
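Whether truncation would actually cause clashes for a given data set can be checked beforehand. A minimal sketch, assuming the sequence names are available one per line in a file; truncated_collisions is a hypothetical helper name, names.txt is a placeholder, and the width of 10 is taken from the strict PHYLIP convention rather than from andi itself.

```shell
# Print every truncated name that is shared by two or more full names.
# $1: file with one name per line, $2: number of characters kept.
truncated_collisions() {
    cut -c "1-$2" "$1" | sort | uniq -d
}

if [ -r names.txt ]; then
    truncated_collisions names.txt 10
fi
```

An empty result means all names stay distinguishable at that width.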