endixk / ezaai

EzAAI - High Throughput Prokaryotic AAI Calculator
http://leb.snu.ac.kr/ezaai
GNU General Public License v3.0

java.io.FileNotFoundException: ./output/Matrix (Not a directory) #28

Open shlomobl opened 3 days ago

shlomobl commented 3 days ago

Hello,

Thanks for this tool!

When running the following:

ezaai calculate -i ./DB -j ./DB -o ./output -mtx ./output/Matrix -t 32

After a long wait over the weekend (~230 genomes!), I got this:

java.io.FileNotFoundException: ./output/Matrix (Not a directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
    at java.io.FileWriter.<init>(FileWriter.java:63)
    at leb.main.EzAAI.runCalculate(EzAAI.java:548)
    at leb.main.EzAAI.run(EzAAI.java:689)
    at leb.main.EzAAI.main(EzAAI.java:725)

I did get an output file, which I could cluster using the clustering module. (By the way, it would be nice to have an option there to use only sample names as tree labels, instead of the file names.)

But both the matching CDs folder and the Matrix are empty.

Is this something I can retrieve from the output somehow, without having to run the whole thing again?

Thanks!

endixk commented 1 day ago

Hello,

Sorry for the inconvenience; it seems my help messages were misleading.

The -o flag takes a file path, not a directory, so the attempt to write ./output/Matrix is expected to fail: by that point a regular file named ./output has already been created.

Regarding the outputs, the -mtx output is simply a reformatted version of the normal output, so every value you need should already be included in the ./output TSV file.

For the matching CDS, however, you need to provide a file path with the -match option in your calculate command, and unfortunately you have to start over with this flag set. It is technically possible to salvage the results from the intermediate files, if they have not been deleted, but that requires non-trivial coding or scripting.
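As a rough illustration of that reformatting, the pairwise TSV can be pivoted into a square table with awk. This is only a sketch: the function name tsv_to_matrix is made up for this example, and the layout it relies on (tab-separated, one header line, labels in columns 3 and 4, AAI in column 5) is an assumption that should be checked against head ./output. The result is not guaranteed to match the exact format that -mtx writes.

```shell
# Hypothetical helper, not part of EzAAI.
# Usage: tsv_to_matrix FILE [ROW_LABEL_COL] [COL_LABEL_COL] [VALUE_COL]
# Assumed defaults: labels in columns 3 and 4, AAI in column 5, one header line.
tsv_to_matrix() {
    awk -F'\t' -v a="${2:-3}" -v b="${3:-4}" -v v="${4:-5}" '
        NR == 1 { next }    # skip the assumed header line
        {
            # remember row and column labels in order of first appearance
            if (!($a in seen_r)) { seen_r[$a]; rows[++nr] = $a }
            if (!($b in seen_c)) { seen_c[$b]; cols[++nc] = $b }
            val[$a, $b] = $v
        }
        END {
            line = ""
            for (j = 1; j <= nc; j++) line = line "\t" cols[j]
            print line
            for (i = 1; i <= nr; i++) {
                line = rows[i]
                for (j = 1; j <= nc; j++) {
                    k = rows[i] SUBSEP cols[j]
                    # pairs missing from the input are written as NA
                    line = line "\t" ((k in val) ? val[k] : "NA")
                }
                print line
            }
        }
    ' "$1"
}

# Example call (commented out so the block runs without an existing ./output):
# tsv_to_matrix ./output > ./matrix.tsv
```

The column numbers are arguments, so a different layout only needs a different call, for example tsv_to_matrix ./output 1 2 5 to label rows and columns by the first two columns instead.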

I suggest running something like this command:

ezaai calculate -i ./DB -j ./DB -o ./output -mtx ./matrix -match ./match -t 32

Please note that the -mtx and -match paths do not depend on ./output.
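Because a bad path only shows up at the very end of a long run, it may be worth checking the output paths before launching. The sketch below uses a made-up helper, check_out_path, which is not an EzAAI feature; it only tests that the parent of each path is an existing directory and that the path itself is not a directory.

```shell
# Hypothetical pre-flight helper, not part of EzAAI.
# Succeeds only if the parent of the given path is an existing directory
# and the path itself is not a directory.
check_out_path() {
    parent=$(dirname "$1")
    if [ ! -d "$parent" ]; then
        echo "bad output path $1: $parent is not a directory" >&2
        return 1
    fi
    if [ -d "$1" ]; then
        echo "bad output path $1: it is a directory, expected a file path" >&2
        return 1
    fi
    return 0
}

# Check the three paths from the suggested command before starting the run.
for p in ./output ./matrix ./match; do
    check_out_path "$p" || echo "fix $p before running ezaai calculate" >&2
done
```

With the paths from the original command, check_out_path ./output/Matrix would fail: ./output either does not exist yet or is the regular file written by -o, so it can never serve as the parent directory of the matrix file.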

shlomobl commented 1 day ago

Thanks! Is there any way to retrieve the number of CDS used to calculate the AAI? Are these the core CDS for the entire genome set, or for each pair separately?

endixk commented 1 day ago

All values are based on pairwise comparisons.

The number of CDS should be in the ./output result file; just try head ./output. The number of CDS in each of the two genomes and the number of CDS used to calculate the AAI are written in the 6th, 7th, and 8th columns, respectively.
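To look at just those counts, the columns can be pulled out with cut. In this sketch the function name cds_columns is made up, and placing the two genome labels in columns 3 and 4 is an assumption to verify with head ./output; only columns 6 to 8 are stated above.

```shell
# Hypothetical helper, not part of EzAAI.
# Prints columns 3 and 4 (assumed to be the two genome labels) followed by
# columns 6-8 (CDS in genome 1, CDS in genome 2, CDS used for the AAI).
# Any header line is passed through unchanged; cut splits on tabs by default.
cds_columns() {
    cut -f3,4,6-8 "$1"
}

# Example call (commented out so the block runs without an existing ./output):
# cds_columns ./output | head
```

Each row of the result describes one genome pair, in line with the counts being pairwise rather than computed over the whole genome set.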