balabanmetin / apples

distance based phylogenetic placement
GNU General Public License v3.0

issues with jplace output #15

Open lpipes opened 8 months ago

lpipes commented 8 months ago

Hi, I'm trying to convert the jplace table but it seems like my jplace file has no entries for multiclass. Can you explain why this is? I was assuming that the jplace files are the same as the output for pplacer. I attached the test.jplace which I had to rename to test.txt. Thanks

> library(BoSSA)
> sql<-read_sqlite("test.db",jplace_file=gsub("sqlite","jplace","test.jplace"))
> sql
pplace object
run: 1
call run 1: guppy classify -c test.refpkg/ test.jplace --sqlite test.db
Placement on a phylogenetic tree with 1456 tips and 871 internal nodes.
sequence nb: 1805
placement nb: 1805
> table<-pplace_to_table(sql,type="best")
> table
NULL

test.txt

balabanmetin commented 8 months ago

Thanks for using APPLES-2.

APPLES-2 outputs a jplace file in the format described here. Many tools, including guppy and gappa, can read the APPLES-2 jplace file. guppy's classify module must have extra requirements that are not mentioned in the original jplace format. My suspicion is that inconsistencies between the taxon names (underscores, quotation marks, etc.) in test.jplace and in the refpkg you use for classification are the reason. I would place a single sequence using both pplacer and APPLES-2 and look at the differences between the two jplace files to find the error.
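One way to test the naming suspicion is to list the leaf labels in the jplace "tree" string and compare them with the names the refpkg expects. The following is a minimal sketch, not part of APPLES-2 or guppy: `jplace_leaf_names` and `name_mismatches` are hypothetical helpers, the regex assumes labels contain no parentheses, commas, colons, or braces, and the reference name list has to be supplied by you (for example from the seq-info CSV).

```python
import json
import re


def jplace_leaf_names(jplace_text):
    """Return the set of leaf labels found in the "tree" string of a jplace document."""
    tree = json.loads(jplace_text)["tree"]
    # A leaf label directly follows "(" or "," and ends at ":", "{", ",", ")" or ";".
    labels = re.findall(r"[(,]\s*([^(),:;{}]+)", tree)
    # Drop surrounding whitespace and quotes so 'A b' and A b compare equal.
    cleaned = (label.strip().strip("'\"") for label in labels)
    return {label for label in cleaned if label}


def name_mismatches(jplace_text, reference_names):
    """Return (names only in the jplace tree, names only in the reference list)."""
    leaves = jplace_leaf_names(jplace_text)
    refs = set(reference_names)
    return sorted(leaves - refs), sorted(refs - leaves)


# Toy jplace document; a real file would be read with open("test.jplace").read().
example = json.dumps({"tree": "((A_1:0.1{0},'B 2':0.2{1}):0.05{2},C:0.3{3}){4};"})
only_in_jplace, only_in_reference = name_mismatches(example, ["A_1", "B_2", "C"])
print(only_in_jplace, only_in_reference)
```

If both lists come back empty, the names agree and the cause is probably elsewhere.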

lpipes commented 8 months ago

The 2 jplace files look very different. From APPLES:

    "placements": [
        {
            "n": [
                "AY666199.1_151"
            ],
            "p": [
                [
                    615,
                    23.17112997861102,
                    1,
                    0.0236617022326274,
                    0.048165539617111265
                ]
            ]
        }
    ]

and then from pplacer:

  "placements":
  [
    {"p":
      [
        ["161681", 0.0337533286028, 1591, 0.438682310596, -7990.27227968,
          0.0505814804718
        ],
        ["8917", 0.0163040309998, 952, 0.166632640315, -7991.24026354,
          0.0523737200202
        ],
        ["161681", 0.000589067307439, 1593, 0.0940030574849, -7991.81272786,
          0.0614246190462
        ],
        ["161677", 5e-07, 953, 0.093953452743, -7991.81325569,
          0.0616765683895
        ],
        ["8917", 0.00712812213867, 1592, 0.0939127701877, -7991.81368879,
          0.0616753354433
        ],
        ["161677", 5e-07, 1396, 0.0609206694948, -7992.24648265,
          0.0652258111253
        ],
        ["190658", 0.0212321684278, 1137, 0.051895099178, -7992.40683081,
          0.0534156384798
        ]
      ], "nm": [["AY666199.1_151", 1]]
    }
  ]
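The two "p" rows have different lengths because each jplace file declares its own column order in a top-level "fields" array. A small sketch of pairing a row with its field names is below. The two field lists are assumptions chosen to match the rows quoted above; they were not copied from the attached files, so check the "fields" entry in each file before relying on them.

```python
def label_placement(fields, row):
    """Pair one "p" row with the column names from the jplace "fields" array."""
    if len(fields) != len(row):
        raise ValueError("row length does not match the number of fields")
    return dict(zip(fields, row))


# Assumed field order for the APPLES-2 row quoted above (5 columns).
apples_fields = ["edge_num", "likelihood", "like_weight_ratio",
                 "distal_length", "pendant_length"]
apples_row = [615, 23.17112997861102, 1, 0.0236617022326274, 0.048165539617111265]

# Assumed field order for the first pplacer row quoted above (6 columns).
pplacer_fields = ["classification", "distal_length", "edge_num",
                  "like_weight_ratio", "likelihood", "pendant_length"]
pplacer_row = ["161681", 0.0337533286028, 1591, 0.438682310596,
               -7990.27227968, 0.0505814804718]

apples = label_placement(apples_fields, apples_row)
pplacer = label_placement(pplacer_fields, pplacer_row)
print(sorted(set(pplacer) - set(apples)))  # columns present only in the pplacer row
```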

I attached all of the files used to run both commands files.tar.gz

balabanmetin commented 8 months ago

The problem may stem from the fact that APPLES-2 outputs a plain name list "n" while pplacer outputs an "nm" name list. The other difference is that the pplacer output has a classification ("161681") and the APPLES-2 output doesn't. But classification is what you were trying to obtain with the command you shared at the beginning of the thread (guppy classify -c test.refpkg/ test.jplace --sqlite test.db). I would try changing "n" to "nm" and retry classifying with guppy.

I just had a thought: another problem is that the "tree" in the APPLES-2 jplace file is not identical to the input tree, since APPLES-2 re-estimates the input tree's branch lengths. That might conflict with the tree file inside test.refpkg.
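Note that "n" and "nm" differ in shape, not only in name: "n" holds plain name strings, while "nm" holds [name, multiplicity] pairs, as in the pplacer output above ("nm": [["AY666199.1_151", 1]]). Renaming the key alone therefore leaves malformed entries. A minimal sketch of the conversion follows; `names_to_name_multiplicity` is a hypothetical helper, a multiplicity of 1 is assumed for every name, and whether this is enough for guppy classify is untested.

```python
import json


def names_to_name_multiplicity(jplace):
    """Rewrite every "n" name list as an "nm" list of [name, multiplicity] pairs."""
    for placement in jplace.get("placements", []):
        if "n" in placement and "nm" not in placement:
            # Assumption: each query was seen once, so multiplicity is 1.
            placement["nm"] = [[name, 1] for name in placement.pop("n")]
    return jplace


# Toy document shaped like the APPLES-2 excerpt above.
doc = {
    "placements": [
        {"n": ["AY666199.1_151"],
         "p": [[615, 23.17112997861102, 1, 0.0236617022326274, 0.048165539617111265]]}
    ]
}
converted = names_to_name_multiplicity(doc)
print(json.dumps(converted["placements"][0]["nm"]))
```

For a real file, load it with `json.load`, pass the result through the function, and write it back with `json.dump` under a new file name so the original stays untouched.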

lpipes commented 8 months ago

If I change n to nm in the test2.jplace file, I get this error:

guppy classify -c test.refpkg/ test2.jplace --sqlite test.db 
guppy: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed.
Aborted (core dumped)

If I change the nm back to n and run the same command, I get no error.

lpipes commented 8 months ago

If it is the case that the tree in the APPLES-2 place file is not identical to the one in test.refpkg, how do I extract the tree easily from the jplace file?
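A jplace file is JSON, and its "tree" entry is a Newick string with an edge number in curly braces after each branch. Stripping those braces gives a plain Newick tree. The sketch below assumes the edge numbers are the only curly-brace groups in the string; `jplace_tree_to_newick` is a hypothetical helper name.

```python
import json
import re


def jplace_tree_to_newick(jplace_text):
    """Return the jplace "tree" entry with the {edge_number} annotations removed."""
    tree = json.loads(jplace_text)["tree"]
    return re.sub(r"\{\d+\}", "", tree)


# Toy document; a real file would be read with open("test.jplace").read().
example = json.dumps({"tree": "((A:0.1{0},B:0.2{1}):0.05{2},C:0.3{3}){4};"})
newick = jplace_tree_to_newick(example)
print(newick)
```

The returned string can be written to a file such as apples.tre and read with any Newick parser.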

lpipes commented 8 months ago

I tried to build the taxit refpkg using the tree that was output by APPLES and now I am getting this error (it runs if I use the RAxML tree):

taxit create -P apples.refpkg -l COI --aln-fasta 79_MSA.fasta --taxonomy 79_taxonomyfromtaxids.csv --seq-info 79_seqInfo.csv --tree-file apples.tre
rppr: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed.
Traceback (most recent call last):
  File "/space/s1/lenore/software/taxtastic_2/taxtastic/taxit.py", line 22, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/space/s1/lenore/software/taxtastic_2/taxtastic/taxtastic/scripts/taxit.py", line 51, in main
    return action(arguments)
  File "/space/s1/lenore/software/taxtastic_2/taxtastic/taxtastic/subcommands/create.py", line 168, in action
    r.reroot(rppr=args.rppr)
  File "/home/lenore/.local/lib/python3.10/site-packages/decorator-5.1.1-py3.10.egg/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/space/s1/lenore/software/taxtastic_2/taxtastic/taxtastic/refpkg.py", line 132, in transaction
    return f(self, *args, **kwargs)
  File "/space/s1/lenore/software/taxtastic_2/taxtastic/taxtastic/refpkg.py", line 497, in reroot
    subprocess.check_call([rppr or 'rppr', 'reroot',
  File "/home/lenore/Python-3.10.3/Lib/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['rppr', 'reroot', '-c', '/space/s1/lenore/tronko_revisions/APPLES/apples.refpkg', '-o', '/tmp/treet9rp6nni.tre']' died with <Signals.SIGABRT: 6>

Ultimately, I'm just trying to get the taxonomy associated with the apples placement in the tree. I don't see any way to do this in the tutorial and the way I do this for pplacer output doesn't work.

lpipes commented 8 months ago

I think the reason why I can't make the refpkg is because the APPLES-2 tree is not binary:

> apples <- read.tree("apples.tre")
> apples

Phylogenetic tree with 1456 tips and 871 internal nodes.

Tip labels:
  JX160001.1, KM001306.1, KM001305.1, KM001304.1, DQ433124.1, AY666202.1, ...

Rooted; includes branch lengths.
> is.binary(apples)
[1] FALSE

I made the tree binary and ran taxit and that worked to make the refpkg but then failed again when running guppy classify.
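The node counts already point to polytomies: a rooted binary tree with 1456 tips has 1455 internal nodes, not 871. The same check can be run on the Newick string directly by counting the children of each internal node. This is a rough sketch that treats the tree as rooted and assumes no quoted labels containing parentheses or commas; `child_counts` and `is_binary` are hypothetical helpers, not the ape functions.

```python
def child_counts(newick):
    """Return the number of children of every internal node, in closing order."""
    counts, stack = [], []
    for ch in newick:
        if ch == "(":
            stack.append(1)          # a new internal node starts with one child
        elif ch == "," and stack:
            stack[-1] += 1           # each comma adds a sibling at the current level
        elif ch == ")":
            counts.append(stack.pop())
    return counts


def is_binary(newick):
    """True if every internal node has exactly two children (rooted sense)."""
    return all(n == 2 for n in child_counts(newick))


polytomy = "((A:0.1,B:0.2,C:0.1):0.05,D:0.3);"
resolved = "((A:0.1,B:0.2):0.05,C:0.3);"
print(child_counts(polytomy), is_binary(polytomy))
print(child_counts(resolved), is_binary(resolved))
```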

lpipes commented 8 months ago

I disabled the tree re-estimation and I still cannot get the taxids associated with the placement:

> sql<-read_sqlite("test.db",jplace_file=gsub("sqlite","jplace","test3.jplace"))
> sql
pplace object
run: 3
call run 1: guppy classify -c test.refpkg/ test3.jplace --sqlite test2.db
Placement on a phylogenetic tree with 1456 tips and 1455 internal nodes.
sequence nb: 1807
placement nb: 1807
> table<-pplace_to_table(sql,type="best")
> table
NULL
> sql$multiclass
[1] placement_id name         want_rank    rank         tax_id      
[6] likelihood  
<0 rows> (or 0-length row.names)

The problem is that there is no multiclass entry associated with the placement.
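To see whether only multiclass is empty or whether other tables were left unfilled too, the row count of every table in the SQLite file can be listed without going through BoSSA. The sketch below uses only Python's standard sqlite3 module and makes no assumption about the schema guppy writes. The demo database is a stand-in built in memory, and `demo_other` is a made-up table name.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every table in an open SQLite connection."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names}


# Stand-in database: an empty multiclass table and one populated table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE multiclass (placement_id, name, want_rank, rank, tax_id, likelihood)")
conn.execute("CREATE TABLE demo_other (x)")
conn.execute("INSERT INTO demo_other VALUES (1)")
counts = table_row_counts(conn)
conn.close()
print(counts)
```

For the real file, replace the stand-in with `sqlite3.connect("test.db")`.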

lpipes commented 8 months ago

I also tried to use gappa and it is complaining about the tree not being binary:

Found 1 jplace file
Error: Supplied tree is not bifurcating.

terminate called after throwing an instance of 'std::runtime_error'
  what():  Supplied tree is not bifurcating.
Aborted (core dumped)