AnantharamanLab / vClassifier

Species-level taxonomic classification of viruses
GNU General Public License v3.0

pplacer/NewickError #1

Open asierFernandezP opened 5 months ago

asierFernandezP commented 5 months ago

Hi!

Thanks a lot for developing this useful tool! I am currently trying to run it on a set of ~2,000 viral genomes.
I followed the GitHub instructions and:

However, I am now getting the following error:

Strategy: FFT-NS-i (Standard) Iterative refinement method (max. 2 iterations)

If unsure which option to use, try 'mafft --auto input > output'. For more information, see 'mafft --help', 'mafft --man' and the mafft page.

The default gap scoring scheme has been changed in version 7.110 (2013 Oct). It tends to insert more gaps into gap-rich regions than previous versions. To disable this change, add the --leavegappyregion option.

rm: cannot remove '../query_viral_genomes_protein.faa.tmp': No such file or directory
Thu Jun 20 22:54:55 CEST 2024 Replace genomes in reference trees
Running pplacer v1.1.alpha19-0-g807f6f3 analysis on Ackermannviridae_ReferenceQuery_aln.fasta...
Found reference sequences in given alignment file. Using those for reference alignment.
Pre-masking sequences... sequence length cut from 48888 to 0.
Sequence length cut to 0 by pre-masking; can't proceed with no information.
guppy: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed.
/scratch/hb-llnext/conda_envs/vClassifier/vClassifier_family: line 314: 2631870 Aborted (core dumped) guppy tog -o "$line"_ReferenceQuery.jplace.treefile "$line"_ReferenceQuery.jplace
Traceback (most recent call last):
  File "/scratch/hb-llnext/conda_envs/vClassifier/scripts/identify_monophyletic_groups.py", line 12, in <module>
    tree = Tree(args.tree_file)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home2/p304845/.local/lib/python3.11/site-packages/ete3/coretype/tree.py", line 212, in __init__
    read_newick(newick, root_node = self, format=format,
  File "/home2/p304845/.local/lib/python3.11/site-packages/ete3/parser/newick.py", line 264, in read_newick
    raise NewickError('Unexisting tree file or Malformed newick tree structure.')
ete3.parser.newick.NewickError: Unexisting tree file or Malformed newick tree structure.
You may want to check other newick loading flags like 'format' or 'quoted_node_names'.
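For anyone trying to narrow down the "sequence length cut from 48888 to 0" message: as far as I understand the pplacer documentation, pre-masking drops every alignment column that is all-gap in either the reference sequences or the query sequences. The sketch below is only a diagnostic under that assumption. The function name `informative_columns` and the set of gap characters are my own choices, not part of pplacer or vClassifier.

```python
from typing import List

# Assumption: these characters count as gaps; adjust if your alignment differs.
GAP_CHARS = frozenset("-?")


def informative_columns(references: List[str], queries: List[str]) -> int:
    """Count columns with a non-gap character in at least one reference
    AND at least one query (the columns that would survive pre-masking
    under the assumption stated above)."""
    if not references or not queries:
        return 0
    # Use the shortest sequence so a ragged alignment cannot raise IndexError.
    length = min(len(seq) for seq in references + queries)
    count = 0
    for i in range(length):
        ref_has_data = any(seq[i] not in GAP_CHARS for seq in references)
        query_has_data = any(seq[i] not in GAP_CHARS for seq in queries)
        if ref_has_data and query_has_data:
            count += 1
    return count


if __name__ == "__main__":
    refs = ["ACGT--", "AC-T--"]
    # The query only has residues where every reference is a gap,
    # so no column carries information for placement.
    print(informative_columns(refs, ["----GG"]))
```

If this count is 0 for the real `Ackermannviridae_ReferenceQuery_aln.fasta`, the query and reference sequences do not overlap in the alignment at all, which would point at the alignment step rather than at pplacer itself.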

KunUW commented 4 weeks ago

Hi,

Apologies for the delay in our response. It seems that there was an issue with MAFFT and other scripts. We have updated the scripts for vClassifier, and it is now performing well. Could you please try the updated version? If you encounter any further issues, please do not hesitate to let us know. Thanks.

asierFernandezP commented 4 weeks ago

Thanks for your answer!

I reinstalled the environment and tried running it again but I still encounter a similar problem:

====================================================================================================
vie oct 25 07:55:21 PDT 2024    Step 1: Gene calling and VOG annotation
vie oct 25 08:18:22 PDT 2024    Step 2: Identification of single-copy markers
readline() on closed filehandle IN2 at /clusterfs/jgi/scratch/science/metagen/afernandezpato/Tools/vClassifier/vClassifier/scripts/rawSeqID2queries.pl line 22.
(previous message repeated 35 more times)
vie oct 25 08:18:26 PDT 2024    Step 3: Genome replacement in reference trees
mv: cannot stat '*_ReferenceQuery_aln.fasta': No such file or directory
vie oct 25 08:18:26 PDT 2024    Preprocessing before classification for viruses in *
pplacer: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed.
/clusterfs/jgi/scratch/science/metagen/afernandezpato/Tools/vClassifier/vClassifier/scripts/vClassifier_family.sh: line 132: 2138183 Aborted                 (core dumped) pplacer --verbosity 0 -c $installer_dir/database/packages_for_pplacer/"$line".refpkg "$line"_ReferenceQuery_aln.fasta -o "$line"_ReferenceQuery.jplace
guppy: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed.
/clusterfs/jgi/scratch/science/metagen/afernandezpato/Tools/vClassifier/vClassifier/scripts/vClassifier_family.sh: line 132: 2138186 Aborted                 (core dumped) guppy tog -o "$line"_ReferenceQuery.jplace.treefile "$line"_ReferenceQuery.jplace
Traceback (most recent call last):
  File "/clusterfs/jgi/scratch/science/metagen/afernandezpato/Tools/vClassifier/vClassifier/scripts/identify_monophyletic_groups.py", line 12, in <module>
    tree = Tree(args.tree_file)
  File "/clusterfs/jgi/scratch/science/metagen/afernandezpato/Tools/vClassifier/lib/python3.6/site-packages/ete3/coretype/tree.py", line 211, in __init__
    quoted_names=quoted_node_names)
  File "/clusterfs/jgi/scratch/science/metagen/afernandezpato/Tools/vClassifier/lib/python3.6/site-packages/ete3/parser/newick.py", line 249, in read_newick
    raise NewickError('Unexisting tree file or Malformed newick tree structure.')
ete3.parser.newick.NewickError: Unexisting tree file or Malformed newick tree structure.
You may want to check other newick loading flags like 'format' or 'quoted_node_names'.
vie oct 25 08:23:09 PDT 2024    Step 4: Classification at subfamily rank
vie oct 25 08:23:09 PDT 2024    Step 5: Classification at genus rank
vie oct 25 08:23:09 PDT 2024    Step 6: Classification at species rank
cat: '*_fastani_output_species_classification2.besthit': No such file or directory
cat: '*monophyletic_groups_with_seqID_for_subfamily_assignment_output': No such file or directory
cat: '*monophyletic_groups_with_seqID_for_genus_assignment_output': No such file or directory
Use of uninitialized value $col2[0] in hash element at /clusterfs/jgi/scratch/science/metagen/afernandezpato/Tools/vClassifier/vClassifier/scripts/queries2rawSeqID.pl line 24, <IN2> line 1.
vie oct 25 08:25:32 PDT 2024    Step 7: Final lineage assignment
vie oct 25 08:25:32 PDT 2024    Assignment finished
====================================================================================================
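A note on the `loadlocale.c:129: _nl_intern_locale_data` assertion in the log above: this glibc assertion is often reported for the pplacer and guppy binaries when the shell uses a non-default locale, and the workaround usually suggested is to force the C locale with `LC_ALL=C`. The maintainers have not confirmed this for vClassifier in this thread, so treat the sketch below as an assumption to test. The helper names `c_locale_env` and `run_with_c_locale` are hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def c_locale_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the locale forced to C."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"  # LC_ALL overrides LANG and the other LC_* variables
    return env


def run_with_c_locale(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run a command with LC_ALL=C and capture its output as text."""
    return subprocess.run(
        cmd,
        env=c_locale_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # works on Python 3.6, unlike text=True
    )


if __name__ == "__main__":
    # Show which locale setting the child process actually sees.
    check = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
    print(run_with_c_locale(check).stdout.strip())
```

The same effect can be had by exporting `LC_ALL=C` in the shell before launching the pipeline. Even if this removes the locale assertion, the earlier `mv: cannot stat '*_ReferenceQuery_aln.fasta'` line shows that no alignment was produced in this run, so the NewickError may remain.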
KunUW commented 4 weeks ago

Have you successfully tested the example sequences? If so, it is likely that your own query genomes cannot be classified by vClassifier. This may be because your query genomes fall outside the 36 families and 55 subfamilies identified in our paper. Alternatively, your genomes may belong to these families or subfamilies, but no single-copy genes were detected. Nevertheless, we anticipate releasing a more robust version of vClassifier in the future, which will cover a wider range of families and subfamilies.
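One way to tell these cases apart might be to check which intermediate files a run actually produced before looking at the final traceback. The sketch below is a hypothetical helper, not part of vClassifier. The file patterns are taken from the log messages in this thread, and the working directory is whatever directory the run wrote its intermediates to.

```python
import glob
import os
import tempfile
from typing import List

# File name patterns copied from the log messages earlier in this thread.
EXPECTED_PATTERNS = [
    "*_ReferenceQuery_aln.fasta",
    "*_ReferenceQuery.jplace",
    "*_ReferenceQuery.jplace.treefile",
]


def missing_patterns(workdir: str, patterns: List[str] = EXPECTED_PATTERNS) -> List[str]:
    """Return the patterns that have no non-empty matching file in workdir."""
    missing = []
    for pattern in patterns:
        matches = glob.glob(os.path.join(workdir, pattern))
        # An empty file is treated the same as a missing one.
        if not any(os.path.getsize(path) > 0 for path in matches):
            missing.append(pattern)
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Simulate a run that produced an alignment but no placement output.
        aln = os.path.join(workdir, "Ackermannviridae_ReferenceQuery_aln.fasta")
        with open(aln, "w") as handle:
            handle.write(">ref1\nACGT\n")
        print(missing_patterns(workdir))
```

If the alignment pattern itself is reported as missing, no query genome reached the tree step, which would be consistent with no single-copy markers being detected. If only the `.jplace` files are missing, the failure is more likely in pplacer or guppy.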

asierFernandezP commented 3 weeks ago

Thanks for the answer! These are mostly phages belonging to the Caudoviricetes class (identified from human gut samples). I will wait for the final version anyway :)