TheSEED / RASTtk-Distribution

KBase distribution module for RASTtk.
http://www.nature.com/articles/srep08365

RASTtk & Patric CLI implementation #17

Closed WallyL closed 2 years ago

WallyL commented 2 years ago

Hi Bob,

I've used RASTtk for many years to annotate bact. genomes I've assembled, and you've helped me a few times with some issues.

Recently, I installed the .deb file on my local machine (an older Mac Pro running Ubuntu 20.04 LTS). The install went okay, but I couldn't get it to run... I now see it's no longer supported.

The last time we communicated, you suggested that I migrate to Patric CLI, as you also suggested in the previous post here, so I did that (installed 1.039 after first trying 1.035). When I installed the .deb, there was some kind of conflict with the RASTtk /bin dirs, so I removed the RASTtk install (apt-get remove rasttk) and then the Patric install worked.

Sooo, now I'm logged into Patric CLI and I ran some of the gen'l test queries found here (https://docs.patricbrc.org/cli_tutorial/cli_getting_started.html), which worked fine, e.g.
p3-echo -t antibiotic penicillin | p3-get-drug-genomes --eq "genome_name,Streptococcus pneumoniae" --resistant --attr genome_id --attr genome_name >resist.tbl

However, when I try to create my own .gto file using p3-rast, e.g.

p3-rast 2823110 "Azo test genome" <Azo_013.fa >Azo.gto 2>test.log

I get the following in the log file:

$ cat test.log
User not logged in: No such file or directory at /usr/share/patric-cli/deployment/lib/RASTlib.pm line 524.

But, I am logged in and the file is there:

$ ll /usr/share/patric-cli/deployment/lib/ | grep RASTlib
-rw-rw-r-- 1 wlorenz wlorenz 14278 Jun 9 2021 RASTlib.pm

Any ideas on what I need to do to remedy this?

Lastly, I do realize this isn't the correct forum for the last part of this question, but the Patric forum doesn't seem to be monitored and hasn't had a question answered since Nov. 2020. Feel free to move the thread there, if you like.

Best, Walt

olsonanl commented 2 years ago

I'm not sure what p3-rast is trying to do wrt authentication (it was written by another team member as a different wrapper to the PATRIC / BV-BRC annotation service). Were you using p3-rast previously? The primary entrypoint for CLI-based annotation in the BV-BRC environment is p3-submit-genome-annotation.

I have a question in to the developer of p3-rast about the authentication problem.

--bob

WallyL commented 2 years ago

Thanks, Bob.

Actually, I would much prefer to use the old rasttk, for which I had a working pipeline and with which I was very familiar. I just uninstalled Patric and reinstalled rasttk-v1.3.0.deb so I could get the error again.

It seems that your paths are hard-coded...

$ rast-create-genome --scientific-name "Azospira species" --genetic-code 11 --domain Bacteria --contigs XX.fa > XX.gto
/usr/bin/rast-create-genome: 6: /home/olson/KB/runtime//bin/perl: not found

$ cat /usr/bin/rast-create-genome
#!/bin/sh
export KB_TOP=/usr/share/rasttk/deployment
export KB_RUNTIME=/home/olson/KB/runtime/
export PATH=/home/olson/KB/runtime//bin:/usr/share/rasttk/deployment/bin:$PATH
export PERL5LIB=/usr/share/rasttk/deployment/lib
/home/olson/KB/runtime//bin/perl /usr/share/rasttk/deployment/plbin/rast-create-genome.pl "$@"

I guess there may be other instances of this in rast- cmds that would need to be modified, or do you think this is the only one, i.e. can this be made to work?
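
One way to check for other instances is to search the installed wrappers for the same path. This is only a minimal sketch: it assumes the wrappers live in one directory and are named rast-*, and the function name find_hardcoded_wrappers is made up for illustration. It only lists files and changes nothing.

```shell
#!/bin/sh
# List rast-* wrapper scripts in a directory that mention a given literal path.
# find_hardcoded_wrappers is a hypothetical helper name, not part of RASTtk.
find_hardcoded_wrappers() {
    dir=$1
    needle=$2
    # -l prints only the names of matching files; -F treats the needle as a
    # literal string rather than a regular expression.
    grep -lF -- "$needle" "$dir"/rast-* 2>/dev/null
}

# Example call, using the path from the error message above:
# find_hardcoded_wrappers /usr/bin /home/olson/KB/runtime
```

If every rast-* wrapper is listed, the runtime path was probably baked in at build time, which would suggest the package itself is the problem and not one script.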

olsonanl commented 2 years ago

The rasttk scripts should be working; at least on my mac the version from latest release 1.038 does:

$ rast-create-genome --scientific-name "Azospira species" --genetic-code 11 --domain Bacteria --contigs XX.fa > XX.gto
$ rast-process-genome < XX.gto  > XX.anno
$ head XX.anno
{
   "genetic_code" : 11,
   "analysis_events" : [
      {
         "hostname" : "pear",
         "execute_time" : 1650990193.47283,
         "parameters" : [
            "coverage",
            0.7,
            "descript",

I just tried the PATRIC release 1.035 on an Ubuntu 18 live machine and it appears the scripts work properly (I don't have a FASTA handy on it, but the basic infrastructure appears to be working). Have you tried this version on your 20.04 system?

When you say old rasttk do you mean p3-rast or the individual rast-* scripts?

The rast-process-genome-batch service is no longer available since it relied upon a workflow engine we no longer support in our environment (we have moved over to a production scheduler which is what supports the BV-BRC annotation service on the cluster).

WallyL commented 2 years ago

I'm running xubuntu 20.04. I tried the old rasttk first and got the runtime/perl lib error. I uninstalled that and installed PATRIC release 1.035 and, in the example I gave above using p3-rast, it errors out immediately and gives the log error below. (Other p3- scripts appear to work okay)

$ cat test.log
User not logged in: No such file or directory at /usr/share/patric-cli/deployment/lib/RASTlib.pm line 524

Then, I uninstalled 1.035 and installed 1.039, but I got the same error...

olsonanl commented 2 years ago

When you said you wanted to use old rasttk, do you mean the rast-* scripts? Those should be working properly in the patric release. Or have you been using p3-rast?

WallyL commented 2 years ago

I wanted to use the rast-* scripts, e.g.

rast-create-genome --scientific-name "Azospira species" --genetic-code 11 --domain Bacteria --contigs XX.fa > XX.gto
rast-process-genome < XX.gto > XX.gto2
rast-annotate-proteins-similarity -H < XX.gto2 > XX.gto5
rast-annotate-special-proteins < XX.gto5 > XX.gto6
rast-annotate-families-patric < XX.gto6 > XX.gto7
rast-call-features-prophage-phispy < XX.gto7 > XX.gto8
rast-export-genome genbank_merged < XX.gto8 > XX_merged.gbk
rast-export-genome gff < XX.gto8 > XX.gff3

So, just to be clear, these should work with PATRIC 1.035/.039? I didn't try it since I didn't see that syntax mentioned anywhere in the documentation "Using RAST to Create New Genomes," which is why I was trying to make my .gto file with p3-rast syntax.

UPDATE: Voila! I made the gto with ver 1.035 and rast-create-genome.
So why isn't this syntax listed in the PATRIC doc. for creating a genome gto? Much clearer and simpler, IMO.

olsonanl commented 2 years ago

Yes - we support the original RASTtk scripts in the current distribution. We should pull the RASTtk material into the documentation; it is not promoted there now because for large volumes of genomes it's much more scalable to use the submit scripts (they have access to the full cluster, where we can support many thousands of genomes per day). The synchronous service only has limited resources allocated to it.

BTW, these steps are included in the default pipeline so you don't need to explicitly run them:

rast-annotate-proteins-similarity -H < XX.gto2 > XX.gto5
rast-annotate-special-proteins < XX.gto5 > XX.gto6
rast-annotate-families-patric < XX.gto6 > XX.gto7

This is the full default workflow:

{
   "stages" : [
      {
         "name" : "call_features_rRNA_SEED"
      },
      {
         "name" : "call_features_tRNA_trnascan"
      },
      {
         "repeat_region_SEED_parameters" : {},
         "name" : "call_features_repeat_region_SEED"
      },
      {
         "failure_is_not_fatal" : 1,
         "name" : "call_selenoproteins"
      },
      {
         "name" : "call_pyrrolysoproteins",
         "failure_is_not_fatal" : 1
      },
      {
         "name" : "call_features_strep_suis_repeat",
         "condition" : "$genome->{scientific_name} =~ /^Streptococcus\\s/"
      },
      {
         "condition" : "$genome->{scientific_name} =~ /^Streptococcus\\s/",
         "name" : "call_features_strep_pneumo_repeat"
      },
      {
         "name" : "call_features_crispr",
         "failure_is_not_fatal" : 1
      },
      {
         "name" : "call_features_CDS_prodigal"
      },
      {
         "name" : "call_features_CDS_glimmer3",
         "glimmer3_parameters" : {},
         "failure_is_not_fatal" : 1
      },
      {
         "prune_invalid_CDS_features_parameters" : {},
         "name" : "prune_invalid_CDS_features"
      },
      {
         "kmer_v2_parameters" : {},
         "name" : "annotate_proteins_kmer_v2"
      },
      {
         "name" : "annotate_proteins_kmer_v1",
         "failure_is_not_fatal" : 1,
         "kmer_v1_parameters" : {
            "annotate_null_only" : 1
         }
      },
      {
         "name" : "annotate_proteins_phage",
         "phage_parameters" : {
            "annotate_null_only" : 1
         }
      },
      {
         "name" : "annotate_proteins_similarity",
         "similarity_parameters" : {
            "annotate_null_only" : 1
         }
      },
      {
         "name" : "propagate_genbank_feature_metadata",
         "propagate_genbank_feature_metadata_parameters" : {}
      },
      {
         "resolve_overlapping_features_parameters" : {},
         "name" : "resolve_overlapping_features"
      },
      {
         "condition" : "scalar @{$genome->{contigs}} != grep { $_->{replicon_type} eq \"plasmid\" } @{$genome->{contigs}}",
         "name" : "classify_amr",
         "failure_is_not_fatal" : 1
      },
      {
         "name" : "renumber_features"
      },
      {
         "failure_is_not_fatal" : 1,
         "name" : "annotate_special_proteins"
      },
      {
         "name" : "annotate_families_figfam_v1",
         "failure_is_not_fatal" : 1
      },
      {
         "name" : "annotate_families_patric",
         "failure_is_not_fatal" : 1
      },
      {
         "name" : "annotate_null_to_hypothetical"
      },
      {
         "name" : "find_close_neighbors",
         "failure_is_not_fatal" : 1
      },
      {
         "failure_is_not_fatal" : 1,
         "name" : "annotate_strain_type_MLST"
      },
      {
         "name" : "compute_genome_quality_control",
         "failure_is_not_fatal" : 1
      },
      {
         "failure_is_not_fatal" : 1,
         "name" : "evaluate_genome",
         "evaluate_genome_parameters" : {}
      }
   ]
}

If you wish you can provide a nonstandard workflow to rast-process-genome via the --workflow parameter. There you could add the other nonstandard stages (and remove any you don't need).
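
As a rough illustration of the idea, the sketch below writes a trimmed workflow file. Every stage name is copied from the default workflow above and kept in the same order. It assumes that --workflow takes the path to a JSON file in that same format; the function name write_trimmed_workflow and the file names are placeholders, and whether this particular subset gives a useful annotation has not been checked.

```shell
#!/bin/sh
# Write a trimmed workflow file to the path given as $1.
# write_trimmed_workflow is a hypothetical helper name; the stage names and
# their order are taken from the default workflow listing.
write_trimmed_workflow() {
    cat > "$1" <<'EOF'
{
   "stages" : [
      { "name" : "call_features_rRNA_SEED" },
      { "name" : "call_features_tRNA_trnascan" },
      { "name" : "call_features_CDS_prodigal" },
      { "kmer_v2_parameters" : {}, "name" : "annotate_proteins_kmer_v2" },
      { "name" : "renumber_features" },
      { "name" : "annotate_null_to_hypothetical" }
   ]
}
EOF
}

# Usage, with the CLI installed (file names are placeholders):
# write_trimmed_workflow custom_workflow.json
# rast-process-genome --workflow custom_workflow.json < XX.gto > XX.gto2
```

Starting from a copy of the full default listing and deleting stages is probably safer than building a file from scratch, since the stage order then stays as shipped.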

--bob

WallyL commented 2 years ago

Thanks for the explanation, good to know! I'm not doing this as much as I used to. Typically I would have no more than ca. 10 genomes to annotate at any given time.

So, in the default workflow shown above, that information would all be stored in gto2, correct?

rast-process-genome < XX.gto > XX.gto2

So the full query I used before:

rast-create-genome --scientific-name "Azospira species" --genetic-code 11 --domain Bacteria --contigs XX.fa > XX.gto
rast-process-genome < XX.gto > XX.gto2
rast-annotate-proteins-kmer-v2 < XX.gto2 > XX.gto3
rast-annotate-proteins-kmer-v1 -H < XX.gto3 > XX.gto4
rast-annotate-proteins-similarity -H < XX.gto4 > XX.gto5
rast-annotate-special-proteins < XX.gto5 > XX.gto6
rast-annotate-families-patric < XX.gto6 > XX.gto7
rast-call-features-prophage-phispy < XX.gto7 > XX.gto8
rast-export-genome genbank < XX.gto8 > XX.gbk
rast-export-genome genbank_merged < XX.gto8 > XX_merged.gbk
rast-export-genome gff < XX.gto8 > XX.gff3
rast-export-genome feature_data < XX.gto8 > XX.features.txt
rast-export-genome spreadsheet_txt < XX.gto8 > XX.txt
rast-export-genome spreadsheet_xls < XX.gto8 > XX.xls
rast-export-genome protein_fasta < XX.gto8 > XX.pep.fa
rast-export-genome patric_specialty_genes < XX.gto8 > XX.patric_specialty.txt
rast-export-genome feature_dna < XX.gto8 > XX.feature.fa
rast-export-genome patric_features < XX.gto8 > XX.patricfeatures.txt

Would become this, followed by my rast-exports from gto3?

rast-create-genome --scientific-name "Azospira species" --genetic-code 11 --domain Bacteria --contigs XX.fa > XX.gto
rast-process-genome < XX.gto > XX.gto2
rast-call-features-prophage-phispy < XX.gto2 > XX.gto3

olsonanl commented 2 years ago

Yes, that looks correct.
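
For repeated runs, the shortened sequence could be wrapped in a small shell function along these lines. This is only a sketch: the function name annotate_genome and the PREFIX.fa naming convention are assumptions for illustration, while the commands, options and export formats are the ones already used in this thread.

```shell
#!/bin/sh
# annotate_genome PREFIX "SCIENTIFIC NAME"
# Hypothetical helper: expects PREFIX.fa and writes PREFIX.gto, PREFIX.gto2,
# PREFIX.gto3 plus two exports. Stops at the first failing step.
annotate_genome() {
    prefix=$1
    sci_name=$2

    rast-create-genome --scientific-name "$sci_name" --genetic-code 11 \
        --domain Bacteria --contigs "$prefix.fa" > "$prefix.gto" || return 1

    # The default workflow already covers the kmer, similarity,
    # special-protein and PATRIC family stages.
    rast-process-genome < "$prefix.gto" > "$prefix.gto2" || return 1

    # Prophage calling does not appear in the default workflow listing,
    # so it remains a separate step.
    rast-call-features-prophage-phispy < "$prefix.gto2" > "$prefix.gto3" || return 1

    # Exports read the final GTO; add further formats as needed.
    rast-export-genome genbank < "$prefix.gto3" > "$prefix.gbk" || return 1
    rast-export-genome gff < "$prefix.gto3" > "$prefix.gff3" || return 1
}

# Example call: annotate_genome XX "Azospira species"
```

Returning at the first failure keeps a broken intermediate .gto from being passed silently to the next step.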

WallyL commented 2 years ago

Great!
Thanks for your help, Bob. Glad to have this up and running again. Much appreciated!

AnaValero commented 2 years ago

Hi,

I have the exact same issue with unrecognized token:

"User not logged in: No such file or directory at /usr/share/patric-cli/deployment/lib/RASTlib.pm line 524."

It happens when running the command: p3-rast 10736 "Bacillus phage phi3T" < scaffolds.fasta > output.gto 2>test.log

I am logged in using an account with domain @patricbrc.org, confirmed by p3-login and p3-whoami, and RASTlib.pm is present and working (when I edit it, the error changes).

I guess the problem is about the $passfile, but no clue.

PATRIC-CLI version is patric-cli-1.035.deb, installed today, on Ubuntu v.18.

Thanks for your help, Ana

olsonanl commented 2 years ago

Please try updating to the latest at https://github.com/PATRIC3/PATRIC-distribution/releases/tag/1.039. There were SSL changes in our environment that required changes after 1.035 for the other platforms; I don't recall if Ubuntu was affected but that would be the first thing to try.

Bob