apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

Error while classifying sequences #105

Open LCFortier opened 2 weeks ago

LCFortier commented 2 weeks ago

Hi,

I ran geNomad, and it seemed to run smoothly until it aborted due to an uncaught exception (see below).

libc++abi: terminating due to uncaught exception of type Xbyak::Error: x2APIC is not supported

I installed geNomad with Conda on a MacBook Pro M1.

Any idea of what could have gone wrong?

Thanks!

apcamargo commented 2 weeks ago

At which point did it fail? Can you share the full log?

LCFortier commented 2 weeks ago

(genomad) forl1705@UN05FMSS508020 geNomad % genomad end-to-end --cleanup --splits 8 R20291_NC013316.fas R20291_genomad_output genomad_db

Executing geNomad annotate (v1.8.0). This will perform gene calling in the input sequences and annotate the predicted proteins with geNomad's markers.
Outputs:
  R20291_genomad_output/R20291_NC013316_annotate
  ├── R20291_NC013316_annotate.json (execution parameters)
  ├── R20291_NC013316_genes.tsv (gene annotation data)
  ├── R20291_NC013316_taxonomy.tsv (taxonomic assignment)
  ├── R20291_NC013316_mmseqs2.tsv (MMseqs2 output file)
  └── R20291_NC013316_proteins.faa (protein FASTA file)
[21:36:24] Executing genomad annotate.
[21:36:24] Creating the R20291_genomad_output/R20291_NC013316_annotate directory.
[21:36:33] Proteins predicted with pyrodigal-gv were written to R20291_NC013316_proteins.faa.
[21:38:37] Proteins annotated with MMseqs2 and geNomad database (v1.7) were written to R20291_NC013316_mmseqs2.tsv.
[21:38:37] Deleting R20291_NC013316_mmseqs2.
[21:38:38] Gene data was written to R20291_NC013316_genes.tsv.
[21:38:38] Taxonomic assignment data was written to R20291_NC013316_taxonomy.tsv.
[21:38:38] geNomad annotate finished!
Executing geNomad find-proviruses (v1.8.0). This will find putative proviral regions within the input sequences.
Outputs:
  R20291_genomad_output/R20291_NC013316_find_proviruses
  ├── R20291_NC013316_find_proviruses.json (execution parameters)
  ├── R20291_NC013316_provirus.tsv (provirus data)
  ├── R20291_NC013316_provirus.fna (provirus nucleotide sequences)
  ├── R20291_NC013316_provirus_proteins.faa (provirus protein sequences)
  ├── R20291_NC013316_provirus_genes.tsv (provirus gene annotation data)
  ├── R20291_NC013316_provirus_taxonomy.tsv (provirus taxonomic assignment)
  ├── R20291_NC013316_provirus_mmseqs2.tsv (MMseqs2 output file)
  └── R20291_NC013316_provirus_aragorn.tsv (Aragorn output file)
[21:38:38] Executing genomad find-proviruses.
[21:38:38] Creating the R20291_genomad_output/R20291_NC013316_find_proviruses directory.
[21:38:40] Integrases identified with MMseqs2 and geNomad database (v1.7) were written to R20291_NC013316_provirus_mmseqs2.tsv.
[21:38:40] Deleting R20291_NC013316_provirus_mmseqs2.
[21:38:40] Deleting R20291_NC013316_provirus_mmseqs2_input.faa.
[21:38:47] tRNAs identified with Aragorn were written to R20291_NC013316_provirus_aragorn.tsv.
[21:38:47] Deleting R20291_NC013316_provirus_aragorn_input.fna.
[21:38:48] Provirus regions identified.
[21:38:48] Provirus data was written to R20291_NC013316_provirus.tsv.
[21:38:48] Provirus nucleotide sequences were written to R20291_NC013316_provirus.fna.
[21:38:48] Provirus protein sequences were written to R20291_NC013316_provirus_proteins.faa.
[21:38:48] Provirus gene data was written to R20291_NC013316_provirus_genes.tsv.
[21:38:48] Taxonomic assignment data was written to R20291_NC013316_provirus_taxonomy.tsv.
[21:38:48] geNomad find-proviruses finished!
Executing geNomad marker-classification (v1.8.0). This will classify the input sequences into chromosome, plasmid, or virus based on the presence of geNomad markers and other gene-related features.
Outputs:
  R20291_genomad_output/R20291_NC013316_marker_classification
  ├── R20291_NC013316_marker_classification.json (execution parameters)
  ├── R20291_NC013316_features.tsv (sequence feature data: tabular format)
  ├── R20291_NC013316_features.npz (sequence feature data: binary format)
  ├── R20291_NC013316_marker_classification.tsv (sequence classification: tabular format)
  ├── R20291_NC013316_marker_classification.npz (sequence classification: binary format)
  ├── R20291_NC013316_provirus_features.tsv (provirus feature data: tabular format)
  ├── R20291_NC013316_provirus_features.npz (provirus feature data: binary format)
  ├── R20291_NC013316_provirus_marker_classification.tsv (provirus classification: tabular format)
  └── R20291_NC013316_provirus_marker_classification.npz (provirus classification: binary format)
[21:38:48] Executing genomad marker-classification.
[21:38:48] Creating the R20291_genomad_output/R20291_NC013316_marker_classification directory.
[21:38:49] Sequence features computed.
[21:38:49] Sequence features in binary format written to R20291_NC013316_features.npz.
[21:38:49] Sequence features in tabular format written to R20291_NC013316_features.tsv.
[21:38:49] Provirus features computed.
[21:38:49] Provirus features in binary format written to R20291_NC013316_provirus_features.npz.
[21:38:49] Provirus features in tabular format written to R20291_NC013316_provirus_features.tsv.
[21:38:49] Sequences classified.
[21:38:49] Sequence classification in binary format written to R20291_NC013316_marker_classification.npz.
[21:38:49] Sequence classification in tabular format written to R20291_NC013316_marker_classification.tsv.
[21:38:49] Proviruses classified.
[21:38:49] Provirus classification in binary format written to R20291_NC013316_provirus_marker_classification.npz.
[21:38:49] Provirus classification in tabular format written to R20291_NC013316_provirus_marker_classification.tsv.
[21:38:49] geNomad marker-classification finished!
Executing geNomad nn-classification (v1.8.0). This will classify the input sequences into chromosome, plasmid, or virus based on the nucleotide sequence.
Outputs:
  R20291_genomad_output/R20291_NC013316_nn_classification
  ├── R20291_NC013316_nn_classification.json (execution parameters)
  ├── R20291_NC013316_encoded_sequences (directory containing encoded sequence data)
  ├── R20291_NC013316_nn_classification.tsv (contig classification: tabular format)
  ├── R20291_NC013316_nn_classification.npz (contig classification: binary format)
  ├── R20291_NC013316_encoded_proviruses (directory containing encoded sequence data)
  ├── R20291_NC013316_provirus_nn_classification.tsv (provirus classification: tabular format)
  └── R20291_NC013316_provirus_nn_classification.npz (provirus classification: binary format)
[21:40:07] Executing genomad nn-classification.
[21:40:07] Creating the R20291_genomad_output/R20291_NC013316_nn_classification directory.
[21:40:07] Creating the R20291_genomad_output/R20291_NC013316_nn_classification/R20291_NC013316_encoded_sequences directory.
[21:40:08] Encoded sequence data written to R20291_NC013316_encoded_sequences.
[21:40:08] Creating the R20291_genomad_output/R20291_NC013316_nn_classification/R20291_NC013316_encoded_proviruses directory.
[21:40:08] Encoded provirus data written to R20291_NC013316_encoded_proviruses.
Classifying sequences ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0% | -:--:--
libc++abi: terminating due to uncaught exception of type Xbyak::Error: x2APIC is not supported
zsh: abort genomad end-to-end --cleanup --splits 8 R20291_NC013316.fas genomad_db
(genomad) forl1705@UN05FMSS508020 geNomad %

apcamargo commented 2 weeks ago

It seems that this failed at the neural-network step, which could be due to an incompatibility between TensorFlow (the deep learning library geNomad uses) and the processor's architecture (see here). Do you know whether you are running this natively or through Rosetta?
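To answer the native-vs-Rosetta question, a quick check like the one below can be run in the same terminal used to launch geNomad, with the genomad environment activated. It is a sketch that relies only on `uname -m`, the `sysctl.proc_translated` key that macOS exposes on Apple silicon, and Python's `platform.machine()`; the interpretation given in the comments is an assumption about typical M1 setups, not something geNomad itself reports.

```shell
# Architecture as seen by this shell: arm64 when running natively on Apple
# silicon, x86_64 when the shell runs under Rosetta 2 (or on an Intel Mac).
arch_name=$(uname -m)

# 1 = translated by Rosetta 2, 0 = native. The key is missing on Intel Macs
# and on non-macOS systems, in which case we report n/a.
translated=$(sysctl -n sysctl.proc_translated 2>/dev/null)
[ -n "$translated" ] || translated="n/a"

# What matters for TensorFlow is the interpreter inside the active Conda
# environment: an x86_64 interpreter on an M1 Mac also runs under Rosetta.
py=$(command -v python || command -v python3)
if [ -n "$py" ]; then
  py_arch=$("$py" -c 'import platform; print(platform.machine())')
else
  py_arch="no python on PATH"
fi

echo "shell machine:   $arch_name"
echo "proc_translated: $translated"
echo "python machine:  $py_arch"
```

On an M1 Mac, `arm64` everywhere suggests a fully native setup, while `x86_64` for the Python interpreter suggests the environment's packages (TensorFlow included) are x86_64 builds running through Rosetta.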

LCFortier commented 2 weeks ago

Looking at the link you sent, it is probably the same issue related to the M1 chip. I installed miniforge3 for the arm64 architecture. I suppose I have to follow the same steps as indicated in the thread you cited.
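Before reinstalling anything, one way to check whether TensorFlow in the active environment can run at all, independently of geNomad, is a tiny smoke test run in a child process, so that a hard abort like the `libc++abi` one in the log only kills that process and its exit status can be inspected. This is a hypothetical diagnostic, not part of geNomad, and it assumes the genomad environment is activated. A small matrix multiplication may not exercise the same code path as geNomad's classifier, so a passing result does not guarantee that `nn-classification` will work.

```shell
# Hypothetical TensorFlow smoke test (not part of geNomad).
py=$(command -v python || command -v python3)
if [ -z "$py" ]; then
  tf_status="no-python"
else
  # Child process: an uncaught C++ exception aborts it, not this shell.
  # stdout is discarded; stderr is kept so any Python error stays visible.
  "$py" -c 'import tensorflow as tf; print(tf.linalg.matmul(tf.ones((2, 2)), tf.ones((2, 2))).numpy())' >/dev/null
  rc=$?
  case $rc in
    0) tf_status="ok" ;;
    # Shells report death by signal as 128 + signal number; SIGABRT is 6.
    134) tf_status="aborted (SIGABRT)" ;;
    *) tf_status="failed (exit $rc)" ;;
  esac
fi
echo "TensorFlow smoke test: $tf_status"
```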

apcamargo commented 2 weeks ago

It seems that Conda (or Mamba) creates native environments by default, and that the conda-forge build of TensorFlow does not really work on ARM chips. You may try:

As a last resort, if you are in a rush to get these results, you can disable the neural-network branch using the --disable-nn-classification parameter.
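Applied to the failing run from the log, that would be the same `end-to-end` command with the extra flag. The input file, output directory, and database names below are simply the ones from this run; the `command -v` guard is an addition that only prints a clearer message when the genomad environment is not activated.

```shell
# Same arguments as the failing run, plus --disable-nn-classification,
# so the neural-network branch (where the abort happened) is skipped.
if command -v genomad >/dev/null 2>&1; then
  genomad end-to-end --cleanup --splits 8 --disable-nn-classification \
    R20291_NC013316.fas R20291_genomad_output genomad_db
  rc=$?
else
  echo "genomad not found on PATH; activate the genomad Conda environment first" >&2
  rc=127
fi
echo "genomad exit status: $rc"
```

The exit status goes into `rc` rather than `status` because `status` is a read-only special parameter in zsh, the shell used in this log.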

LCFortier commented 2 weeks ago

OK, thanks for the suggestions. I am not in a rush, so I'll try it tomorrow and let you know whether it works. Thanks for your quick reply!