KGerhardt / RecruitPlotEasy

A tool for interactive Recruitment Plot generation and viewing
fastAAI: sqlite3.OperationalError #4

Closed nick-youngblut closed 2 years ago

nick-youngblut commented 2 years ago

Since Issues are not available for FastAAI, I'm posting my issue here. I'll switch it over to the appropriate repo when it is possible.

Setup

Conda env setup as in the instructions.

Using prodigal-generated amino acid sequences.

fastAAI run:

fastaai aai_index --threads 12 -p /path/to/prodigal/protein_files/ -o $OUTDIR

Error

Starting from proteins

FastAAI is formatting your files to be saved to your database.

Formatting data to add to database at 10/04/2022 07:48:04

Adding data to final database.

Database build complete!

Performing an all vs. all query on FastAAI/database/FastAAI_database.sqlite.db

Loading query data at 10/04/2022 07:48:05 ...

Loading target data at 10/04/2022 07:48:06 ...

FastAAI will search 9 query genomes against 9 target genomes.

Beginning AAI calculation at 10/04/2022 07:48:06

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/.snakemake/conda/e597b1bc4c3f6c65a46887160aeefc74/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/.snakemake/conda/e597b1bc4c3f6c65a46887160aeefc74/lib/python3.7/site-packages/fastaai/FastAAI.py", line 2270, in do_sql_query_no_SD
    database.cursor.execute(temp_tab)
sqlite3.OperationalError: near "-": syntax error
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/.snakemake/conda/e597b1bc4c3f6c65a46887160aeefc74/bin/fastaai", line 8, in <module>
    sys.exit(main())
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/.snakemake/conda/e597b1bc4c3f6c65a46887160aeefc74/lib/python3.7/site-packages/fastaai/FastAAI.py", line 3715, in main
    aai_index(genomes, proteins, hmms, db_name, output, threads, gf, pf, hf, verbose, do_stdev, mem, efficient)
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/.snakemake/conda/e597b1bc4c3f6c65a46887160aeefc74/lib/python3.7/site-packages/fastaai/FastAAI.py", line 3319, in aai_index
    db_query(accessible_name, accessible_name, verbose, output, threads, do_stdev, memory_use, unlimited_resources)
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/.snakemake/conda/e597b1bc4c3f6c65a46887160aeefc74/lib/python3.7/site-packages/fastaai/FastAAI.py", line 2442, in db_query
    do_query_vs_target_sql(query, target, threads, output, verbose, do_stdev)
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/.snakemake/conda/e597b1bc4c3f6c65a46887160aeefc74/lib/python3.7/site-packages/fastaai/FastAAI.py", line 2126, in do_query_vs_target_sql
    for file in pool.imap(do_sql_query_no_SD, query_args):
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/.snakemake/conda/e597b1bc4c3f6c65a46887160aeefc74/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
sqlite3.OperationalError: near "-": syntax error

The error is caused by input file names that contain a hyphen (-). For example: Ruminococcus_bromii_strain_AM46-2B.faa
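To illustrate the likely mechanism, here is a minimal sketch using only Python's standard sqlite3 module. It assumes (I have not checked the FastAAI source) that the genome name ends up unquoted as a table name in the SQL statement. The table layout and the quote_identifier helper are hypothetical and are not FastAAI code.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    """Quote an SQLite identifier, doubling any embedded double quotes.

    Hypothetical helper for illustration; not part of FastAAI.
    """
    return '"' + name.replace('"', '""') + '"'


conn = sqlite3.connect(":memory:")
genome = "Ruminococcus_bromii_strain_AM46-2B"

# Unquoted: SQLite stops parsing the table name at the hyphen.
try:
    conn.execute(f"CREATE TEMP TABLE {genome} (accession INTEGER)")
    unquoted_error = None
except sqlite3.OperationalError as exc:
    unquoted_error = str(exc)

# Quoted: the same name is accepted as a single identifier.
table = quote_identifier(genome)
conn.execute(f"CREATE TEMP TABLE {table} (accession INTEGER)")
conn.execute(f"INSERT INTO {table} VALUES (1)")
rows = conn.execute(f"SELECT accession FROM {table}").fetchall()

print("unquoted:", unquoted_error)
print("quoted rows:", rows)
```

Until this is handled in the tool, renaming the input files to replace hyphens (for example with underscores) should avoid the error.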

nick-youngblut commented 2 years ago

Error 2

FastAAI fails when -o is given a directory path (e.g., /path/to/output/directory/) rather than a bare output directory name (e.g., fastaai_output):

$ fastaai aai_index --threads 8 -g genomes -o tmp/test_out
Starting from genomes

FastAAI is formatting your files to be saved to your database.

Error: can't open output file test_out/predicted_proteins/TAS386.temp.

Error: can't open output file test_out/predicted_proteins/TAS122.temp.

Error: can't open output file test_out/predicted_proteins/TAS067.temp.

Error: can't open output file test_out/predicted_proteins/TAS386.temp.

Error: can't open output file test_out/predicted_proteins/TAS122.temp.

Error: can't open output file test_out/predicted_proteins/TAS067.temp.

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/tmp/fastAAI/fastaai_env/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/tmp/fastAAI/fastaai_env/lib/python3.7/site-packages/fastaai/FastAAI.py", line 961, in do_advance
    input_file_object.preprocess()
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/tmp/fastAAI/fastaai_env/lib/python3.7/site-packages/fastaai/FastAAI.py", line 881, in preprocess
    self.genome_to_protein()
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/tmp/fastAAI/fastaai_env/lib/python3.7/site-packages/fastaai/FastAAI.py", line 673, in genome_to_protein
    temp_output.unlink()
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/tmp/fastAAI/fastaai_env/lib/python3.7/pathlib.py", line 1309, in unlink
    self._accessor.unlink(self)
FileNotFoundError: [Errno 2] No such file or directory: 'test_out/predicted_proteins/TAS067.temp'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/tmp/fastAAI/fastaai_env/bin/fastaai", line 8, in <module>
    sys.exit(main())
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/tmp/fastAAI/fastaai_env/lib/python3.7/site-packages/fastaai/FastAAI.py", line 3715, in main
    aai_index(genomes, proteins, hmms, db_name, output, threads, gf, pf, hf, verbose, do_stdev, mem, efficient)
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/tmp/fastAAI/fastaai_env/lib/python3.7/site-packages/fastaai/FastAAI.py", line 3316, in aai_index
    success = build_db(genomes, proteins, hmms, db_name, output, threads, gf, pf, hf, verbose)
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/tmp/fastAAI/fastaai_env/lib/python3.7/site-packages/fastaai/FastAAI.py", line 1622, in build_db
    success = add_inputs(output, final_database, existing_genome_IDs, threads, verbose, prep_args)
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/tmp/fastAAI/fastaai_env/lib/python3.7/site-packages/fastaai/FastAAI.py", line 1305, in add_inputs
    inputs = advance_inputs(genomes = genomes, proteins = proteins, hmms = hmms, genomes_file = gf, proteins_file = pf, hmms_file = hf, output = output_path, threads = threads, verbose = verbose, db_name = db_name)
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/tmp/fastAAI/fastaai_env/lib/python3.7/site-packages/fastaai/FastAAI.py", line 1102, in advance_inputs
    for res in pool.imap(do_advance, inputs):
  File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/tmp/fastAAI/fastaai_env/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
FileNotFoundError: [Errno 2] No such file or directory: 'test_out/predicted_proteins/TAS067.temp'

Error 3

FastAAI also fails to create an output directory that is given as a multi-level path:

FastAAI tried to make output directory: 'tmp/test_out' but failed.

Using os.makedirs would fix this.
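A minimal sketch of what I mean, using only the standard library. The ensure_output_dir name and the temporary base directory are hypothetical, chosen just for this example. os.makedirs creates any missing parent directories, and exist_ok=True makes the call safe to repeat.

```python
import os
import tempfile


def ensure_output_dir(path: str) -> str:
    """Create `path` and any missing parent directories.

    Hypothetical helper for illustration; not part of FastAAI.
    """
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)


# Demonstrate with a nested path whose parent ("tmp") does not exist yet.
base = tempfile.mkdtemp()
nested = os.path.join(base, "tmp", "test_out")

created = ensure_output_dir(nested)
# Calling it again on an existing directory does not raise.
created_again = ensure_output_dir(nested)

print(os.path.isdir(created))
```

By contrast, os.mkdir raises FileNotFoundError when the parent directory is missing, which would be consistent with the failure message above.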

KGerhardt commented 2 years ago

I just opened FastAAI for issues and I'm migrating this thread over there.

https://github.com/KGerhardt/FastAAI/issues/1

KGerhardt commented 2 years ago

I don't believe that the FastAAI issues thread will be looping you in automatically, so I'm notifying you here. I have a reply up on that issues thread.

nick-youngblut commented 2 years ago

> I don't believe that the FastAAI issues thread will be looping you in automatically, so I'm notifying you here. I have a reply up on that issues thread.

If you use @nick-youngblut, then I should get a notification.