Gaius-Augustus / GALBA

GALBA is a pipeline for fully automated prediction of protein-coding gene structures with AUGUSTUS in novel eukaryotic genomes for the scenario where high-quality proteins from one or several closely related species are available.

Run GALBA with external miniprot run #8

Open CongLiu37 opened 1 year ago

CongLiu37 commented 1 year ago

Hello,

I am wondering if it is possible to run GALBA with pre-computed miniprot alignments, or make GALBA accept multiple protein files and call miniprot for these files one by one? I have limited memory and it is difficult to run miniprot with all proteins in a single file.

Sincerely,

Cong

KatharinaHoff commented 1 year ago

Aligning separately within GALBA is already possible. If you provide several protein file names, comma-separated, the files will be aligned one by one. It's not elegant: the index is rebuilt every time. Other parts of the pipeline may be more RAM-critical.

I will not add feeding precomputed alignments into GALBA as a command line option.

KatharinaHoff commented 1 year ago

Keep in mind that GALBA today is a pipeline for exactly one reference protein set. We will continue to expand functionality for large protein input, but currently, using one specified protein set is your safest bet.

CongLiu37 commented 1 year ago

Hello,

I just tested multiple protein files, but it did not work.

galba.pl --genome=genome.fa --prot_seq=proteins1.fa,proteins2.fa --skipOptimize --threads 8
#**********************************************************************************
#                               GALBA CONFIGURATION                               
#**********************************************************************************
# GALBA CALL: /home/c/c-liu/Softwares/GALBA/scripts/galba.pl --genome=genome.fa --prot_seq=proteins1.fa,proteins2.fa --skipOptimize --threads 8
# Wed Mar  1 16:53:15 2023: galba.pl version 1.0.1
# Wed Mar  1 16:53:15 2023: Configuring of GALBA for using external tools...
# Wed Mar  1 16:53:15 2023: Found environment variable $AUGUSTUS_CONFIG_PATH. Setting $AUGUSTUS_CONFIG_PATH to /home/c/c-liu/Softwares/Augustus/config/
# Wed Mar  1 16:53:15 2023: Found environment variable $AUGUSTUS_BIN_PATH. Setting $AUGUSTUS_BIN_PATH to /apps/unit/BioinfoUgrp/DebianMed/11.2/modules/augustus/3.4.0+dfsg2-2/bin/
# Wed Mar  1 16:53:15 2023: Found environment variable $AUGUSTUS_SCRIPTS_PATH. Setting $AUGUSTUS_SCRIPTS_PATH to /home/c/c-liu/Softwares/Augustus/scripts/
# Wed Mar  1 16:53:15 2023: Found environment variable $PYTHON3_PATH. Setting $PYTHON3_PATH to /home/c/c-liu/miniconda3/bin/
# Wed Mar  1 16:53:15 2023: Found environment variable $DIAMOND_PATH. Setting $DIAMOND_PATH to /apps/unit/BioinfoUgrp/Other/DIAMOND/2.0.4.142/
# Wed Mar  1 16:53:15 2023: Found environment variable $MINIPROT_PATH. Setting $GMINIPROT_PATH to /home/c/c-liu/Softwares/miniprot/
# Wed Mar  1 16:53:15 2023: ERROR: in file /home/c/c-liu/Softwares/GALBA/scripts/galba.pl at line 541
GALBA does currently  not support using multiple protein input files with Miniprot as an aligner. Please combine your protein fasta files into a single file before starting GALBA.
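
The error message asks for a single combined protein FASTA file. A minimal sketch of that step follows; the demo records and the output name proteins_combined.fa are hypothetical, and only the galba.pl options already shown in this thread are reused. It uses `awk 1` rather than plain `cat` so that an input file lacking a final newline does not glue its last sequence line to the next header. Note that combining the files does not by itself address the memory concern raised above.

```shell
#!/bin/sh
# Sketch only: merge several protein FASTA files into one file for GALBA.
# File contents and the name proteins_combined.fa are hypothetical.
set -eu

workdir=$(mktemp -d)

# Two tiny demo inputs; the first one deliberately lacks a trailing newline.
printf '>protA\nMKT' > "$workdir/proteins1.fa"
printf '>protB\nMSL\n' > "$workdir/proteins2.fa"

# "awk 1" prints every input line followed by a newline, so a missing
# final newline in one file is repaired before the next file starts.
awk 1 "$workdir/proteins1.fa" "$workdir/proteins2.fa" \
    > "$workdir/proteins_combined.fa"

# Sanity check: count FASTA headers in the combined file.
n_records=$(grep -c '^>' "$workdir/proteins_combined.fa")
echo "records: $n_records"

# The combined file would then be passed as the single protein input,
# using the options from the call above (not executed here):
#   galba.pl --genome=genome.fa --prot_seq=proteins_combined.fa --skipOptimize --threads 8
```

If the input files can share sequence identifiers, the headers may need to be made unique before merging; that is not handled in this sketch.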

Sincerely,

Cong

KatharinaHoff commented 1 year ago

Ah, then it was disabled because it would rebuild the index. I am writing from my phone. It's not hard to reverse this change, but I am not convinced the alignment step is the memory-critical step. I will measure RAM consumption this week or next.

CongLiu37 commented 1 year ago

Thank you for your feedback!

Sincerely,

Cong