formbio / FLAG

Apache License 2.0

questions about some commands #17

Open spoonbender76 opened 2 months ago

spoonbender76 commented 2 months ago

I am trying the new large Singularity image (81.5 G), and I have questions about some of the commands in the documentation.

  1. Using --overlay $(pwd)/tempdir triggers the following error:

FATAL:container creation failed:while setting overlay session layout: only root user can use sandbox overlay in setuid mode

So Singularity users still need root. Is this necessary?

  2. Even with -w miniprot, the run shows

chosen protein algos: None

I also don't know whether the following error can be ignored or whether it has any effect on the results.

WARN: Unknown directive runOptions for process pasa
[b7/9cbcde] NOTE: Process pasa(1) terminated with an error exit status(2) -- Error is ignored

image

  3. Just to clarify: for singularity run, do you not need to put Liftoff in -z when Liftoff is desired? The documentation says: If Liftoff is desired the above command can be modified such as below:
singularity run --bind $(pwd):/data --bind $(pwd)/tempdir:/tmp \
--overlay $(pwd)/tempdir  singularity_flag.image \
-g Erynnis_tages-GCA_905147235.1-softmasked.fa -r curatedButterflyRNA.fa \
-p curatedButterflyProteins.fa -f GCF_009731565.1_Dplex_v4_genomic.fa \
-a GCF_009731565.1_Dplex_v4_genomic.gff -m skip -t true \
-l lepidoptera_odb10 \
-z Helixer,helixer_trained_augustus -q vertebrate -s small -n Eynnis_tages \
-w miniprot -y normal -p singularity -o outputdir -u singularity

image

Liftoff is absent from the chosen annotation algos.

  4. Also, how can I update Helixer to the latest version, v0.3.3?

wtroy2 commented 2 months ago

Let me try with a separate user. I am normally a sudo user on the system, so I didn't notice this.

You may be able to use --writable-tmpfs instead of --overlay. I will check in the morning to be sure, though.
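As a rough sketch of that suggestion (an untested assumption, not a confirmed fix), the launch part of the Singularity example could swap `--overlay $(pwd)/tempdir` for `--writable-tmpfs`. The helper name `flag_launch_prefix` and its `mode` argument are hypothetical and only used for illustration. The function prints the command prefix instead of running it, so it can be inspected first; the pipeline arguments (-g, -r, and so on) would follow unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the container-launch prefix of the FLAG
# Singularity example, using either --overlay or --writable-tmpfs.
# Nothing is executed; the pipeline arguments would be appended as-is.
flag_launch_prefix() {
    local mode="$1"      # "overlay" (documented form) or "tmpfs" (suggested form)
    local workdir="$2"   # directory that holds the inputs and tempdir
    local write_opt
    case "$mode" in
        overlay) write_opt="--overlay ${workdir}/tempdir" ;;
        tmpfs)   write_opt="--writable-tmpfs" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "singularity run --bind ${workdir}:/data --bind ${workdir}/tempdir:/tmp ${write_opt} singularity_flag.image"
}

# Print the --writable-tmpfs variant for the current directory.
flag_launch_prefix tmpfs "$(pwd)"
```

Whether a tmpfs-backed writable layer is large enough for every FLAG step is not established in this thread, so treat the output as something to try rather than a recommendation.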

wtroy2 commented 2 months ago

My guess is that PASA is failing because it is trying to write an SQLite file, which is related to the overlay. The warning you have for it is fine.

Even if it does fail, you should still get good results, since Splign also does transcript-to-genome alignments. The results are just slightly better with PASA working.

wtroy2 commented 2 months ago

The protein algo has been fixed, as well as the instructions for Singularity with Liftoff. That was a typo. Thank you for catching those!

spoonbender76 commented 2 months ago

I set up Docker according to the instructions and have been running the Docker example for over 15 hours, but I don't see any sign that it is running. No splign/augustus processes are visible. I had to press Ctrl-C since it appears to be stalled.

nextflow run main.nf -w workdir/ --output outputdir/ \
    --genome examples/Erynnis_tages-GCA_905147235.1-softmasked.fa --rna examples/curatedButterflyRNA.fa \
    --proteins examples/curatedButterflyProteins.fa --fafile examples/GCF_009731565.1_Dplex_v4_genomic.fa \
    --gtffile examples/GCF_009731565.1_Dplex_v4_genomic.gff --masker skip --transcriptIn true \
    --lineage lepidoptera_odb10 --annotationalgo Liftoff,Helixer,helixer_trained_augustus \
    --helixerModel invertebrate --externalalgo input_transcript,input_proteins --size small --proteinalgo miniprot \
    --speciesScientificName Eynnis_tages \
    --funcAnnotProgram eggnog --eggnogDB eggnogDB.tar.gz -profile docker

image

By the way, the flags --fafile and --gtffile appear to be specified twice in the Example Docker Run commands:

If Liftoff is desired the above command can be modified such as below:

nextflow run main.nf -w workdir/ --output outputdir/ \
    --genome examples/Erynnis_tages-GCA_905147235.1-softmasked.fa --rna examples/curatedButterflyRNA.fa \
    --proteins examples/curatedButterflyProteins.fa --fafile examples/GCF_009731565.1_Dplex_v4_genomic.fa \
    --gtffile examples/GCF_009731565.1_Dplex_v4_genomic.gff --masker skip --transcriptIn true \
    --lineage lepidoptera_odb10 --annotationalgo Liftoff,Helixer,helixer_trained_augustus \
    --helixerModel invertebrate --externalalgo input_transcript,input_proteins --size small --proteinalgo miniprot \
    --speciesScientificName Eynnis_tages --fafile examples/monarchGenome.fa --gtffile examples/monarchAnnotation.gff3 \
    --funcAnnotProgram eggnog --eggnogDB eggnogDB.tar.gz -profile docker

nextflow run main.nf -w workdir/ --output outputdir/ \
    --genome examples/Erynnis_tages-GCA_905147235.1-softmasked.fa --rna examples/curatedButterflyRNA.fa \
    --proteins examples/curatedButterflyProteins.fa --fafile examples/GCF_009731565.1_Dplex_v4_genomic.fa \
    --gtffile examples/GCF_009731565.1_Dplex_v4_genomic.gff --masker skip --transcriptIn true \
    --lineage lepidoptera_odb10 --annotationalgo Liftoff,Helixer,helixer_trained_augustus \
    --helixerModel invertebrate --externalalgo input_transcript,input_proteins --size small \
    --proteinalgo miniprot --speciesScientificName Eynnis_tages --fafile examples/monarchGenome.fa \
    --gtffile examples/monarchAnnotation.gff3 --runMode laptop --funcAnnotProgram eggnog \
    --eggnogDB eggnogDB.tar.gz -profile docker_small

wtroy2 commented 2 months ago

This one is interesting. It looks like Splign has stalled. That process does not usually run into issues, so I am unsure why it stalled. If you have the process logs for it, feel free to send them.

Augustus and the rest of the processes are waiting for Splign to finish before they run, because the Splign outputs feed into them.

And thank you for noticing this; I will fix it ASAP. I am currently working on making the Singularity version run entirely from one large container.

spoonbender76 commented 2 months ago

Thank you for the quick reply! I ran the Docker Liftoff example again, but Nextflow keeps printing lines to .nextflow.log even though no splign processes are running. I also checked the workdir and found that splign.gff3 is empty.

Apr-17 15:22:17.159 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor Local > tasks to be completed: 1 -- submitted tasks are shown below
~> TaskHandler[id: 7; name: splign (1); status: RUNNING; exit: -; error: -; workDir: /home/cnrri01/ssd/FLAG/workdir/b6/4e8fca8b11ebc069bc1491ba514ae6]
tree -s -D -h /home/cnrri01/ssd/FLAG/workdir/b6/4e8fca8b11ebc069bc1491ba514ae6/
[4.0K Apr 17 11:15]  /home/cnrri01/ssd/FLAG/workdir/b6/4e8fca8b11ebc069bc1491ba514ae6/
|-- [4.0K Apr 17 11:21]  1_folder
|   |-- [4.0K Apr 17 11:15]  _SplignLDS2_
|   |   `-- [ 17M Apr 17 11:15]  splign.lds2db
|   |-- [452K Apr 17 11:19]  cdna.compartments
|   |-- [127M Apr 17 11:13]  cdna.fa
|   |-- [2.7M Apr 17 11:16]  cdna.fa.ndb
|   |-- [7.7M Apr 17 11:16]  cdna.fa.nhr
|   |-- [586K Apr 17 11:16]  cdna.fa.nin
|   |-- [ 497 Apr 17 11:16]  cdna.fa.njs
|   |-- [195K Apr 17 11:16]  cdna.fa.nog
|   |-- [1.1M Apr 17 11:16]  cdna.fa.nos
|   |-- [586K Apr 17 11:16]  cdna.fa.not
|   |-- [ 30M Apr 17 11:16]  cdna.fa.nsq
|   |-- [ 16K Apr 17 11:16]  cdna.fa.ntf
|   |-- [195K Apr 17 11:16]  cdna.fa.nto
|   |-- [319M Apr 17 11:15]  genome.fa
|   |-- [ 32K Apr 17 11:15]  genome.fa.ndb
|   |-- [4.4K Apr 17 11:15]  genome.fa.nhr
|   |-- [ 580 Apr 17 11:15]  genome.fa.nin
|   |-- [ 516 Apr 17 11:15]  genome.fa.njs
|   |-- [ 192 Apr 17 11:15]  genome.fa.nog
|   |-- [ 573 Apr 17 11:15]  genome.fa.nos
|   |-- [ 488 Apr 17 11:15]  genome.fa.not
|   |-- [ 79M Apr 17 11:15]  genome.fa.nsq
|   |-- [ 16K Apr 17 11:15]  genome.fa.ntf
|   |-- [ 164 Apr 17 11:15]  genome.fa.nto
|   |-- [ 10M Apr 17 11:21]  splign.asn
|   |-- [   0 Apr 17 11:21]  splign.gff3
|   |-- [ 78K Apr 17 11:21]  splign.log
|   `-- [965K Apr 17 11:21]  splign.out
|-- [4.0K Apr 17 11:20]  2_folder
|   |-- [4.0K Apr 17 11:15]  _SplignLDS2_
|   |   `-- [ 14M Apr 17 11:15]  splign.lds2db
|   |-- [786K Apr 17 11:19]  cdna.compartments
|   |-- [108M Apr 17 11:13]  cdna.fa
|   |-- [2.1M Apr 17 11:16]  cdna.fa.ndb
|   |-- [6.0M Apr 17 11:16]  cdna.fa.nhr
|   |-- [467K Apr 17 11:16]  cdna.fa.nin
|   |-- [ 497 Apr 17 11:16]  cdna.fa.njs
|   |-- [156K Apr 17 11:16]  cdna.fa.nog
|   |-- [895K Apr 17 11:16]  cdna.fa.nos
|   |-- [467K Apr 17 11:16]  cdna.fa.not
|   |-- [ 25M Apr 17 11:16]  cdna.fa.nsq
|   |-- [ 16K Apr 17 11:16]  cdna.fa.ntf
|   |-- [156K Apr 17 11:16]  cdna.fa.nto
|   |-- [319M Apr 17 11:15]  genome.fa
|   |-- [ 32K Apr 17 11:15]  genome.fa.ndb
|   |-- [4.4K Apr 17 11:15]  genome.fa.nhr
|   |-- [ 580 Apr 17 11:15]  genome.fa.nin
|   |-- [ 516 Apr 17 11:15]  genome.fa.njs
|   |-- [ 192 Apr 17 11:15]  genome.fa.nog
|   |-- [ 573 Apr 17 11:15]  genome.fa.nos
|   |-- [ 488 Apr 17 11:15]  genome.fa.not
|   |-- [ 79M Apr 17 11:15]  genome.fa.nsq
|   |-- [ 16K Apr 17 11:15]  genome.fa.ntf
|   |-- [ 164 Apr 17 11:15]  genome.fa.nto
|   |-- [ 13M Apr 17 11:20]  splign.asn
|   |-- [   0 Apr 17 11:20]  splign.gff3
|   |-- [239K Apr 17 11:20]  splign.log
|   `-- [1.2M Apr 17 11:20]  splign.out
|-- [319M Apr 17 11:13]  Erynnis_tages-GCA_905147235.1-softmasked.fa
|-- [234M Apr 17 11:13]  cdna.fa
|-- [234M Apr 17 11:13]  formatted_curatedButterflyRNA.fa
|-- [319M Apr 17 11:13]  genome.fa
|-- [ 32K Apr 17 11:13]  genome.fa.ndb
|-- [4.4K Apr 17 11:13]  genome.fa.nhr
|-- [ 580 Apr 17 11:13]  genome.fa.nin
|-- [ 516 Apr 17 11:13]  genome.fa.njs
|-- [ 192 Apr 17 11:13]  genome.fa.nog
|-- [ 573 Apr 17 11:13]  genome.fa.nos
|-- [ 488 Apr 17 11:13]  genome.fa.not
|-- [ 79M Apr 17 11:13]  genome.fa.nsq
|-- [ 16K Apr 17 11:13]  genome.fa.ntf
|-- [ 164 Apr 17 11:13]  genome.fa.nto
|-- [ 283 Apr 17 11:15]  parallel_001.txt
|-- [ 283 Apr 17 11:15]  parallel_002.txt
`-- [  44 Apr 17 11:15]  parallel_commands.txt

5 directories, 73 files

I've attached some log files here for reference. Please let me know if you need any other information.

Attached: nextflow.log, command.log, command.out.txt, command.err.txt, splign.log, splign.out.txt, parallel_001.txt, parallel_002.txt, parallel_commands.txt
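The manual check described in this comment, looking for empty splign.gff3 files under the Nextflow work directory, can be sketched as a small shell function. The function name `find_empty_splign_outputs` and the `FLAG_WORKDIR` variable are hypothetical; the file name comes from the listing above. An empty output alone does not prove a stall, since the file may simply not have been written yet.

```shell
#!/usr/bin/env bash
# List zero-byte splign.gff3 files under a Nextflow work directory.
# With find, -size 0 matches files that occupy zero 512-byte blocks,
# which means empty files only.
find_empty_splign_outputs() {
    local workdir="$1"
    find "$workdir" -type f -name 'splign.gff3' -size 0 | sort
}

# Scan the directory named in FLAG_WORKDIR (default: workdir) if it exists.
scan_dir="${FLAG_WORKDIR:-workdir}"
if [ -d "$scan_dir" ]; then
    find_empty_splign_outputs "$scan_dir"
fi
```

Running it against the tree shown above would be expected to report the splign.gff3 files in both 1_folder and 2_folder, since both are listed with a size of 0.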

wtroy2 commented 2 months ago

I updated the ncbiclibraries container that Splign runs in, which will hopefully fix the problem you are having.

I tested it on a completely fresh Debian system on which I had just installed Nextflow and Docker, and it ran fine:

Screenshot 2024-04-23 at 6 06 40 PM

So I'd try re-pulling the containers, specifically ghcr.io/formbio/flag_ncbiclibraries:latest, and rerunning. Fingers crossed it works. The Docker version should be much more stable than the Singularity one.

spoonbender76 commented 1 month ago

I haven't used FLAG in a while, but have you tried running it with the --annotationalgo Liftoff,Helixer,helixer_trained_augustus flag? I noticed that all the successful run screenshots seem to be without Liftoff in the --annotationalgo flag.

wtroy2 commented 1 month ago

Ya I have. I will do a run tomorrow and add a screenshot to the docs.

wtroy2 commented 1 month ago

A screenshot of it working has been added to the readme.md file on the GitHub main branch. It should also serve as a reference for users.

spoonbender76 commented 1 month ago

image

Thank you for the help! I have completed a FLAG run using Apptainer, but I encountered significant delays due to the time spent downloading BUSCO lineage files, likely caused by my connection issues. Is there a way to specify a local directory for pre-downloaded BUSCO lineage files, and use the --offline option for all BUSCO commands within the FLAG pipeline?

Details: I tried to create a FLAG environment with Apptainer using the following commands:

conda create -n flag apptainer
conda activate flag
cp /etc/apptainer/apptainer.config $CONDA_PREFIX/etc/apptainer/

However, the folder /etc/apptainer/ didn't exist. So I installed Apptainer v1.31 manually, re-pulled all containers, and reran the pipeline. Unfortunately, it got stuck at the CombineAndFilter step. On closer inspection, I found that this was mainly due to my very slow connection, which took hours to download the BUSCO file lepidoptera_odb10.tar.gz. I experienced the same issue with Docker. So I wonder whether it's possible to use the --offline option for all BUSCO commands within the FLAG pipeline.
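For reference, a minimal sketch of what the request amounts to when BUSCO is run on its own, outside FLAG: download the lineage once into a local directory, then point later runs at that directory with --offline. This thread does not show FLAG exposing such an option. The helper name `busco_offline_plan`, the output name `busco_out`, the directory `busco_downloads`, and the input `proteins.fa` are all placeholders. The function only prints the two commands and runs nothing.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints two standalone BUSCO commands,
# one to download a lineage once and one to reuse it offline.
busco_offline_plan() {
    local lineage="$1"    # e.g. lepidoptera_odb10
    local dl_dir="$2"     # local directory for pre-downloaded lineage files
    local proteins="$3"   # placeholder protein FASTA to assess
    echo "busco --download ${lineage} --download_path ${dl_dir}"
    echo "busco -i ${proteins} -m proteins -l ${lineage} -o busco_out --download_path ${dl_dir} --offline"
}

# Show the plan for the lineage mentioned in this thread.
busco_offline_plan lepidoptera_odb10 busco_downloads proteins.fa
```

Inside the containers, the download directory would also have to be bind-mounted so BUSCO can see it, which is a detail the pipeline itself would need to handle.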

wtroy2 commented 1 month ago

Thanks for the update! I'm glad it ran for you!

As for the offline mode, that's actually a pretty smart idea. I always have a fast connection, so I never experienced this issue, but an offline mode should be useful to others. I will put this on my to-do list!
