formbio / FLAG

Apache License 2.0

question about parameters #1

Closed dirkjanvw closed 4 months ago

dirkjanvw commented 1 year ago

Hi! I arrived here via the preprint on bioRxiv and I think your pipeline looks very promising! I wanted to try it out with some of my own data, but I cannot find an explanation of the pipeline's parameters. Would it be possible to add a section to e.g. your README explaining what each parameter does? Maybe it's also possible to provide the parameters and download locations of the files you used for one of the examples in your preprint? That way I (and future others) will know how to try out the pipeline :)

GRGong commented 1 year ago

same question

wtroy2 commented 1 year ago

Yes this will be added over the next few days. I will most likely push some of this tomorrow.

wtroy2 commented 1 year ago

Added some more to the docs just now on what the different params are, with recommendations, as well as example files for Erynnis tages.

It currently still needs the EnTAP database to run the functional annotation at the end, though. I'll either give a public link to a prebuilt EnTAP database or instructions on how to build the same one used in the paper. It's too large to upload into the repo itself.

wtroy2 commented 1 year ago

@GRGong @dirkjanvw If you have any feedback on the parameters, or more details you'd like added to the README, it would be welcome while the issue is open.

dirkjanvw commented 1 year ago

Everything looks clear to me except for the EnTap database. Do you mean the "EnTAP Binary Database" from https://entap.readthedocs.io/en/latest/Getting_Started/configuration.html#running-configuration?

I will try the example command in the coming weeks probably.

wtroy2 commented 1 year ago

@dirkjanvw it's that, plus the step in the section above it in the link you sent, where it formats the RefSeq database.

wtroy2 commented 1 year ago

Instructions for how to build the EnTAP database were added, along with extra help. Really it's just formatting one file now and then putting them all into a folder. Instructions can be found in the README. Hopefully this is easier now.

dirkjanvw commented 1 year ago

Coming back to this now after some holidays; thanks a lot for all the efforts! However, since I'm unable to use docker on our server I think I am unable to install all dependencies needed (with the test data set all 7 tasks fail because the respective tools are not installed). Do you think it is possible to install all using a single conda YAML file? Or should I wait for the singularity image?

Also, this is the list of commands I used to get the EnTAP database (I had to modify yours a bit to get them to work), using an EnTAP Singularity image (v0.10.7). Putting my commands here for future others:

wget http://eggnog5.embl.de/download/eggnog_4.1/eggnog-mapper-data/eggnog4.clustered_proteins.fa.gz
wget http://eggnog6.embl.de/download/emapperdb-5.0.2/eggnog.db.gz
wget https://treegenesdb.org/FTP/EnTAP/latest/databases/entap_database.bin.gz
wget https://treegenesdb.org/FTP/EnTAP/latest/databases/entap_database.db.gz
cp FLAG/databases/uniprot_sprot.dmnd.gz .
gunzip *.gz
${entap_sif} cat /EnTAP/entap_config.ini | sed 's/=\/bin\//=/g' | sed 's/=\/databases\//=/g' | sed 's/eggnog_proteins/uniprot_sprot/g' | sed 's|/libs/[^/]*/[^/]*|/usr/local/bin|g' > entap_config.ini
${entap_sif} EnTAP --config -d eggnog4.clustered_proteins.fa --out-dir makedbs -t ${cores} --ini entap_config.ini
mv makedbs/bin/eggnog4.dmnd eggnog_proteins.dmnd
mkdir entapDBs
mv uniprot_sprot.dmnd entapDBs/
mv eggnog_proteins.dmnd entapDBs/
mv eggnog.db entapDBs/
mv entap_database.bin entapDBs/
mv entap_database.db entapDBs/
tar czf entapDBs.tar.gz entapDBs/
rm -rd entapDBs/ makedbs/
rm entap_config.ini eggnog4.clustered_proteins.fa
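
For anyone repeating these steps, a quick check of the finished archive may save a failed run later. The following is a minimal sketch, assuming the archive was created exactly as above (`tar czf entapDBs.tar.gz entapDBs/`); `check_entap_tarball` is a hypothetical helper name, not something shipped with FLAG or EnTAP.

```shell
# check_entap_tarball TARBALL
# Hypothetical helper: confirms that the archive lists the five database
# files that the commands above move into entapDBs/.
check_entap_tarball() {
    tarball=$1
    status=0
    # List archive members once; give up early if the archive is unreadable.
    listing=$(tar tzf "$tarball") || return 2
    for f in uniprot_sprot.dmnd eggnog_proteins.dmnd eggnog.db \
             entap_database.bin entap_database.db; do
        # -x matches the whole line, -F treats the name as a fixed string.
        if printf '%s\n' "$listing" | grep -qxF "entapDBs/$f"; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            status=1
        fi
    done
    return $status
}
```

Calling `check_entap_tarball entapDBs.tar.gz` prints one line per expected file and returns non-zero if any is absent. It only checks names, not whether the databases themselves are valid.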

Finally, I noticed that the provided example command only works when I add the following lines to the FLAG/main.nf file on line 11:

params.fafile = params.fafile ?: "default_value"
params.gtffile = params.gtffile ?: "default_value"
params.blastdb = params.blastdb ?: "default_value"
params.rnaDB = params.rnaDB ?: "default_value"

This is the exact command I used for the provided test data:

wget https://ftp.ensembl.org/pub/rapid-release/species/Erynnis_tages/GCA_905147235.1/braker/genome/Erynnis_tages-GCA_905147235.1-softmasked.fa.gz
cp FLAG/examples/curatedButterflyRNA.fa.gz .
cp FLAG/examples/curatedButterflyProteins.fa.gz .
gunzip Erynnis_tages-GCA_905147235.1-softmasked.fa.gz curatedButterflyRNA.fa.gz curatedButterflyProteins.fa.gz
outdir=test_outputdir
[ -d "${outdir}" ] && rm -rd ${outdir}
mkdir -p ${outdir}
touch ${outdir}/emptyPlaceHolder.txt
nextflow run FLAG \
    --repoDir $PWD/FLAG/ \
    --entapdb entapDBs.tar.gz \
    --output ${outdir}/ \
    --genome Erynnis_tages-GCA_905147235.1-softmasked.fa \
    --rna curatedButterflyRNA.fa \
    --proteins curatedButterflyProteins.fa \
    --masker skip \
    --transcriptIn true \
    --lineage lepidoptera_odb10 \
    --annotationalgo Helixer,helixer_trained_augustus \
    --helixerModel invertebrate \
    --externalalgo input_transcript,input_proteins,transcript_from_database \
    --size small \
    --proteinalgo miniprot \
    --rnadatabaseid refseq_select_rna \
    --speciesScientificName Eynnis_tages

wtroy2 commented 1 year ago

Hmm, it's interesting you had to add these lines:

params.fafile = params.fafile ?: "default_value"
params.gtffile = params.gtffile ?: "default_value"
params.blastdb = params.blastdb ?: "default_value"
params.rnaDB = params.rnaDB ?: "default_value"

Those shouldn't be needed, but maybe I'm missing something.

Next week I'm hopeful I'll have some time to test it with Singularity, and hopefully that will fix your issues. I don't use Singularity much, but I don't think adding support for it will be that difficult. Fingers crossed, though.

dirkjanvw commented 1 year ago

So the reason I added those four lines is that without them Nextflow immediately stops with this error:

$ nextflow run FLAG --repoDir /lustre/BIF/nobackup/worku005/test_flag/FLAG/ --entapdb entapDBs.tar.gz --output test_outputdir/ --genome Erynnis_tages-GCA_905147235.1-softmasked.fa --rna curatedButterflyRNA.fa --proteins curatedButterflyProteins.fa --masker skip --transcriptIn true --lineage lepidoptera_odb10 --annotationalgo Helixer,helixer_trained_augustus --helixerModel invertebrate --externalalgo input_transcript,input_proteins,transcript_from_database --size small --proteinalgo miniprot --rnadatabaseid refseq_select_rna --speciesScientificName Eynnis_tages
N E X T F L O W  ~  version 23.04.1
Launching `FLAG/main.nf` [peaceful_khorana] DSL2 - revision: 3b7e3da865
WARN: Access to undefined parameter `fafile` -- Initialise it to a default value eg. `params.fafile = some_value`
WARN: Access to undefined parameter `gtffile` -- Initialise it to a default value eg. `params.gtffile = some_value`
WARN: Access to undefined parameter `blastdb` -- Initialise it to a default value eg. `params.blastdb = some_value`
WARN: Access to undefined parameter `rnaDB` -- Initialise it to a default value eg. `params.rnaDB = some_value`
 Test - N F   P I P E L I N E
 ===================================
 outdir               : test_outputdir/
 masker               : skip
 genome               : Erynnis_tages-GCA_905147235.1-softmasked.fa
 proteins             : curatedButterflyProteins.fa
 rna                  : curatedButterflyRNA.fa
 reference_genome     : null
 reference_annotation : null
 protein database     : null
 rna database         : null
 transcriptIn         : true
 Busco Lineage        : lepidoptera_odb10
 entapDB              : entapDBs.tar.gz
 Augustus Pretrained Species   : human
 Helixer Model                 : invertebrate
 Helixer Models Available      : verterbrate, invertebrate, land_plant, fungi
 Genome Size                   : small
 Species Scientific Name       : Eynnis_tages

 all annotation algos options  : Helixer, Liftoff, denovo_augustus, related_species_augustus, augustus_pretrained, liftoff_trained_augustus, helixer_trained_augustus, transdecoder
 chosen annotation algos       : Helixer,helixer_trained_augustus

 all external algos options    : input_transcript, input_proteins, transcript_from_database, proteins_from_database
 chosen external algos         : input_transcript,input_proteins,transcript_from_database

 all protein algos             : exonerate, genomethreader, prosplign, miniprot
 chosen protein algos          : miniprot

Missing `fromPath` parameter

And when I include those four lines it starts executing all kinds of processes. Maybe this helps?
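
To see at a glance which parameters Nextflow is complaining about, the WARN lines in the output above can be filtered. This is only a sketch that relies on the warning wording shown in this thread; `undefined_params` is a hypothetical helper name.

```shell
# undefined_params: reads Nextflow output on stdin and prints the name of
# every parameter reported as "Access to undefined parameter `name`".
# Hypothetical helper; the pattern is taken from the WARN lines quoted above.
undefined_params() {
    sed -n 's/.*Access to undefined parameter `\([^`]*\)`.*/\1/p'
}

# Example (not executed here):
#   nextflow run FLAG ... 2>&1 | undefined_params
```

For the run above this would list `fafile`, `gtffile`, `blastdb` and `rnaDB`. Since Nextflow maps a `--name value` option onto `params.name`, supplying those options on the command line may be an alternative to editing main.nf, though that was not tested in this thread.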

wtroy2 commented 1 year ago

For Singularity, can you use GPUs, or is it preferred without? Helixer prefers to use GPUs.

dirkjanvw commented 1 year ago

I can make use of GPUs (and I believe more and more people have access to them on their machines)

wtroy2 commented 12 months ago

After a VERY long wait, Singularity support was added and GPUs were removed, so it now runs entirely on CPUs.

spoonbender76 commented 11 months ago

Hi,

Thank you for creating this tool. I found out about it a bit late, but I'm very eager to use it. Is there a way to use GPUs for Helixer in the FLAG pipeline or let users choose the CPU/GPU version?

wtroy2 commented 11 months ago

Hi @spoonbender76, I can make a GPU option for Helixer. It was GPU-only before, but then I switched it to CPU, so I can make a conditional for it. Do you run Docker or Singularity? I'm more familiar with Docker, so I can do it there easily, but I'm not as good with Singularity. If you need it for Singularity, though, it can be done.

spoonbender76 commented 10 months ago

Hi @wtroy2 I run docker

wtroy2 commented 10 months ago

ok awesome. I will add it sometime over the next few days. That's easy to add

wtroy2 commented 7 months ago

@dirkjanvw as we talked about at PAG, a script has been added to pull the Singularity images directly, without needing Docker.

dirkjanvw commented 7 months ago

Thanks a lot! Just quickly letting you know that I can actually run the example now!

image

So quick status update: The issue with Helixer I think I can solve myself (it looks at the wrong location for the models, because I did not download the Helixer model yet); but the issue with the short_summary (BUSCO output I assume?) I find more difficult to solve. It appears from the log files that the gff3 files are empty and therefore busco did not receive a proteome. I will try again later when I have fixed the Helixer model locations on my system!

wtroy2 commented 7 months ago

You shouldn't need to download the Helixer models for it to work. The models are already in the docker/singularity image. How exactly are you running it?

dirkjanvw commented 7 months ago

Hmm okay, these are the steps that I ran (after downloading the input files in a directory up):

bash direct_pull_singularity_images_and_move_to_folders.sh 
bash makeDirectories.sh 
conda activate nextflow
nextflow run main.nf -w workdir/ --output outputdir/ --genome ../GCA_905147235.1_ilEryTage1.1_genomic.fna --rna ../curatedButterflyRNA.fa --proteins ../curatedButterflyProteins.fa --masker skip --transcriptIn true --lineage lepidoptera_odb10 --annotationalgo Helixer,helixer_trained_augustus --helixerModel invertebrate --externalalgo input_transcript,input_proteins --size small --proteinalgo miniprot --speciesScientificName Eynnis_tages -profile singularity

The output of the nextflow command you can find in the screenshot above :)

wtroy2 commented 7 months ago

Ah OK, sorry, I was on my phone and didn't see the image. I will try it this way and hopefully get back to you on it tomorrow.

dirkjanvw commented 7 months ago

I looked into the exact error message that Helixer gave and it says in the .command.err:

cp: cannot create regular file '/data/workflow_helixer_config.yaml': Permission denied
2024-02-16 22:23:47.016277: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-16 22:23:52.368015: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-16 22:23:52.388361: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-16 22:24:07.200644: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.8/dist-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning: 

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807 

  warnings.warn(
Traceback (most recent call last):
  File "/usr/local/bin/Helixer.py", line 209, in <module>
    main()
  File "/usr/local/bin/Helixer.py", line 140, in main
    args = pp.get_args()
  File "/usr/local/lib/python3.8/dist-packages/helixer/core/scripts.py", line 66, in get_args
    self.check_args(args)
  File "/usr/local/bin/Helixer.py", line 101, in check_args
    model_filepath = self.check_for_lineage_model(args.lineage)
  File "/usr/local/bin/Helixer.py", line 88, in check_for_lineage_model
    current_model = identify_current(lineage, priorty_ms)
  File "/usr/local/lib/python3.8/dist-packages/helixer/core/data.py", line 78, in identify_current
    current_models = os.listdir(os.path.join(MODEL_PATH, lineage))
FileNotFoundError: [Errno 2] No such file or directory: '/home/worku005/.local/share/Helixer/models/land_plant'

and the .command.out says:

My updated config.yaml
{'batch_size': 32,
 'compression': 'gzip',
 'debug': False,
 'edge_threshold': 0.1,
 'fasta_path': 'genome.fa',
 'lineage': 'invertebrate',
 'min_coding_length': 100,
 'no_multiprocess': False,
 'no_overlap': False,
 'overlap_core_length': 16038,
 'overlap_offset': 10692,
 'peak_threshold': 0.8,
 'species': 'Eynnis_tages',
 'subsequence_length': 21384,
 'window_size': 100}
No config file found

So I think the error might come from this line trying to access /data, which does not exist on my system. This /data directory seems to be used in several places in the scripts; not sure if that might also explain the failed runs for Augustus and combining?

wtroy2 commented 7 months ago

Hey, I haven't forgotten about you. For some reason I'm not having the trouble with Helixer under Singularity that you are. My Singularity version is 4.1.0.

I am, however, now having issues with PASA and the combineandfilter step in Singularity. PASA is having trouble with its SQL database, and the combineandfilter Singularity image fails to build:

singularity build --sandbox sandbox_directory docker-daemon://ghcr.io/formbio/flag_combinefilter:latest
INFO: Starting build...
INFO: Fetching OCI image...
INFO: Extracting OCI image...
FATAL: While performing build: packer failed to pack: while unpacking tmpfs: error unpacking rootfs: unpack entry: opt/conda/pkgs/c-ares-1.19.0-h5eee18b_0/lib/libcares.so.2.6.0: link: no such file or directory

So I'm trying to figure that out. I'm not the best with Singularity, so there's a lot of weirdness. I'm not having any of these issues with Docker runs, though, and the Singularity images are built directly from the Docker ones. If you have any ideas, let me know; I'm still trying to get the Singularity side fixed.

dirkjanvw commented 7 months ago

Hmm so the easiest way to get these singularity images I always found to be singularity pull docker://<normal docker location>. For me those provided by you in the direct_pull_singularity_images_and_move_to_folders.sh script worked perfectly!

As for Helixer, can you check if your pipeline does something with files on your /data directory? From the script it seems like that directory has to exist for the Helixer script to run. Specifically the timestamp on the file /data/workflow_helixer_config.yaml would be interesting to know (whether that is during your pipeline run and thus caused by FLAG).
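
One way to compare setups is to probe the paths in question from inside the container on both machines. A minimal sketch follows; `probe_path` is a hypothetical helper, and the image path in the usage note is the one used later in this thread.

```shell
# probe_path PATH
# Hypothetical helper: reports whether PATH exists and what the current user
# can do with it. A path under an unreadable parent also shows as "missing".
probe_path() {
    p=$1
    if [ ! -e "$p" ]; then
        echo "$p: missing"
    elif [ -w "$p" ]; then
        echo "$p: writable"
    elif [ -r "$p" ]; then
        echo "$p: read-only"
    else
        echo "$p: no access"
    fi
}
```

The same test can be run without the helper, for example `singularity exec containers/helixer/flag_helixer.image sh -c 'test -w /data && echo writable || echo "not writable"'`. If the two systems print different results, that would point at the container runtime or its configuration rather than at FLAG itself.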

wtroy2 commented 6 months ago

OK I retested a bunch with singularity and it seems to be working for me now. I updated the combinefilter folder so that it builds from a singularity.def file now which was part of my problem at least.

For functional annotation, eggNOG is now the default instead of EnTAP because eggNOG has a more up-to-date database; EnTAP uses an old version of eggNOG with a somewhat out-of-date database. The eggNOG database is also easier to build and can be built with a single script as opposed to a bunch of steps.

As for the Helixer problem, that file is in the Singularity/Docker image, so it should be able to find it. It's built into the image unless it's being overwritten somehow in your runs.

Screenshot 2024-03-25 at 9 28 05 PM

dirkjanvw commented 6 months ago

Thanks for the work on it! However, it still doesn't run successfully. Here is a screenshot after hitting Ctrl-C because four jobs failed:

image

Helixer still fails for me because it cannot find anything in /data:

$ singularity exec containers/helixer/flag_helixer.image ls -la /data
total 0
drwxr-xr-x 2 root     root           3 Jun 21  2023 .
drwxr-xr-x 1 worku005 domain users 120 Apr  2 16:19 ..

Can you show what this looks like for you?

wtroy2 commented 5 months ago

@dirkjanvw it seems like others are also having problems with Singularity, but each person's problem is different. So I'm doing what we talked about at PAG and just containerizing the entire thing: Singularity in Singularity. I've started testing it and it appears to be working, so if all goes well I'll push it here in the next few days and have the entire workflow in a single container.

wtroy2 commented 5 months ago

Hey @dirkjanvw, I updated it to run completely from a single Singularity image. In testing it worked for me with multiple configurations. Let me know if it works for you as well whenever you get a chance. The run command is updated so it's just a singularity run ....

Screenshot 2024-04-11 at 10 38 42 AM

dirkjanvw commented 5 months ago

Cool, thanks! Looks good!

I tried building the Singularity image myself, but it turns out that my local machine, where I have the sudo rights needed to build a Singularity image, doesn't have enough disk space (I need more than 13GB?). Is it possible to share your flag3.image file here?

wtroy2 commented 5 months ago

It is currently quite large at 82GB. Unsure on the best way to share that on GitHub

wtroy2 commented 5 months ago

currently working on a build and run for non-root users. May be better to hold off on further testing until that's finished but the overall image will be somewhere around 80GB

dirkjanvw commented 5 months ago

A final image of 80GB shouldn't be an issue; just building it is. A possible solution would be to containerise the entire workflow as a Docker image and host that. Then we can pull that Docker image as Singularity without sudo rights.

wtroy2 commented 5 months ago

I tried that. It uses a nutty amount of RAM to convert it to singularity using singularity pull

wtroy2 commented 5 months ago

Currently trying to make it smaller since there's a lot of redundancy. Will keep updated

wtroy2 commented 5 months ago

@dirkjanvw OK, it's updated on the main branch. It's still kind of large. I'm testing a smaller version, but even that will be quite large. Luckily, though, it should be fine to build... I'm hoping, at least.

Running bash build_singularity_flag.sh in the top-level directory of the repo should set up everything you need to run it in the examples directory. From there, download the genome file https://ftp.ensembl.org/pub/rapid-release/species/Erynnis_tages/GCA_905147235.1/braker/genome/Erynnis_tages-GCA_905147235.1-softmasked.fa.gz into the examples directory and gunzip the example files, and fingers crossed it will work for you. Please let me know if it doesn't.

dirkjanvw commented 5 months ago

Unfortunately I didn't get any further with this update :(

On my local machine I still run out of space (as expected?), and our servers apparently do not allow the use of --fakeroot; not sure why this is. I get this error:

+ singularity build --fakeroot --fix-perms singularity_flag.image singularity_flag.def
FATAL:   could not use fakeroot: no mapping entry found in /etc/subuid for worku005
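
The message above says that the user has no entry in /etc/subuid, which `--fakeroot` builds rely on (together with /etc/subgid); those entries are normally added by a system administrator. A small sketch for checking this before starting a long build, with `has_subid_entry` as a hypothetical helper name:

```shell
# has_subid_entry USER [FILE]
# Hypothetical helper: succeeds if FILE (default /etc/subuid) contains a
# mapping line of the form "USER:start:count" for the given user.
has_subid_entry() {
    user=$1
    file=${2:-/etc/subuid}
    [ -r "$file" ] && grep -q "^${user}:" "$file"
}

# Example (not executed here):
#   has_subid_entry "$(id -un)" || echo "no subuid mapping; --fakeroot will not work"
#   has_subid_entry "$(id -un)" /etc/subgid || echo "no subgid mapping"
```

If the check fails, pulling a prebuilt image rather than building one locally avoids the need for fakeroot altogether.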

dirkjanvw commented 5 months ago

Maybe you have an FTP server where you can host your singularity image for download? Or maybe checkout services like figshare that host public data (and also allow for adding a stable DOI to it)

wtroy2 commented 5 months ago

@dirkjanvw I'm trying to get it to work without all the extra stuff. It seems like this is working for someone else so far: https://github.com/formbio/FLAG/issues/3#issuecomment-2080511736

dirkjanvw commented 5 months ago

Thanks for getting back to it! I followed the instructions in that branch and now both PASA and splign do work:

image

However, Helixer still seems to be a problem (probably the others failing after that are a result of Helixer failing?). This is the error Helixer gives:

$ cat .command.log 
My updated config.yaml
{'batch_size': 32,
 'compression': 'gzip',
 'debug': False,
 'edge_threshold': 0.1,
 'fasta_path': 'genome.fa',
 'lineage': 'invertebrate',
 'min_coding_length': 100,
 'no_multiprocess': False,
 'no_overlap': False,
 'overlap_core_length': 16038,
 'overlap_offset': 10692,
 'peak_threshold': 0.8,
 'species': 'Eynnis_tages',
 'subsequence_length': 21384,
 'window_size': 100}
cp: cannot create regular file '/data/workflow_helixer_config.yaml': Permission denied
2024-05-07 11:14:46.139139: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-07 11:14:46.733032: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-07 11:14:46.735441: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-07 11:14:48.282700: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.8/dist-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning: 

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807 

  warnings.warn(
No config file found

Traceback (most recent call last):
  File "/usr/local/bin/Helixer.py", line 209, in <module>
    main()
  File "/usr/local/bin/Helixer.py", line 140, in main
    args = pp.get_args()
  File "/usr/local/lib/python3.8/dist-packages/helixer/core/scripts.py", line 66, in get_args
    self.check_args(args)
  File "/usr/local/bin/Helixer.py", line 101, in check_args
    model_filepath = self.check_for_lineage_model(args.lineage)
  File "/usr/local/bin/Helixer.py", line 88, in check_for_lineage_model
    current_model = identify_current(lineage, priorty_ms)
  File "/usr/local/lib/python3.8/dist-packages/helixer/core/data.py", line 78, in identify_current
    current_models = os.listdir(os.path.join(MODEL_PATH, lineage))
PermissionError: [Errno 13] Permission denied: '/root/.local/share/Helixer/models/land_plant'

I went into the Helixer singularity image and I found out that the /root directory has root permissions only; but I am not allowed to run singularity images using root permissions unfortunately. Maybe the solution is to move the Helixer models somewhere else where no root permissions are required?

wtroy2 commented 5 months ago

They are getting past the Helixer problem and not having this issue. What Singularity version do you have installed? Are you able to install a later version with conda to give you updated permissions? I had someone do it like so and it worked for them:

Screenshot 2024-05-07 at 10 11 12 AM

dirkjanvw commented 4 months ago

Ooh thank you! I was relying on the singularity installation by my IT department (they installed singularity-ce version 3.9.0-rc.3), which clearly was the issue. Following your instructions for apptainer indeed makes everything work it seems! When I ran it last night it did encounter two errors but those are probably related to my restarting the pipeline with old files in the $SINGULARITY_TMPDIR; I cleaned up everything and started the pipeline from fresh. I'll let you know when it finishes!
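
Since the installed runtime turned out to matter here, it may help to record its version when reporting problems. The sketch below only extracts and classifies the version string; the thread does not establish a minimum working version, so no threshold is checked. Both function names are hypothetical.

```shell
# runtime_version: reads the output of `singularity --version` or
# `apptainer --version` on stdin and prints its last field, e.g.
# "singularity-ce version 3.9.0-rc.3" gives "3.9.0-rc.3".
runtime_version() {
    awk '{ print $NF; exit }'
}

# is_prerelease VERSION: succeeds if the version carries a pre-release
# suffix such as -rc, -alpha or -beta.
is_prerelease() {
    case $1 in
        *-rc*|*-alpha*|*-beta*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not executed here):
#   v=$(singularity --version | runtime_version)
#   is_prerelease "$v" && echo "pre-release runtime: $v"
```

A pre-release build is not necessarily the cause of a failure, but it is a cheap thing to rule out before debugging the pipeline itself.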

wtroy2 commented 4 months ago

Awesome!

Also is anyone from your group going to mempangene24 in Memphis, TN?

dirkjanvw commented 4 months ago

I haven't heard of anyone going there, unfortunately. I know some will go to PAG 2025, though.

Also, I got this far now with a clean run:

image

Still something is going on that is not quite there yet, but it might be me not having set up the eggnog database correctly. As our servers will all be rebooted over the coming days, I hope to try again next week :) But looking good so far!

wtroy2 commented 4 months ago

Ah ok.

And yeah, I'd think it has to do with the database. That's the most common problem at that step, but I'd think the script should have set it up fine. Glad we're at least past the structural stuff, though!

dirkjanvw commented 4 months ago

I did a new install of the pipeline but I still have an issue with the functionalAnnotation step:

image

These are the steps I took:

# Clone FLAG
git clone https://github.com/formbio/FLAG.git
cd FLAG
git checkout singularity_tests #commit 93d69d6591a47535fba04899a0e2b4982a5ab94a

# Prepare data
cd examples
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/905/147/235/GCA_905147235.1_ilEryTage1.1/GCA_905147235.1_ilEryTage1.1_genomic.fna.gz
gunzip *.gz
cd ..

# Setup pipeline
bash direct_pull_singularity.sh
bash setup_eggnogDB.sh
bash makeDirectories.sh

# Run pipeline
export SINGULARITY_TMPDIR="/dev/shm/worku005/test_flag"
mkdir -p $SINGULARITY_TMPDIR
nextflow run main.nf -w workdir/ --output outputdir/ --genome examples/GCA_905147235.1_ilEryTage1.1_genomic.fna --rna examples/curatedButterflyRNA.fa --proteins examples/curatedButterflyProteins.fa --fafile examples/GCF_009731565.1_Dplex_v4_genomic.fa --gtffile examples/GCF_009731565.1_Dplex_v4_genomic.gff --masker skip --transcriptIn true --lineage lepidoptera_odb10 --annotationalgo Liftoff,Helixer,helixer_trained_augustus --helixerModel invertebrate --externalalgo input_transcript,input_proteins --size small --proteinalgo miniprot --speciesScientificName Eynnis_tages --funcAnnotProgram eggnog --eggnogDB eggnogDB.tar.gz -profile singularity

And this is why functionalAnnotation failed:

$ tail workdir/01/2d3c6a9b8c30d69c475df38d33661b/.command.log 
Fasta file parsed
usage: /usr/local/bin/agat_sp_extract_sequences.pl --clean_final_stop --gff FinalStructuralAnnotationLenientFilter.gtf -f genome.fa -p -o protein.fa
12551 cds converted in fasta.
Job done in 87 seconds
#  emapper-2.1.12-93d69d6
# emapper.py  -i protein.fa -o eggnog --evalue 0.05 --cpu 128
  /usr/local/bin/diamond blastp -d '/dbs/eggnog_proteins.dmnd' -q '/lustre/BIF/nobackup/worku005/test_flag/FLAG/workdir/01/2d3c6a9b8c30d69c475df38d33661b/protein.fa' --threads 128 -o '/lustre/BIF/nobackup/worku005/test_flag/FLAG/workdir/01/2d3c6a9b8c30d69c475df38d33661b/eggnog.emapper.hits' --tmpdir '/lustre/BIF/nobackup/worku005/test_flag/FLAG/workdir/01/2d3c6a9b8c30d69c475df38d33661b/emappertmp_dmdn_q68yzwt2' --sensitive --iterate -e 0.05 --top 3  --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovhsp scovhsp
Error running diamond: Opening the database... Error: Error detecting input file format. First line seems to be blank.
cp: cannot stat 'eggnog.emapper.annotations': No such file or directory
INFO:    Cleaning up image...

Do you know what went wrong here?
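
When diamond fails to open its database, a useful first step is to see which path it was given and whether that path holds anything inside the container. A minimal sketch, assuming the log format shown above; `diamond_db_path` is a hypothetical helper name.

```shell
# diamond_db_path: reads a task log on stdin and prints the database path
# from the "diamond blastp -d '<path>'" line written by eggnog-mapper.
# Hypothetical helper; the pattern follows the log excerpt above.
diamond_db_path() {
    sed -n "s/.*diamond blastp -d '\([^']*\)'.*/\1/p"
}

# Example (not executed here):
#   diamond_db_path < workdir/01/2d3c6a9b8c30d69c475df38d33661b/.command.log
```

For the log above this prints `/dbs/eggnog_proteins.dmnd`. Listing that directory from inside the image used for this step (for instance with `singularity exec <image> ls -l /dbs`) would then show whether the file is present and non-empty.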

wtroy2 commented 4 months ago

I see the issue now. This makes sense. I will have it fixed for you in the next 24 hours.

It's the /dbs path.

wtroy2 commented 4 months ago

OK, you should now be able to run it by re-pulling the ghcr.io/formbio/flag_entap:latest image. Make sure you delete the old one first:

cd containers/entap/
rm flag_entap.image
singularity pull flag_entap.image docker://ghcr.io/formbio/flag_entap:latest

Then you should be able to resume your run from the failed step, instead of restarting the whole thing, by running:

export SINGULARITY_TMPDIR="/dev/shm/worku005/test_flag"
nextflow run main.nf -w workdir/ --output outputdir/ --genome examples/GCA_905147235.1_ilEryTage1.1_genomic.fna --rna examples/curatedButterflyRNA.fa --proteins examples/curatedButterflyProteins.fa --fafile examples/GCF_009731565.1_Dplex_v4_genomic.fa --gtffile examples/GCF_009731565.1_Dplex_v4_genomic.gff --masker skip --transcriptIn true --lineage lepidoptera_odb10 --annotationalgo Liftoff,Helixer,helixer_trained_augustus --helixerModel invertebrate --externalalgo input_transcript,input_proteins --size small --proteinalgo miniprot --speciesScientificName Eynnis_tages --funcAnnotProgram eggnog --eggnogDB eggnogDB.tar.gz -profile singularity -resume
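
Because a resumed run has to repeat the original options, keeping them in one place avoids copy-and-paste slips. The following is a sketch only: the options are copied from the command in this thread, while `run_flag` and the `NEXTFLOW` override are hypothetical names introduced here so the wrapper can be dry-run.

```shell
# run_flag [EXTRA_ARGS...]
# Hypothetical wrapper around the run command quoted in this thread.
# Extra arguments (such as -resume) are appended to the fixed option list.
# NEXTFLOW is an assumed override, e.g. NEXTFLOW=echo for a dry run.
run_flag() {
    "${NEXTFLOW:-nextflow}" run main.nf \
        -w workdir/ \
        --output outputdir/ \
        --genome examples/GCA_905147235.1_ilEryTage1.1_genomic.fna \
        --rna examples/curatedButterflyRNA.fa \
        --proteins examples/curatedButterflyProteins.fa \
        --fafile examples/GCF_009731565.1_Dplex_v4_genomic.fa \
        --gtffile examples/GCF_009731565.1_Dplex_v4_genomic.gff \
        --masker skip \
        --transcriptIn true \
        --lineage lepidoptera_odb10 \
        --annotationalgo Liftoff,Helixer,helixer_trained_augustus \
        --helixerModel invertebrate \
        --externalalgo input_transcript,input_proteins \
        --size small \
        --proteinalgo miniprot \
        --speciesScientificName Eynnis_tages \
        --funcAnnotProgram eggnog \
        --eggnogDB eggnogDB.tar.gz \
        -profile singularity \
        "$@"
}

# First attempt:           run_flag
# After fixing a failure:  run_flag -resume
```

Resuming reuses cached results only when it is started from the same launch directory with the same work directory, so the wrapper should be called from the same place each time.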

dirkjanvw commented 4 months ago

Fantastic! It has now successfully completed!

image

Thanks a lot! I'll start running it on some of my own assemblies now, but I'll let you know in another issue if I encounter any other problems :)

wtroy2 commented 4 months ago

Awesome! That's so great to hear! Very happy it's finally working for others with Singularity!

Please feel free to reach out if you have any issues. I'm hoping that we have the updated preprint out soon, and when we do I'll update the repo. Got some cool new filtering and QC stuff that's going to be added, but getting it running for people with Singularity was the priority.