MetazoaPhylogenomicsLab / FANTASIA

GNU General Public License v3.0

Installation problem #2

Open · francicco opened this issue 8 months ago

francicco commented 8 months ago

Hi,

I wanted to test your software, but I ran into some problems during installation. I don't have Singularity on my cluster and can't install it, so I'm trying to install FANTASIA manually.

I followed installation_guide_FANTASIA.sh and it seemed to finish successfully, but I can't see the fantasia executable. I also tried to execute launch_gopredsim.sh, with no luck either.

What am I doing wrong? Any help? Cheers, Francesco

gmartinezredondo commented 8 months ago

Hi, Francesco!

No executable is created when installing it manually. That is actually a good idea to implement; I may try it when I have some time.

Regarding launch_gopredsim.sh, could you send me the error you're getting?

Best,

Gemma

francicco commented 8 months ago

Hi Gemma,

Thank you for the quick reply. This is the script:

#!/bin/bash

while getopts f: flag
do
    case "${flag}" in
        f) filepath=${OPTARG};;
    esac
done

python /user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py $filepath

I had to change the original one because it was pointing to the FANTASIA directory instead of goPredSim, where predict_go_embedding_inference.py actually is.

If I execute it as ./launch_gopredsim.sh file.fasta, I get:

Traceback (most recent call last):
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 49, in <module>
    main()
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 10, in main
    config_file = sys.argv[1]
IndexError: list index out of range
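The IndexError follows from how launch_gopredsim.sh parses its arguments: `getopts f:` only sets `filepath` when a path is given after `-f`, so `./launch_gopredsim.sh file.fasta` leaves it empty, the unquoted `$filepath` expands to nothing, and the Python script is called with no argument at all. Python's standard `getopt` module follows the same convention, so the behaviour can be reproduced in a small sketch. The helpers `parse_args` and `require_config` and the usage message are hypothetical, not part of FANTASIA.

```python
import getopt


def parse_args(argv):
    """Mimic the `getopts f:` loop in launch_gopredsim.sh (hypothetical helper).

    Returns the path given after -f, or None if -f was not used.
    Raises getopt.GetoptError for unknown flags such as -h.
    """
    opts, _positional = getopt.getopt(argv, "f:")
    filepath = None
    for flag, value in opts:
        if flag == "-f":
            filepath = value
    return filepath


def require_config(argv):
    """Stop with a readable message instead of an IndexError further down."""
    filepath = parse_args(argv)
    if filepath is None:
        raise SystemExit("usage: launch_gopredsim.sh -f <config.yml>")
    return filepath


# A bare positional argument is ignored, just as in the original script:
print(parse_args(["file.fasta"]))     # None
print(parse_args(["-f", "run.yml"]))  # run.yml
```

Passing `["-h"]` makes `getopt` raise `GetoptError`, the Python counterpart of the `illegal option -- h` message bash prints for the same script.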
gmartinezredondo commented 8 months ago

Hi, Francesco!

Regarding "I had to change the original one because it was pointing the FANTASIA dir instead of goPredSim": I have already fixed this in the compressed file with the code, so it should not happen again.

Have you executed the generate_gopredsim_input_file.sh script?

I believe the problem is that you haven't told the script where the configuration files (generated by generate_gopredsim_input_file.sh) are. You can run launch_gopredsim.sh -h to see the required input files and how to execute it. If you told generate_gopredsim_input_file.sh to write your configuration files to folder X, you must pass the same folder when executing launch_gopredsim.sh. Otherwise, it will assume they are in the current directory, in a folder called config_files.
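Judging from the EMBEDDINGS_CONFIG and GOPREDSIM_CONFIG lines of the launcher excerpt posted later in this thread, the pipeline looks for its configuration files at paths built from the config directory, the prefix and the model name. A minimal sketch of that convention, assuming exactly that layout; `expected_config_paths` is a hypothetical helper, not part of FANTASIA.

```python
from pathlib import Path


def expected_config_paths(config_dir, prefix, model):
    """Return the config files the pipeline is assumed to read.

    Mirrors $CONFIG/config_files/{embeddings,gopredsim}/${PREFIX}_${MODEL}.yml
    from the launcher script (assumed layout).
    """
    base = Path(config_dir) / "config_files"
    name = f"{prefix}_{model}.yml"
    return {
        "embeddings": base / "embeddings" / name,
        "gopredsim": base / "gopredsim" / name,
    }


paths = expected_config_paths("/data/fantasia", "FirstRun", "prott5")
for step, path in paths.items():
    print(step, path, "exists" if path.exists() else "MISSING")
```

Because the prefix is part of the file name, it has to match character for character between the two scripts: `FIrstRun` and `FirstRun` point at different files.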

francicco commented 8 months ago

Hi Gemma,

I don't quite understand how the pipeline works. As you suggested, I executed ./launch_gopredsim.sh -h, but I got:

./launch_gopredsim.sh: illegal option -- h
Traceback (most recent call last):
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 49, in <module>
    main()
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 10, in main
    config_file = sys.argv[1]
IndexError: list index out of range

What am I doing wrong? Cheers, F

gmartinezredondo commented 8 months ago

Oh, I see the problem now. My bad for not seeing it before. You need to execute launch_gopredsim_pipeline.sh, not launch_gopredsim.sh.

francicco commented 8 months ago

Yeah, I also tried that one, but I get errors there as well. First, there is "module load cesga/2020", and I don't really know what that is; I also substituted the other modules, such as cuda, with mine. Then, on line 118, there is cd $OUT_PATH, which fails because the directory hasn't been created yet.
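The cd failure happens because cd does not create directories; the launcher excerpt posted later in this thread handles it by running mkdir -p "$OUT_PATH" before the cd. The same step sketched in Python, as a hypothetical helper that also returns the absolute output path, so later steps do not have to depend on whatever the working directory happens to be.

```python
import os


def enter_output_dir(out_path):
    """Create out_path if needed, then make it the working directory.

    Same effect as `mkdir -p "$OUT_PATH" && cd "$OUT_PATH"` in bash.
    Returns the absolute path, resolved before the directory change.
    """
    out_path = os.path.abspath(out_path)
    os.makedirs(out_path, exist_ok=True)  # no error if it already exists
    os.chdir(out_path)
    return out_path
```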

Once I fixed these small things, I still get:

./launch_gopredsim_pipeline.sh --model prott5 --prefix FIrstRun -c $PWD -o FirstRun

The following have been reloaded with a version change:
  1) lang/gcc/7.5.0 => lang/gcc/9.1.0
  2) lang/python/anaconda/3.10.4-2021-11-fencis => lang/python/anaconda/3.9.7-2021.12-tensorflow.2.7.0

Traceback (most recent call last):
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/venv/lib/python3.9/site-packages/bio_embeddings/utilities/pipeline.py", line 93, in _validate_file
    if os.stat(file_path).st_size == 0:
FileNotFoundError: [Errno 2] No such file or directory: '/user/work/tk19812/software/FANTASIA/FANTASIA/config_files/embeddings/FIrstRun_prott5.yml'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/venv/bin/bio_embeddings", line 8, in <module>
    sys.exit(main())
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/venv/lib/python3.9/site-packages/bio_embeddings/utilities/cli.py", line 24, in main
    parse_config_file_and_execute_run(arguments.config_path[0], overwrite=arguments.overwrite)
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/venv/lib/python3.9/site-packages/bio_embeddings/utilities/pipeline.py", line 349, in parse_config_file_and_execute_run
    _validate_file(config_file_path)
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/venv/lib/python3.9/site-packages/bio_embeddings/utilities/pipeline.py", line 96, in _validate_file
    raise InvalidParameterError(f"The configuration file at '{file_path}' does not exist") from e
bio_embeddings.utilities.exceptions.InvalidParameterError: The configuration file at '/user/work/tk19812/software/FANTASIA/FANTASIA/config_files/embeddings/FIrstRun_prott5.yml' does not exist
Traceback (most recent call last):
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 49, in <module>
    main()
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 11, in main
    config_data = fu.read_config_file(config_file)
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/file_utils.py", line 14, in read_config_file
    with open(file_in) as read_in:
FileNotFoundError: [Errno 2] No such file or directory: '/user/work/tk19812/software/FANTASIA/FANTASIA/config_files/gopredsim/FIrstRun_prott5.yml'

I'm also quite confused; for example, I didn't understand where to provide the actual FASTA file... Cheers, F

gmartinezredondo commented 8 months ago

Regarding "module load cesga/2020": sorry, that was my fault. I gzipped an old script that still contained this line, which we use to run it on our cluster. You can delete that line. I have updated the compressed file with the correct scripts, so this won't happen to anyone else. In your case, so you don't have to download everything again, I can tell you that there were two other changes, in the following scripts:

Regarding where to give the FASTA file: you give it to the generate_gopredsim_input_file.sh script. This script generates configuration files containing the paths to the input files, and launch_gopredsim_pipeline.sh then reads those configuration files.
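In other words, the input paths are recorded once, when the configuration files are generated, and the launcher only ever sees those files. A rough sketch of that hand-off, using the keys goPredSim prints at the start of its run later in this thread (go, lookup_set, annotations, targets, onto, thresh, modus, output). The one-`key: value`-per-line layout and the helper names are assumptions for illustration. The sketch resolves path values to absolute paths at write time, so the file stays valid regardless of the directory the launcher later runs from.

```python
import os

# Keys whose values are file system paths (names taken from the goPredSim log).
PATH_KEYS = {"go", "lookup_set", "annotations", "targets", "output"}


def write_config(path, settings):
    """Write one `key: value` line per setting (assumed file layout)."""
    with open(path, "w") as out:
        for key, value in settings.items():
            if key in PATH_KEYS:
                value = os.path.abspath(value)  # freeze relative paths now
            out.write(f"{key}: {value}\n")


def read_config(path):
    """Read the file back into a dict of strings, skipping blanks and comments."""
    settings = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith("#"):
                key, value = line.split(":", 1)  # split on the first colon only
                settings[key.strip()] = value.strip()
    return settings
```

Values come back as strings (`thresh` is `'1'`, not `1`), which matches the dictionary printed in the goPredSim log.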

Have you checked whether /user/work/tk19812/software/FANTASIA/FANTASIA/config_files/embeddings/FIrstRun_prott5.yml was created, or at least the directory /user/work/tk19812/software/FANTASIA/FANTASIA/config_files/? They are created when running the generate_gopredsim_input_file.sh script. If the files are not created correctly, or if the paths differ between your run of this script and your run of launch_gopredsim_pipeline.sh, you will get the error you are seeing. What exact commands do you use when executing each script? That will tell me where the issue may be and how to help you.
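The checks suggested above can be scripted before launching the pipeline. A minimal sketch; `check_config` is a hypothetical helper. The "exists and is not empty" test is modelled on the `_validate_file` frame in the bio_embeddings traceback earlier in this thread, and the extra directory check roughly separates "the generate step never wrote to this location" from "it ran, but with a different prefix or model".

```python
import os


def check_config(file_path):
    """Return a description of the problem with file_path, or None if it looks usable.

    Loosely modelled on bio_embeddings' _validate_file: the file must exist
    and must not be empty.
    """
    try:
        if os.stat(file_path).st_size == 0:
            return f"{file_path} is empty"
    except FileNotFoundError:
        parent = os.path.dirname(file_path) or "."
        if not os.path.isdir(parent):
            return f"{file_path} is missing, and so is its directory {parent}"
        return f"{file_path} is missing"
    return None
```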

francicco commented 8 months ago

Hi,

Yes, FIrstRun_prott5.yml has not been created.

So, as I said, I'm in the folder /user/work/tk19812/software/FANTASIA/FANTASIA/ and I execute:

./launch_gopredsim_pipeline.sh --model prott5 --prefix FIrstRun -c $PWD -o FirstRun

If this is not the first command, I don't understand which one is. From your diagram, it seems like the first one should be generate_gopredsim_input_files.sh.

Can you send me the list of commands I should execute, and maybe a small input file? Thanks a lot, F

francicco commented 8 months ago

Hi Gemma,

I think I figured it out; it seems to be working. I'll let you know as soon as it finishes.

Thanks a lot, F

gmartinezredondo commented 8 months ago

Hi, Francesco! It's great that you could figure it out. I was about to send you the steps. I will update the README when I have time to clarify the steps and how to run it. Thank you for the comments.

francicco commented 8 months ago

Now I'm getting this error:

2024-03-13 16:34:52.832180: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-03-13 16:35:49,048 INFO Created the prefix directory Mbag_prott5
2024-03-13 16:35:49,050 INFO Created the file Mbag_prott5/input_parameters_file.yml
2024-03-13 16:35:51,296 INFO Created the file Mbag_prott5/sequences_file.fasta
2024-03-13 16:35:51,671 INFO Created the file Mbag_prott5/mapping_file.csv
2024-03-13 16:35:51,673 INFO Created the file Mbag_prott5/remapped_sequences_file.fasta
2024-03-13 16:35:51,965 INFO Created the stage directory Mbag_prott5/Mbag_prott5_embeddings
2024-03-13 16:35:51,966 INFO Created the file Mbag_prott5/Mbag_prott5_embeddings/input_parameters_file.yml
2024-03-13 16:36:36,622 INFO The minimum expected size for the reduced_embedding_file is 113.9 MB.
2024-03-13 16:36:36,622 INFO You are going to generate a total of 113.9 MB of embeddings, and have 2.5 PB available at Mbag_prott5.
2024-03-13 16:36:36,624 INFO Created the file Mbag_prott5/Mbag_prott5_embeddings/reduced_embeddings_file.h5
 12%|█▎        | 3381/27798 [16:46:13<60:00:06,  8.85s/it]
100%|█████████▉| 27797/27798 [31:52:39<00:04,  4.13s/it]
2024-03-15 00:29:16,279 INFO Created the file Mbag_prott5/Mbag_prott5_embeddings/ouput_parameters_file.yml
2024-03-15 00:29:16,303 INFO Created the file Mbag_prott5/ouput_parameters_file.yml
{'go': '/user/work/tk19812/software/FANTASIA/FANTASIA//goPredSim/data/GO/go_2022.obo', 'lookup_set': '/user/work/tk19812/software/FANTASIA/FANTASIA//goPredSim/data/prott5_goa_2022.h5', 'annotations': '/user/work/tk19812/software/FANTASIA/FANTASIA//goPredSim/data/goa_annotations/goa_annotations_2022.txt', 'targets': 'Mbagannotation/Mbag_prott5/Mbag_prott5_embeddings/reduced_embeddings_file.h5', 'onto': 'all', 'thresh': '1', 'modus': 'num', 'output': 'Mbagannotation/Mbag_prott5/gopredsim_Mbag_prott5'}
Traceback (most recent call last):
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 49, in <module>
    main()
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 15, in main
    test_embeddings = fu.read_embeddings(config_data['targets'])
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/file_utils.py", line 48, in read_embeddings
    with h5py.File(embeddings_in, 'r') as f:
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/venv/lib/python3.9/site-packages/h5py/_hl/files.py", line 507, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "/user/work/tk19812/software/FANTASIA/FANTASIA/venv/lib/python3.9/site-packages/h5py/_hl/files.py", line 220, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 106, in h5py.h5f.open
FileNotFoundError: [Errno 2] Unable to open file (unable to open file: name = 'Mbagannotation/Mbag_prott5/Mbag_prott5_embeddings/reduced_embeddings_file.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
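The 113.9 MB estimate in the log above can be reproduced by assuming one 1024-dimensional float32 vector per protein (the length of a per-protein ProtT5 embedding) for the 27,798 sequences counted by the progress bar. A quick check under those assumptions:

```python
n_sequences = 27798      # total shown in the progress bar
embedding_dim = 1024     # assumed ProtT5 per-protein embedding length
bytes_per_value = 4      # assumed float32 storage

total_bytes = n_sequences * embedding_dim * bytes_per_value
print(total_bytes)                    # 113860608
print(f"{total_bytes / 1e6:.1f} MB")  # 113.9 MB
```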
gmartinezredondo commented 8 months ago

Could you send me the commands you used for both generate_gopredsim_input_files.sh and launch_gopredsim_pipeline.sh? I suspect you have given the two scripts different output paths (-o/--outpath option), when they should be identical.

francicco commented 8 months ago

Hi Gemma,

The two scripts point to the same folder:

# Specify config files (use full paths, no variables)
EMBEDDINGS_CONFIG=$CONFIG/config_files/embeddings/${PREFIX}_${MODEL}.yml
GOPREDSIM_CONFIG=$CONFIG/config_files/gopredsim/${PREFIX}_${MODEL}.yml

# Activate conda environment

. ~/.bashrc
conda deactivate
conda activate gopredsim

# Change directory to the one specified in --outpath (embeddings are created in the current directory)
mkdir -p "$OUT_PATH"
cd "$OUT_PATH" || exit 1

# Execute steps for ProtT5 model
if [[ "$MODEL" = "prott5" ]]; then
    # Activate prott5 python environment
    source /user/work/tk19812/software/FANTASIA/FANTASIA/venv/bin/activate
    # Compute embeddings with the desired model
    /user/work/tk19812/software/FANTASIA/FANTASIA/launch_embeddings.sh -f "$EMBEDDINGS_CONFIG"
    # Transfer GO annotation using GOPredSim
    /user/work/tk19812/software/FANTASIA/FANTASIA/launch_gopredsim.sh -f "$GOPREDSIM_CONFIG"
    # Deactivate python environment
    deactivate
fi

The problem is in predict_go_embedding_inference.py.

The error:

FileNotFoundError: [Errno 2] Unable to open file (unable to open file: name = 'Mbagannotation/Mbag_prott5/Mbag_prott5_embeddings/reduced_embeddings_file.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

is misleading: the file is there, it just cannot be opened by predict_go_embedding_inference.py, and I don't know why.
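One plausible explanation, not confirmed in the thread: the `targets` path in the goPredSim config, Mbagannotation/Mbag_prott5/Mbag_prott5_embeddings/reduced_embeddings_file.h5, is relative, and relative paths are resolved against the current working directory. If -o was given as the relative path Mbagannotation, the launcher's cd $OUT_PATH moves into that folder before goPredSim runs, so the same string then points to Mbagannotation/Mbagannotation/..., which does not exist even though the file does. The sketch below reproduces this in a temporary directory. The directory names are illustrative, and a plain file stands in for the HDF5 file so that h5py is not needed.

```python
import os
import tempfile


def demo_relative_path_after_cd():
    """Show that a relative path stops resolving once the working directory changes."""
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as root:
        try:
            os.chdir(root)
            # Layout created by the pipeline, relative to where it was started.
            emb_dir = os.path.join("Mbagannotation", "Mbag_prott5")
            os.makedirs(emb_dir)
            target = os.path.join(emb_dir, "reduced_embeddings_file.h5")
            open(target, "w").close()
            absolute = os.path.abspath(target)  # resolved before any cd

            os.chdir("Mbagannotation")  # what `cd $OUT_PATH` does
            return {
                "relative_still_works": os.path.exists(target),
                "absolute_still_works": os.path.exists(absolute),
            }
        finally:
            os.chdir(start)  # restore the caller's working directory


print(demo_relative_path_after_cd())
# {'relative_still_works': False, 'absolute_still_works': True}
```

If this is the cause, passing full paths to both scripts (for example -o "$PWD/Mbagannotation"), as the "use full paths" comment in the launcher suggests, should avoid it.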

F