Open francicco opened 8 months ago
Hi, Francesco!
There is no executable created when installing it manually. That is actually a good idea to implement. I may try it when I have some time.
Regarding launch_gopredsim.sh, could you send me the error you're getting?
Best,
Gemma
Hi Gemma,
Thank you for the quick reply. This is the script:
#!/bin/bash
while getopts f: flag
do
    case "${flag}" in
        f) filepath=${OPTARG};;
    esac
done
python /user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py $filepath
I had to change the original one because it was pointing to the FANTASIA dir instead of goPredSim, where predict_go_embedding_inference.py actually is.
If I execute it:
./launch_gopredsim.sh file.fasta
I get:
Traceback (most recent call last):
File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 49, in <module>
main()
File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 10, in main
config_file = sys.argv[1]
IndexError: list index out of range
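One plausible reading of this traceback: the script only parses a `-f` option, so when `file.fasta` is passed as a bare positional argument, `filepath` stays empty and Python is started with no argument at all, which makes `sys.argv[1]` fail. A minimal sketch of a guard for this, assuming nothing beyond the script shown above (`parse_filepath` is a hypothetical helper, not part of FANTASIA):

```shell
#!/bin/bash
# Hypothetical helper (not part of FANTASIA): extract the -f argument and
# fail with a usage message instead of calling Python with no argument.
parse_filepath() {
    local OPTIND=1 flag filepath=""
    while getopts "f:" flag; do
        case "${flag}" in
            f) filepath=${OPTARG} ;;
            *) return 2 ;;          # unknown option such as -h
        esac
    done
    if [[ -z "$filepath" ]]; then
        echo "usage: launch_gopredsim.sh -f <config_file>" >&2
        return 1
    fi
    printf '%s\n' "$filepath"
}

# With -f the path is returned; a bare positional argument is rejected.
parse_filepath -f config.yml                      # prints config.yml
parse_filepath file.fasta || echo "missing -f"    # usage on stderr, then "missing -f"
```

With a guard like this, both `./launch_gopredsim.sh file.fasta` and `./launch_gopredsim.sh -h` would stop with a usage message rather than a Python `IndexError`.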
Hi, Francesco!
> I had to change the original one because it was pointing the FANTASIA dir instead of goPredSim where predict_go_embedding_inference.py actually is.
I have already changed it in the compressed file with the code, so this should not happen again.
Have you executed the generate_gopredsim_input_file.sh script?
I believe the problem is that you haven't told the script where the configuration files (generated by generate_gopredsim_input_file.sh) are. You can run launch_gopredsim.sh -h to see the required input files and how to execute it. If you told the generate_gopredsim_input_file.sh script that your configuration files should go to folder X, you must pass the same folder when executing launch_gopredsim.sh. Otherwise, it will assume that they are in the same directory, in a folder called config_files.
Hi Gemma,
I don't quite understand how the pipeline works. As you suggested, I executed ./launch_gopredsim.sh -h but I got:
./launch_gopredsim.sh: illegal option -- h
Traceback (most recent call last):
File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 49, in <module>
main()
File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 10, in main
config_file = sys.argv[1]
IndexError: list index out of range
What am I doing wrong? Cheers F
Oh, I see the problem now. My bad for not seeing it before. You need to execute launch_gopredsim_pipeline.sh, not launch_gopredsim.sh.
Yeah, I also tried that one, but I get errors there as well.
First of all, there is "module load cesga/2020", and I don't really know what it is; then I substituted the other modules, such as cuda, with mine.
Then on line 118 there is cd $OUT_PATH, which fails because the directory hasn't been created yet.
Once I fixed these small things I still get:
./launch_gopredsim_pipeline.sh --model prott5 --prefix FIrstRun -c $PWD -o FirstRun
The following have been reloaded with a version change:
1) lang/gcc/7.5.0 => lang/gcc/9.1.0
2) lang/python/anaconda/3.10.4-2021-11-fencis => lang/python/anaconda/3.9.7-2021.12-tensorflow.2.7.0
Traceback (most recent call last):
File "/user/work/tk19812/software/FANTASIA/FANTASIA/venv/lib/python3.9/site-packages/bio_embeddings/utilities/pipeline.py", line 93, in _validate_file
if os.stat(file_path).st_size == 0:
FileNotFoundError: [Errno 2] No such file or directory: '/user/work/tk19812/software/FANTASIA/FANTASIA/config_files/embeddings/FIrstRun_prott5.yml'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/user/work/tk19812/software/FANTASIA/FANTASIA/venv/bin/bio_embeddings", line 8, in <module>
sys.exit(main())
File "/user/work/tk19812/software/FANTASIA/FANTASIA/venv/lib/python3.9/site-packages/bio_embeddings/utilities/cli.py", line 24, in main
parse_config_file_and_execute_run(arguments.config_path[0], overwrite=arguments.overwrite)
File "/user/work/tk19812/software/FANTASIA/FANTASIA/venv/lib/python3.9/site-packages/bio_embeddings/utilities/pipeline.py", line 349, in parse_config_file_and_execute_run
_validate_file(config_file_path)
File "/user/work/tk19812/software/FANTASIA/FANTASIA/venv/lib/python3.9/site-packages/bio_embeddings/utilities/pipeline.py", line 96, in _validate_file
raise InvalidParameterError(f"The configuration file at '{file_path}' does not exist") from e
bio_embeddings.utilities.exceptions.InvalidParameterError: The configuration file at '/user/work/tk19812/software/FANTASIA/FANTASIA/config_files/embeddings/FIrstRun_prott5.yml' does not exist
Traceback (most recent call last):
File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 49, in <module>
main()
File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 11, in main
config_data = fu.read_config_file(config_file)
File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/file_utils.py", line 14, in read_config_file
with open(file_in) as read_in:
FileNotFoundError: [Errno 2] No such file or directory: '/user/work/tk19812/software/FANTASIA/FANTASIA/config_files/gopredsim/FIrstRun_prott5.yml'
I'm also very confused; for example, I didn't understand where to give the actual fasta file... Cheers F
> First of all, there is "module load cesga/2020", and I don't really know what it is
Sorry, that was my fault. I gzipped an old script that had this line for running it on our cluster. You can delete that line. I have updated the compressed file with the correct scripts so this won't happen to anyone else. In your case, so that you don't have to download everything again, I can tell you that there were 2 other changes in the following scripts:
> I didn't understand where to give the actual fasta file...
You give the fasta file to the generate_gopredsim_input_file.sh script. This one generates configuration files with the paths to the input files that the launch_gopredsim_pipeline.sh script uses.
Have you checked whether /user/work/tk19812/software/FANTASIA/FANTASIA/config_files/embeddings/FIrstRun_prott5.yml was created, or at least the directory /user/work/tk19812/software/FANTASIA/FANTASIA/config_files/? It is created when running the generate_gopredsim_input_file.sh script. If the files are not created correctly, or the paths differ between that script and launch_gopredsim_pipeline.sh, you will get the error you are seeing.
What exact commands do you use when executing all scripts? This will tell me where the issue may be and how to help you.
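The existence check suggested above can be sketched as a small pre-flight function. This is only an illustration: `check_configs` is a hypothetical helper, not part of FANTASIA, and the directory layout (`config_files/embeddings/` and `config_files/gopredsim/`, with files named `<prefix>_<model>.yml`) is taken from the paths in the error messages in this thread.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of FANTASIA): confirm that both
# YAML files exist before launching the pipeline.
check_configs() {
    local config_dir=$1 prefix=$2 model=$3 stage yml missing=0
    for stage in embeddings gopredsim; do
        yml="$config_dir/config_files/$stage/${prefix}_${model}.yml"
        if [[ ! -f "$yml" ]]; then
            echo "missing: $yml" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the values from the failing command in this thread.
if check_configs "$PWD" FIrstRun prott5; then
    echo "config files found"
else
    echo "config files not found: run the input-file generation script first"
fi
```

If either file is reported missing, the generation script has not been run yet, or it was run with a different prefix, model, or configuration directory.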
Hi,
yes, FIrstRun_prott5.yml is not created.
So, as I said, I'm in the folder /user/work/tk19812/software/FANTASIA/FANTASIA/ and I execute:
./launch_gopredsim_pipeline.sh --model prott5 --prefix FIrstRun -c $PWD -o FirstRun
If this is not the first command, I don't understand which one is. From your diagram it seems like the first one should be generate_gopredsim_input_files.sh. Can you send me the list of commands that I should execute, and maybe a small input file? Thanks a lot F
Hi Gemma,
I think I figured it out. It seems like it's working. I'll let you know as soon as it finishes
Thanks a lot F
Hi, Francesco! It's great that you could figure it out. I was about to send you the steps. I will update the README when I have time to clarify the steps and how to run it. Thank you for the comments.
I've got this error now...
2024-03-13 16:34:52.832180: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-03-13 16:35:49,048 INFO Created the prefix directory Mbag_prott5
2024-03-13 16:35:49,050 INFO Created the file Mbag_prott5/input_parameters_file.yml
2024-03-13 16:35:51,296 INFO Created the file Mbag_prott5/sequences_file.fasta
2024-03-13 16:35:51,671 INFO Created the file Mbag_prott5/mapping_file.csv
2024-03-13 16:35:51,673 INFO Created the file Mbag_prott5/remapped_sequences_file.fasta
2024-03-13 16:35:51,965 INFO Created the stage directory Mbag_prott5/Mbag_prott5_embeddings
2024-03-13 16:35:51,966 INFO Created the file Mbag_prott5/Mbag_prott5_embeddings/input_parameters_file.yml
2024-03-13 16:36:36,622 INFO The minimum expected size for the reduced_embedding_file is 113.9 MB.
2024-03-13 16:36:36,622 INFO You are going to generate a total of 113.9 MB of embeddings, and have 2.5 PB available at Mbag_prott5.
2024-03-13 16:36:36,624 INFO Created the file Mbag_prott5/Mbag_prott5_embeddings/reduced_embeddings_file.h5
 12%|█▏        | 3381/27798 [16:46:13<60:00:06, 8.85s/it]
100%|█████████▉| 27797/27798 [31:52:39<00:04, 4.13s/it]
2024-03-15 00:29:16,279 INFO Created the file Mbag_prott5/Mbag_prott5_embeddings/ouput_parameters_file.yml
2024-03-15 00:29:16,303 INFO Created the file Mbag_prott5/ouput_parameters_file.yml
{'go': '/user/work/tk19812/software/FANTASIA/FANTASIA//goPredSim/data/GO/go_2022.obo', 'lookup_set': '/user/work/tk19812/software/FANTASIA/FANTASIA//goPredSim/data/prott5_goa_2022.h5', 'annotations': '/user/work/tk19812/software/FANTASIA/FANTASIA//goPredSim/data/goa_annotations/goa_annotations_2022.txt', 'targets': 'Mbagannotation/Mbag_prott5/Mbag_prott5_embeddings/reduced_embeddings_file.h5', 'onto': 'all', 'thresh': '1', 'modus': 'num', 'output': 'Mbagannotation/Mbag_prott5/gopredsim_Mbag_prott5'}
Traceback (most recent call last):
File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 49, in <module>
main()
File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/predict_go_embedding_inference.py", line 15, in main
test_embeddings = fu.read_embeddings(config_data['targets'])
File "/user/work/tk19812/software/FANTASIA/FANTASIA/goPredSim/file_utils.py", line 48, in read_embeddings
with h5py.File(embeddings_in, 'r') as f:
File "/user/work/tk19812/software/FANTASIA/FANTASIA/venv/lib/python3.9/site-packages/h5py/_hl/files.py", line 507, in __init__
fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
File "/user/work/tk19812/software/FANTASIA/FANTASIA/venv/lib/python3.9/site-packages/h5py/_hl/files.py", line 220, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 106, in h5py.h5f.open
FileNotFoundError: [Errno 2] Unable to open file (unable to open file: name = 'Mbagannotation/Mbag_prott5/Mbag_prott5_embeddings/reduced_embeddings_file.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
Could you send me the commands you have used for both generate_gopredsim_input_files.sh and launch_gopredsim_pipeline.sh? I suspect you have specified different output paths (-o/--outpath option) for the two scripts, when they should be identical.
Hi Gemma,
The two scripts point at the same folder:
#Specify config files (Use full paths, no variables)
EMBEDDINGS_CONFIG=$CONFIG/config_files/embeddings/${PREFIX}_${MODEL}.yml
GOPREDSIM_CONFIG=$CONFIG/config_files/gopredsim/${PREFIX}_${MODEL}.yml
#Activate conda environment
. ~/.bashrc
conda deactivate
conda activate gopredsim
#Change directory to the one specified in --outpath (embeddings are created in the current directory)
mkdir -p $OUT_PATH
cd $OUT_PATH
#Execute steps for ProTT5 model
if [[ "$MODEL" = "prott5" ]]; then
    #Activate prott5 python environment
    source /user/work/tk19812/software/FANTASIA/FANTASIA/venv/bin/activate
    #Compute embeddings with the desired model
    /user/work/tk19812/software/FANTASIA/FANTASIA/launch_embeddings.sh -f $EMBEDDINGS_CONFIG
    #Transfer GO annotation using GOPredSim
    /user/work/tk19812/software/FANTASIA/FANTASIA/launch_gopredsim.sh -f $GOPREDSIM_CONFIG
    #Deactivate python environment
    deactivate
fi
The problem is in predict_go_embedding_inference.py. The error:
FileNotFoundError: [Errno 2] Unable to open file (unable to open file: name = 'Mbagannotation/Mbag_prott5/Mbag_prott5_embeddings/reduced_embeddings_file.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
is actually not correct. What I mean is that the file is there; it just cannot be opened by predict_go_embedding_inference.py, and I don't know why.
F
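One possible explanation, not confirmed in this thread: the `targets` value printed in the log is a relative path (`Mbagannotation/Mbag_prott5/...`), and the pipeline script runs `cd $OUT_PATH` before launching the Python step. A relative path is resolved against the current working directory, so after the `cd` it can point somewhere other than where the file was written, even though the file exists. The sketch below reproduces that effect in a scratch directory; all directory names are placeholders and `to_abs_dir` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: turn a possibly relative directory into an absolute one.
to_abs_dir() {
    (cd "$1" 2>/dev/null && pwd)
}

# Self-contained demo in a scratch directory; all names are placeholders.
demo_root=$(mktemp -d)
cd "$demo_root" || exit 1
mkdir -p out/run_embeddings
touch out/run_embeddings/reduced_embeddings_file.h5

rel_target="out/run_embeddings/reduced_embeddings_file.h5"
abs_target="$(to_abs_dir out)/run_embeddings/reduced_embeddings_file.h5"

cd out || exit 1   # mirrors the 'cd $OUT_PATH' step in the pipeline script
if [[ -f "$rel_target" ]]; then echo "relative: found"; else echo "relative: not found"; fi
if [[ -f "$abs_target" ]]; then echo "absolute: found"; else echo "absolute: not found"; fi
```

In this demo the relative path is no longer found after the `cd`, while the absolute one still is. If the same thing is happening here, passing full paths to both scripts (as the comment in the pipeline script suggests) would be worth trying.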
Hi,
I wanted to test your software, but I found some problems during the installation. I don't have singularity on my cluster, and I can't install it, so I'm trying to install it manually.
I followed the installation_guide_FANTASIA.sh, seemingly successfully, but I can't see the fantasia executable. I also tried to execute launch_gopredsim.sh and I wasn't lucky either. What am I doing wrong? Any help? Cheers Francesco