Closed by mlhoggard 2 years ago
Yes, currently the CNN model can only be run from within the given folder, because the path to the CNN script is hardcoded.
If you are using an HPC, you can sbatch your job under the HostG folder. That will work.
Best, Jiayu
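For anyone launching from a wrapper script rather than `cd`-ing first, the same workaround can be sketched in Python by setting the child process's working directory. This is only an illustration: `run_in_tool_dir` is a hypothetical helper, not part of HostG, and any arguments passed through are placeholders for whatever the script actually accepts.

```python
import subprocess
import sys


def run_in_tool_dir(tool_dir, script_name, script_args=()):
    """Run <tool_dir>/<script_name> with the working directory set to tool_dir.

    Hypothetical helper for illustration; not part of HostG.
    """
    cmd = [sys.executable, script_name, *script_args]
    # cwd=tool_dir makes relative paths inside the script (for example
    # "dataset/...") resolve against the tool folder, as they would after `cd`.
    return subprocess.run(cmd, cwd=tool_dir, check=True,
                          capture_output=True, text=True)
```

Because the working directory changes, any input or output paths passed as arguments should be absolute; relative ones would be interpreted against `tool_dir`. Outputs that the scripts write to hardcoded relative paths will still land inside that folder.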
Hi Jiayu,
Thanks for the quick reply.
OK, I'll continue running from within the `HostG/` directory. If possible, though, a future update to lift this restriction would be great, as it's currently a bit problematic on a shared system: only one dataset can be run at a time, and the program directory gets jumbled with all of the output files.
I had a quick look at whether I could patch this for our system, but a number of the subscripts have the `dataset/` directory and all output directories and paths hardcoded, so it got a bit tricky for me, not being fully familiar with how all of the scripts interrelate. If of interest, some possible modifications could include:
1. In `run_Speed_up.py`, identify the script's own path to then include in front of all calls to other scripts. E.g. possibly via something like `HostG_path = str(os.path.dirname(os.path.abspath(sys.argv[0])))`, followed by subsequent calls in the format `cmd = str(HostG_path)+"/run_CNN.py"` (n.b. `python` call dropped assuming note 3 below).
2. Similarly, include this path in front of the `dataset/` directory path (e.g. from what I could tell, `dataset/` is currently hardcoded in `run_CNN.py`, `run_KnowledgeGraph.py`, `run_phage_host.py`, and `run_phage_phage.py`).
3. Adding `#!/usr/bin/env python3` to the top of each of the Python scripts (together with making them executable) would also enable all calls within the scripts to omit the direct call to `python` (e.g. updated to the format `cmd = "run_CNN.py"`). This would also enable searching `$PATH` for the scripts rather than only the working directory (being able to search `$PATH` is less necessary for the subscripts if note 1 above were implemented, but it would still be very useful for `run_Speed_up.py`).
4. An option to set the `dataset/` directory, with a default of whatever the full path to `run_Speed_up.py` is (i.e. an additional argument added to `run_Speed_up.py` for the database, the path of which is then passed to whichever other scripts require the database path). This would allow both running the program from somewhere other than the `HostG/` directory and storing the database in a separate directory from the program (as might be preferable in some instances).

Thanks again for the reply. I had a quick follow-up question regarding outputting the threshold scores, but I will open a new thread for that one.
Kind regards, Mike.
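The path-handling suggestions above (locating sub-scripts relative to `run_Speed_up.py`, and a configurable database directory) could be sketched roughly as follows. This is a minimal sketch, not HostG's actual code: the `--dataset` option and the helpers `script_dir`, `build_parser`, `build_command` and `main` are hypothetical names, and it assumes the sub-scripts would be changed to accept the database directory as an argument. It uses `sys.executable` in place of a bare `python` call, so it works whether or not the shebang lines are added.

```python
import argparse
import os
import sys


def script_dir():
    """Directory containing the entry script, independent of the working directory."""
    return os.path.dirname(os.path.abspath(sys.argv[0]))


def build_parser(default_dataset):
    # "--dataset" is a hypothetical option name, not an existing HostG flag.
    parser = argparse.ArgumentParser(description="Path-handling sketch")
    parser.add_argument("--dataset", default=default_dataset,
                        help="directory holding the database files")
    return parser


def build_command(base_dir, script_name, dataset_dir):
    """Build an argument list that calls a sub-script by its absolute path."""
    script_path = os.path.join(base_dir, script_name)
    # sys.executable reuses the interpreter running this script, so the
    # sub-script needs neither a shebang line nor an entry on $PATH.
    return [sys.executable, script_path, "--dataset", dataset_dir]


def main(argv=None):
    base = script_dir()
    # Default the database to <script directory>/dataset, as suggested.
    args = build_parser(os.path.join(base, "dataset")).parse_args(argv)
    # The returned list could be passed to subprocess.run(cmd, check=True).
    return build_command(base, "run_CNN.py", os.path.abspath(args.dataset))
```

Calling `main(["--dataset", "/data/hostg_db"])` would return a command that points at `run_CNN.py` next to the entry script and at the given database directory, regardless of where the job was launched from. Passing an argument list instead of a single command string also avoids quoting problems when paths contain spaces.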
Hi there,
Thanks for your work on HostG. It looks like a great tool, and I'm keen to give it a test run with some current data we're working on.
I just wanted to check if I'm missing something: from what I can tell, it is currently necessary to run the tool from within the `HostG/` directory. Is that right?

I.e. the current calls to other Python scripts in the format `cmd = "python run_CNN.py"` appear to only look within the working directory for `run_CNN.py`, even if `HostG/` is added to `$PATH`. Similarly, the call to the database (`dataset/`) appears to be hardcoded as being within the current directory (e.g. within `run_KnowledgeGraph.py`: `pkl.load(open("dataset/phage2id.dict",'rb'))`).

Thanks again for all your work on this, and I'm looking forward to seeing how the outputs from our data look.
Kind regards, Mike.
(p.s. I installed simply by cloning the repo rather than via anaconda, but perhaps it was written with the assumption that it only be run from directly within the conda environment?)
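The behaviour described above follows from how relative paths work: `open("dataset/phage2id.dict", 'rb')` is resolved against the current working directory, not against the location of the script. A minimal sketch of a location-independent alternative is below; `load_dict` is a hypothetical helper, and the `dataset/` layout is assumed from the path quoted in the post.

```python
import pickle
from pathlib import Path


def load_dict(base_dir, name="phage2id.dict"):
    """Load a pickled dictionary from <base_dir>/dataset/<name>.

    Hypothetical helper for illustration; not part of HostG.
    """
    path = Path(base_dir) / "dataset" / name
    with path.open("rb") as handle:
        return pickle.load(handle)


# Inside a script such as run_KnowledgeGraph.py, the base directory could be
# taken from the script's own location instead of the working directory:
#   phage2id = load_dict(Path(__file__).resolve().parent)
```

Using the `with` block also closes the file handle, which the inline `pkl.load(open(...))` form leaves to the garbage collector.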