Open EricDeveaud opened 2 years ago
Hi Eric,
the UniProt taxonomy is only needed in very special config settings, e.g. with DoHierarchicalGroups := 'top-down'; so usually it is not needed. If you want to make sure that OmaStandalone can run without internet access in any configuration, you can download and convert the taxonomy with the following command:
bin/omadarwin -E << EOF
datadirname := getenv('HOME').'/.cache/oma';
CallSystem('mkdir -p '.datadirname);
GOdownload();
TaxonomyDownload();
EOF
This should create all the necessary files (Gene Ontology and UniProt Taxonomy) in the ~/.cache/oma folder of the current user.
Cheers Adrian
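To confirm that the download step produced what an offline run needs, a small check along these lines may help. It is only a sketch: check_oma_aux is a hypothetical helper, not part of OmaStandalone, and the two file names are the ones mentioned elsewhere in this thread (GOdata.drw.gz and UniProtTaxonomy.drw.gz), assumed to sit directly in the cache folder.

```shell
# Sketch: report which auxiliary files are missing from an OMA data folder.
# check_oma_aux is a hypothetical helper, not part of OmaStandalone.
# The function body runs in a subshell, so "exit" only sets its return status.
check_oma_aux() (
    dir="${1:-$HOME/.cache/oma}"   # default cache location named in this thread
    missing=0
    for f in GOdata.drw.gz UniProtTaxonomy.drw.gz; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    exit "$missing"
)
```

Calling check_oma_aux with no argument inspects ~/.cache/oma; passing a directory checks that folder instead. The return status is 0 only when both files are present.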
Hi Adrian,
many thanks for the input.
best regards
Eric
Hi Adrian, it works well; I was able to get the necessary files. Thanks again.
I have a few more questions:
1) from ToyExample/parameters.drw
one can read:
# Folder where auxillary data (e.g. GeneOntology definitions, etc)
# will be stored. The folder must be writable by the user. If not set
# or commented, the default will be ~/.cache/oma/
AuxDataPath := 'data/';
If I understand correctly, when AuxDataPath is set in the parameters.drw file, it supersedes the datadirname set in $omadir/darwinlib/darwinit. Is this right?
2) The comment says "The folder must be writable by the user." Are there any files other than GOdata.drw.gz and UniProtTaxonomy.drw.gz that will be stored in this directory?
I ask because the installation on our cluster lives on a read-only shared file system, so I must be sure that I can host the files there.
If not, I will have to provide some way for users to store the required files themselves.
best regards
Eric
Hi Eric,
indeed, when you set AuxDataPath in the parameter file, this supersedes the default datadirname. The two files (and two symlinks) are the only files that are used from this folder. So in principle I think it would be OK to set an absolute path for AuxDataPath in the parameters.drw file in the installation folder. When users generate a new parameter file for their project with oma -p, that path will already be set and used.
However, maybe it would be more sensible to have an environment variable that can be set as the default. Then the path to these auxiliary data could be set either through that variable or through the AuxDataPath parameter. Would that make sense from your point of view, and would it simplify setting up the package on an HPC system?
Cheers Adrian
Thanks Adrian, I ended up with the same scheme that you describe.
The "default" parameters.drw that I provide has AuxDataPath set like this:
AuxDataPath := getenv('OMA_DATA');
In the OMA_DATA path we provide the GOdata.drw.gz and UniProtTaxonomy.drw.gz files, and it seems to work.
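For the environment side of this setup, a site-wide profile or module snippet could look roughly like the sketch below. OMA_DATA is the variable name used above; OMA_DATA_DEFAULT, its path, and set_oma_data are hypothetical names chosen for illustration.

```shell
# Sketch of a site-wide profile snippet for the OMA_DATA scheme.
# OMA_DATA_DEFAULT and its path are hypothetical; adjust to the shared location.
OMA_DATA_DEFAULT="/shared/oma/data"

set_oma_data() {
    # Keep a value the user already set; fall back to the site default
    # when OMA_DATA is unset or empty.
    : "${OMA_DATA:=$OMA_DATA_DEFAULT}"
    export OMA_DATA
}
```

With this in place, users who export their own OMA_DATA keep their folder, while everyone else gets the read-only shared copy.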
Can you give me some information about the ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/docs/speclist.txt URL used in the darwinlib/TaxTools library?
And finally, are you the author of Darwin?
I would suggest embedding a private copy of the Darwin libs in which GetTmpDir() from Wrappers/Common is used instead of having '/tmp/' hardcoded in multiple places.
Many clusters out there set a TMPDIR environment variable that points to a fast scratch location instead of the usual /tmp.
best regards
Eric
Hi Eric,
yes, that seems like a good setup.
regarding your darwin questions: yes, I am a co-author of that language. The darwinlib/TaxTools functionality isn't needed by OmaStandalone at all, so you won't need to download that data.
About the hardcoded /tmp dir: where did you find that? I don't think that this is used anywhere. The GetTmpDir() function actually already uses the TMPDIR environment variable...
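For readers less familiar with the convention being discussed, the TMPDIR fallback can be sketched in shell as follows. This is only a shell analogue of the idea, not the Darwin GetTmpDir() implementation; make_scratch_dir and the oma. prefix are hypothetical.

```shell
# Sketch of the TMPDIR convention: prefer $TMPDIR, fall back to /tmp.
# make_scratch_dir is a hypothetical helper, not the Darwin GetTmpDir().
# The function body runs in a subshell so "base" does not leak to the caller.
make_scratch_dir() (
    base="${TMPDIR:-/tmp}"          # /tmp is used only when TMPDIR is unset or empty
    mktemp -d "$base/oma.XXXXXX"    # prints the path of the new directory
)
```

On a cluster that exports TMPDIR to a fast scratch location, the directory is created there; elsewhere it ends up under /tmp.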
Adrian,
thanks for your feedback.
Regarding the use of hardcoded /tmp in darwinlib, you can find it just by doing:
rpm_maker:src/OMA > wget -q https://omabrowser.org/standalone/OMA.2.5.0.tgz
rpm_maker:src/OMA > tar xf OMA.2.5.0.tgz
rpm_maker:src/OMA > cd OMA.2.5.0/darwinlib/
rpm_maker:OMA.2.5.0/darwinlib > grep -Rl '/tmp'
Wrappers/Common
FigPlot
Plot2Gif
ParExecSlave
FileConv
Descriptions
Server/MassDynSearch
Server/TreeGen
Server/MassSearch
Server/TreeConstruction
Server/AllAll
Server/PepPepSearch
Server/TestNewFunction1
Server/MultAlign
Server/cbrg.server
Server/Gendb
Server/AllAllDB
Server/TestNewFunction
Server/mail_handler
Server/PredictGenes
Server/NuclPepSearch
Server/EvolutionaryAnalysis
Ontology
ParExec2
MBA_Toolkit
Taxonomy
MySQL
IPC
DBTools
HelpText.txt
I guess some of these library files are not used by OMA, but some are ;-)
regards
Eric
Hi Eric,
indeed, there are quite a few places in the darwinlib, but in OmaStandalone only the functions in Taxonomy and Ontology are used. I will make an attempt to update these functions before the next OmaStandalone release. Thanks for your valuable feedback!
Best wishes Adrian
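To narrow the earlier search to the two library files named here, something like the following could be used. It is a sketch: list_tmp_uses is a hypothetical helper, and it assumes Taxonomy and Ontology are plain files directly inside darwinlib/, as the grep listing above suggests.

```shell
# Sketch: show the line numbers of '/tmp' occurrences in the two darwinlib
# files that OmaStandalone uses. list_tmp_uses is a hypothetical helper.
# The function body runs in a subshell, so the cd does not affect the caller.
list_tmp_uses() (
    cd "${1:-.}" || exit 1             # path to darwinlib/, default: current dir
    grep -n '/tmp' Taxonomy Ontology   # output format: file:line:matching text
)
```

For example, list_tmp_uses OMA.2.5.0/darwinlib prints one line per occurrence, which gives a short list of places to review.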
Hello, our cluster compute nodes do not have internet access, so oma fails, first while trying to download http://purl.obolibrary.org/obo/go.obo
I could execute a run on a machine that has internet access and provide the $HOME/.cache/oma/GOdata.drw file to our users, but I saw that darwinlib/Taxonomy also performs a download from http://www.uniprot.org/taxonomy/?query=*&compress=yes&format=tab
Is there a way that I can download and process (ConvertRawFile) this file and provide the resulting UniProtTaxonomy.drw file to our users, so that they can run oma without internet access? This way oma will be really Standalone ;-)
regards
Eric
edit typo