EBISPOT / OXO

Ontology cross-reference and mapping service
Apache License 2.0

Docs: OlsDatasetExtractor: CSV not generated #62

Open: joeflack4 opened this issue 3 years ago

joeflack4 commented 3 years ago

Description

I'm trying to learn about OXO, and I have set up a local instance using Docker. I'm following the docs, and I want to create datasources.csv and follow the rest of the documentation walkthrough. But either (a) the file is not being generated, or (b) I can't find it.

This is the command that I'm running:

docker run --net=host -v $(pwd)/config.ini:/mnt/config.ini -v $(pwd)/idorg.xml:/mnt/idorg.xml \
    -v oxo-neo4j-import:/mnt/neo4j -it ebispot/oxo-loader:stable \
        python /opt/oxo-loader/OlsDatasetExtractor.py \
            -c /mnt/config.ini -i /mnt/idorg.xml -d /mnt/neo4j/datasources.csv

Expected behavior

After running the command, a CSV file should be created inside the Docker container oxo_oxo-web-1 at /mnt/neo4j/datasources.csv.

Actual behavior

After running the command, there is no file at /mnt/neo4j/datasources.csv in that container. The /mnt directory exists, but the /mnt/neo4j/ directory does not.

Additional information

I simply followed the documentation (https://github.com/EBISPOT/OXO) prior to running this command. I had done the following:

git clone https://github.com/EBISPOT/OXO.git
cd OXO

docker volume create --name=oxo-neo4j-data
docker volume create --name=oxo-neo4j-import
docker volume create --name=oxo-mongo-data
docker volume create --name=oxo-solr-data
docker volume create --name=oxo-hsqldb

docker-compose up -d   # docs left out the '-d' part, but I added it

cd oxo-loader

# from here, I continued w/ the documentation at: https://github.com/EBISPOT/OXO/tree/main/oxo-loader
# then I ran the command mentioned above in "Description"

Neo4j and the other containers are up and running and appear to be healthy.

[Screenshot from 2021-09-28, 7:36 PM]

My config.ini (I didn't create this; it was already there)

[Basics]
oxoUrl=http://host.docker.internal:8080
oxoAPIkey=key
olsSolrBaseUrl=http://host.docker.internal:8993/solr
solrChunks=5000
neoURL=bolt://host.docker.internal:7687
neoUser=neo4j
neoPass=dba
olsurl=http://www.ebi.ac.uk/ols/api
oboDbxrefUrl=https://raw.githubusercontent.com/geneontology/go-site/master/metadata/db-xrefs.yaml

[Paths]
exportFileDatasources=datasources.csv
exportFileTerms=/path/terms.csv
exportFileMappings=/path/mappings.csv
idorgDataLocation = /path/idorg.xml

[SQLumls]
user=username
password=password
host=mysql-name
db=dbName
port=4570

[LOINC]
Part=/path/Part.csv
PartRelatedCodeMapping=/path/PartRelatedCodeMapping.csv
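Two things stand out in the [Paths] section above: exportFileDatasources is a bare file name, and the other export entries use /path/ placeholders, so none of them points into /mnt/neo4j, where the docker run command mounts the oxo-neo4j-import volume. The Python sketch below flags such entries. check_export_paths is a hypothetical helper, not part of OXO, and it assumes the extractor writes to the [Paths] values rather than to the -d argument; that assumption matches the fix suggested later in the thread but has not been checked against the script's source.

```python
import configparser
import posixpath

# Hypothetical helper for illustration only; it is not part of OXO.
# Assumption: OlsDatasetExtractor.py writes to the [Paths] values in config.ini.
EXPORT_KEYS = ("exportFileDatasources", "exportFileTerms", "exportFileMappings")


def check_export_paths(config_text, mount_root="/mnt/neo4j"):
    """Return (key, value, reason) for each export path not inside mount_root."""
    root = posixpath.normpath(mount_root)
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep camelCase keys such as exportFileTerms
    parser.read_string(config_text)
    problems = []
    for key in EXPORT_KEYS:
        value = parser.get("Paths", key, fallback="")
        if not posixpath.isabs(value):
            # A relative value such as "datasources.csv" would resolve against the
            # loader's working directory, not the mounted volume; empty values land here too.
            problems.append((key, value, "missing or relative path"))
        elif posixpath.commonpath([root, posixpath.normpath(value)]) != root:
            problems.append((key, value, "outside " + root))
    return problems


sample = """[Paths]
exportFileDatasources=datasources.csv
exportFileTerms=/path/terms.csv
exportFileMappings=/path/mappings.csv
"""

for key, value, reason in check_export_paths(sample):
    print(f"{key}: {value!r} ({reason})")
```

To check a real file, pass the contents of config.ini, for example Path("config.ini").read_text() using pathlib.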

Command output

pwd
/Users/joeflack4/projects/OXO/oxo-loader
docker run --net=host -v $(pwd)/config.ini:/mnt/config.ini -v $(pwd)/idorg.xml:/mnt/idorg.xml \
    -v oxo-neo4j-import:/mnt/neo4j -it ebispot/oxo-loader:stable \
        python /opt/oxo-loader/OlsDatasetExtractor.py \
            -c /mnt/config.ini -i /mnt/idorg.xml -d /mnt/neo4j/datasources.csv

Ignoring chebi from idorg as it is already registered as a datasource
Ignoring go from idorg as it is already registered as a datasource
Ignoring geo from idorg as it is already registered as a datasource
Ignoring eco from idorg as it is already registered as a datasource
Ignoring pride from idorg as it is already registered as a datasource
Ignoring fma from idorg as it is already registered as a datasource
Ignoring so from idorg as it is already registered as a datasource
Ignoring biomodels.teddy from idorg as it is already registered as a datasource
Ignoring biomodels.kisao from idorg as it is already registered as a datasource
Ignoring cl from idorg as it is already registered as a datasource
Ignoring bto from idorg as it is already registered as a datasource
Ignoring pato from idorg as it is already registered as a datasource
Ignoring ro from idorg as it is already registered as a datasource
Ignoring obi from idorg as it is already registered as a datasource
Ignoring ncit from idorg as it is already registered as a datasource
Ignoring pr from idorg as it is already registered as a datasource
Ignoring edam from idorg as it is already registered as a datasource
Ignoring orphanet from idorg as it is already registered as a datasource
Ignoring doid from idorg as it is already registered as a datasource
Ignoring cco from idorg as it is already registered as a datasource
Ignoring pw from idorg as it is already registered as a datasource
Ignoring po from idorg as it is already registered as a datasource
Ignoring efo from idorg as it is already registered as a datasource
Ignoring vario from idorg as it is already registered as a datasource
Ignoring ma from idorg as it is already registered as a datasource
Ignoring uberon from idorg as it is already registered as a datasource
Ignoring unimod from idorg as it is already registered as a datasource
Ignoring mamo from idorg as it is already registered as a datasource
Ignoring hpo from idorg as it is already registered as a datasource
Ignoring probonto from idorg as it is already registered as a datasource
/opt/oxo-loader/OlsDatasetExtractor.py:143: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yamlData = yaml.load(urllib.request.urlopen(oboDbxrefUrl))
New datasource AgBase from GO db-xrefs file
New datasource AGI_LocusCode from GO db-xrefs file
New datasource AGRICOLA_ID from GO db-xrefs file
New datasource AGRICOLA_IND from GO db-xrefs file
New datasource Alzheimers_University_of_Toronto from GO db-xrefs file
New datasource ApiDB_PlasmoDB from GO db-xrefs file
New datasource APweb from GO db-xrefs file
New datasource ARUK-UCL from GO db-xrefs file
Ignoring ASAP from OBO as it is already registered as a datasource
New datasource AspGD from GO db-xrefs file
New datasource AspGD_LOCUS from GO db-xrefs file
New datasource AspGD_REF from GO db-xrefs file
Ignoring BFO from OBO as it is already registered as a datasource
New datasource BHF-UCL from GO db-xrefs file
Ignoring BioCyc from OBO as it is already registered as a datasource
New datasource BIOMD from GO db-xrefs file
New datasource bioRxiv from GO db-xrefs file
Ignoring BRENDA from OBO as it is already registered as a datasource
Ignoring BTO from OBO as it is already registered as a datasource
New datasource CACAO from GO db-xrefs file
New datasource CAFA from GO db-xrefs file
Ignoring CARO from OBO as it is already registered as a datasource
Ignoring CAS from OBO as it is already registered as a datasource
New datasource CASGEN from GO db-xrefs file
New datasource CASREF from GO db-xrefs file
New datasource CASSPC from GO db-xrefs file
Ignoring CAZY from OBO as it is already registered as a datasource
New datasource DTU from GO db-xrefs file
Ignoring CDD from OBO as it is already registered as a datasource
Ignoring CGD from OBO as it is already registered as a datasource
New datasource CGD_LOCUS from GO db-xrefs file
New datasource CGD_REF from GO db-xrefs file
Ignoring CGSC from OBO as it is already registered as a datasource
Ignoring CHEBI from OBO as it is already registered as a datasource
Ignoring CL from OBO as it is already registered as a datasource
New datasource CO_125 from GO db-xrefs file
New datasource COG from GO db-xrefs file
New datasource COG_Cluster from GO db-xrefs file
New datasource COG_Function from GO db-xrefs file
New datasource COG_Pathway from GO db-xrefs file
New datasource CollecTF from GO db-xrefs file
New datasource ComplexPortal from GO db-xrefs file
Ignoring CORIELL from OBO as it is already registered as a datasource
Ignoring CORUM from OBO as it is already registered as a datasource
New datasource cribi_vitis from GO db-xrefs file
Ignoring dbSNP from OBO as it is already registered as a datasource
Ignoring DDANAT from OBO as it is already registered as a datasource
New datasource DDBJ from GO db-xrefs file
New datasource dictyBase from GO db-xrefs file
New datasource dictyBase_gene_name from GO db-xrefs file
New datasource dictyBase_REF from GO db-xrefs file
Ignoring DOI from OBO as it is already registered as a datasource
New datasource EC from GO db-xrefs file
Ignoring EchoBASE from OBO as it is already registered as a datasource
Ignoring ECO from OBO as it is already registered as a datasource
New datasource EcoCyc from GO db-xrefs file
New datasource EcoCyc_REF from GO db-xrefs file
Ignoring EcoliWiki from OBO as it is already registered as a datasource
Ignoring EMAPA from OBO as it is already registered as a datasource
New datasource EMBL from GO db-xrefs file
Ignoring ENA from OBO as it is already registered as a datasource
Ignoring ENSEMBL from OBO as it is already registered as a datasource
New datasource ENSEMBL_GeneID from GO db-xrefs file
New datasource ENSEMBL_ProteinID from GO db-xrefs file
New datasource ENSEMBL_TranscriptID from GO db-xrefs file
New datasource EnsemblFungi from GO db-xrefs file
New datasource EnsemblMetazoa from GO db-xrefs file
New datasource EnsemblPlants from GO db-xrefs file
New datasource EnsemblProtists from GO db-xrefs file
New datasource ENZYME from GO db-xrefs file
New datasource EO_GIT from GO db-xrefs file
New datasource EuPathDB from GO db-xrefs file
New datasource Eurofung from GO db-xrefs file
New datasource FB from GO db-xrefs file
Ignoring FBbt from OBO as it is already registered as a datasource
Ignoring FMA from OBO as it is already registered as a datasource
Ignoring FYPO from OBO as it is already registered as a datasource
New datasource GenBank from GO db-xrefs file
New datasource Gene3D from GO db-xrefs file
Ignoring GeneDB from OBO as it is already registered as a datasource
New datasource Genesys-pgr from GO db-xrefs file
Ignoring GEO from OBO as it is already registered as a datasource
Ignoring GO from OBO as it is already registered as a datasource
New datasource GO_Central from GO db-xrefs file
New datasource GO_Noctua from GO db-xrefs file
New datasource GO_REF from GO db-xrefs file
New datasource gomodel from GO db-xrefs file
New datasource GOC from GO db-xrefs file
New datasource GOC-OWL from GO db-xrefs file
New datasource GONUTS from GO db-xrefs file
New datasource GOREL from GO db-xrefs file
New datasource GR from GO db-xrefs file
New datasource GR_Ensembl from GO db-xrefs file
New datasource GR_GENE from GO db-xrefs file
New datasource GR_MUT from GO db-xrefs file
New datasource GR_PROTEIN from GO db-xrefs file
New datasource GR_QTL from GO db-xrefs file
New datasource GR_REF from GO db-xrefs file
New datasource GRIN from GO db-xrefs file
New datasource GRINDesc from GO db-xrefs file
New datasource H-invDB from GO db-xrefs file
New datasource H-invDB_cDNA from GO db-xrefs file
New datasource H-invDB_locus from GO db-xrefs file
Ignoring HAMAP from OBO as it is already registered as a datasource
Ignoring HGNC from OBO as it is already registered as a datasource
Ignoring HPA from OBO as it is already registered as a datasource
New datasource HPA_antibody from GO db-xrefs file
New datasource HUGO from GO db-xrefs file
Ignoring IAO from OBO as it is already registered as a datasource
New datasource IMG from GO db-xrefs file
New datasource IMGT_HLA from GO db-xrefs file
New datasource IMGT_LIGM from GO db-xrefs file
Ignoring IntAct from OBO as it is already registered as a datasource
Ignoring InterPro from OBO as it is already registered as a datasource
New datasource iPTMnet from GO db-xrefs file
New datasource IRIC from GO db-xrefs file
New datasource IRGC from GO db-xrefs file
Ignoring ISBN from OBO as it is already registered as a datasource
Ignoring ISSN from OBO as it is already registered as a datasource
New datasource IUPHAR/BPS from GO db-xrefs file
New datasource IUPHAR_GPCR from GO db-xrefs file
New datasource IUPHAR_RECEPTOR from GO db-xrefs file
New datasource Jaiswal_Lab from GO db-xrefs file
Ignoring TIGRFAMS from OBO as it is already registered as a datasource
Ignoring JSTOR from OBO as it is already registered as a datasource
New datasource KEGG from GO db-xrefs file
New datasource KEGG_ENZYME from GO db-xrefs file
New datasource KEGG_LIGAND from GO db-xrefs file
New datasource KEGG_PATHWAY from GO db-xrefs file
New datasource KEGG_REACTION from GO db-xrefs file
New datasource LIFEdb from GO db-xrefs file
Ignoring MA from OBO as it is already registered as a datasource
New datasource MACSC_REF from GO db-xrefs file
New datasource MaizeGDB from GO db-xrefs file
New datasource MaizeGDB_Locus from GO db-xrefs file
New datasource MaizeGDB_QTL from GO db-xrefs file
New datasource MaizeGDB_REF from GO db-xrefs file
New datasource MaizeGDB_stock from GO db-xrefs file
New datasource MEDLINE from GO db-xrefs file
Ignoring MEROPS from OBO as it is already registered as a datasource
New datasource MEROPS_fam from GO db-xrefs file
Ignoring MeSH from OBO as it is already registered as a datasource
New datasource MetaCyc from GO db-xrefs file
New datasource MGCSC_GENETIC_STOCKS from GO db-xrefs file
Ignoring MGD from OBO as it is already registered as a datasource
New datasource MGI from GO db-xrefs file
New datasource MIPS_funcat from GO db-xrefs file
New datasource MITRE from GO db-xrefs file
New datasource ModBase from GO db-xrefs file
New datasource NASC_code from GO db-xrefs file
New datasource NC-IUBMB from GO db-xrefs file
New datasource NCBI from GO db-xrefs file
Ignoring NCBIGene from OBO as it is already registered as a datasource
New datasource NCBI_gi from GO db-xrefs file
New datasource NCBI_GP from GO db-xrefs file
New datasource NCBI_locus_tag from GO db-xrefs file
New datasource NCBI_NP from GO db-xrefs file
New datasource NIF_Subcellular from GO db-xrefs file
New datasource NTNU_SB from GO db-xrefs file
Ignoring OBI from OBO as it is already registered as a datasource
New datasource OBO_SF_PO from GO db-xrefs file
New datasource OBO_SF2_PO from GO db-xrefs file
New datasource OBO_SF2_PECO from GO db-xrefs file
Ignoring OMIM from OBO as it is already registered as a datasource
New datasource OMSSA from GO db-xrefs file
New datasource PANTHER from GO db-xrefs file
New datasource ParkinsonsUK-UCL from GO db-xrefs file
Ignoring PATO from OBO as it is already registered as a datasource
New datasource PATRIC from GO db-xrefs file
Ignoring PDB from OBO as it is already registered as a datasource
New datasource PECO_GIT from GO db-xrefs file
Ignoring Pfam from OBO as it is already registered as a datasource
New datasource PharmGKB from GO db-xrefs file
New datasource PhenoScape from GO db-xrefs file
New datasource PIR from GO db-xrefs file
Ignoring PIRSF from OBO as it is already registered as a datasource
New datasource PlantSystematics_image_archive from GO db-xrefs file
New datasource PMCID from GO db-xrefs file
New datasource PMID from GO db-xrefs file
Ignoring PO from OBO as it is already registered as a datasource
New datasource PO_GIT from GO db-xrefs file
New datasource PO_REF from GO db-xrefs file
New datasource POC from GO db-xrefs file
Ignoring PomBase from OBO as it is already registered as a datasource
New datasource Pompep from GO db-xrefs file
New datasource PPI from GO db-xrefs file
Ignoring PR from OBO as it is already registered as a datasource
Ignoring PRINTS from OBO as it is already registered as a datasource
Ignoring Prosite from OBO as it is already registered as a datasource
New datasource protein_id from GO db-xrefs file
New datasource PseudoCAP from GO db-xrefs file
New datasource PSO_GIT from GO db-xrefs file
New datasource PSI-MI from GO db-xrefs file
New datasource PSI-MOD from GO db-xrefs file
New datasource PSORT from GO db-xrefs file
New datasource PubChem_BioAssay from GO db-xrefs file
New datasource PubChem_Compound from GO db-xrefs file
New datasource PubChem_Substance from GO db-xrefs file
New datasource RAP-DB from GO db-xrefs file
Ignoring Reactome from OBO as it is already registered as a datasource
Ignoring REBASE from OBO as it is already registered as a datasource
New datasource RefGenome from GO db-xrefs file
Ignoring RefSeq from OBO as it is already registered as a datasource
Ignoring RESID from OBO as it is already registered as a datasource
Ignoring Rfam from OBO as it is already registered as a datasource
Ignoring RGD from OBO as it is already registered as a datasource
Ignoring RHEA from OBO as it is already registered as a datasource
New datasource RiceSES from GO db-xrefs file
New datasource RNAcentral from GO db-xrefs file
Ignoring RNAmods from OBO as it is already registered as a datasource
Ignoring RO from OBO as it is already registered as a datasource
New datasource SABIO-RK from GO db-xrefs file
New datasource Sanger from GO db-xrefs file
Ignoring SEED from OBO as it is already registered as a datasource
Ignoring SGD from OBO as it is already registered as a datasource
New datasource SGD_LOCUS from GO db-xrefs file
New datasource SGD_REF from GO db-xrefs file
Ignoring SGN from OBO as it is already registered as a datasource
New datasource SGN_ref from GO db-xrefs file
New datasource SGN_germplasm from GO db-xrefs file
Ignoring SMART from OBO as it is already registered as a datasource
Ignoring SO from OBO as it is already registered as a datasource
New datasource Soy_gene from GO db-xrefs file
New datasource SOY_QTL from GO db-xrefs file
New datasource SOY_ref from GO db-xrefs file
New datasource SUPERFAMILY from GO db-xrefs file
New datasource SynGO from GO db-xrefs file
New datasource SynGO-UCL from GO db-xrefs file
New datasource SYSCILIA_CCNET from GO db-xrefs file
New datasource TAIR from GO db-xrefs file
New datasource taxon from GO db-xrefs file
New datasource TC from GO db-xrefs file
Ignoring TGD from OBO as it is already registered as a datasource
New datasource TGD_LOCUS from GO db-xrefs file
New datasource TGD_REF from GO db-xrefs file
New datasource TO_GIT from GO db-xrefs file
New datasource TRANSFAC from GO db-xrefs file
New datasource TreeGenes from GO db-xrefs file
Ignoring UBERON from OBO as it is already registered as a datasource
New datasource UM-BBD from GO db-xrefs file
New datasource UM-BBD_enzymeID from GO db-xrefs file
New datasource UM-BBD_pathwayID from GO db-xrefs file
New datasource UM-BBD_reactionID from GO db-xrefs file
New datasource UM-BBD_ruleID from GO db-xrefs file
Ignoring UniParc from OBO as it is already registered as a datasource
Ignoring UniPathway from OBO as it is already registered as a datasource
Ignoring UniProt from OBO as it is already registered as a datasource
New datasource UniProtKB from GO db-xrefs file
New datasource UniProtKB-KW from GO db-xrefs file
New datasource UniProtKB-SubCell from GO db-xrefs file
New datasource UniRule from GO db-xrefs file
New datasource VZ from GO db-xrefs file
New datasource WB from GO db-xrefs file
New datasource WB_REF from GO db-xrefs file
Ignoring WBbt from OBO as it is already registered as a datasource
Ignoring WBls from OBO as it is already registered as a datasource
Ignoring WBPhenotype from OBO as it is already registered as a datasource
New datasource Wikipedia from GO db-xrefs file
New datasource WikipediaVersioned from GO db-xrefs file
Ignoring Xenbase from OBO as it is already registered as a datasource
New datasource YeastFunc from GO db-xrefs file
New datasource YuBioLab from GO db-xrefs file
Ignoring ZFIN from OBO as it is already registered as a datasource
New datasource TFClass from GO db-xrefs file
New datasource HGNC-UCL from GO db-xrefs file
New datasource Animal_QTLdb from GO db-xrefs file
New datasource Animal_CorrDB from GO db-xrefs file
New datasource AlphaFold from GO db-xrefs file
Adding paxo as datasource
Adding loinc as datasource

Examining oxo_oxo-web-1 to find CSV

docker exec -it a270b0b333736e3c99c3bf44f29ec0f59d65a8af2e2d322e4005b64433dd2aa8 /bin/sh
/ # ls
bin    dev    etc    home   lib    media  mnt    opt    proc   root   run    sbin   srv    sys    tmp    usr    var
/ # pwd
/
/ # cd mnt
/mnt # ls
hsqldb             hsqldb.lck         hsqldb.log         hsqldb.properties  hsqldb.script      hsqldb.tmp
/ # ls /mnt/neo4j
ls: /mnt/neo4j: No such file or directory

henrietteharmse commented 3 years ago

@joeflack4 Thank you for logging the issue in such detail. We will look into it as soon as we can.

matentzn commented 3 years ago

We should probably unify custom deployments in https://github.com/EBISPOT/ontotools-docker

These kinds of issues are mostly problems with mount points and similar configuration. @joeflack4, did you try using the docker compose setup?

henrietteharmse commented 3 years ago

@matentzn https://github.com/EBISPOT/ontotools-docker addresses the complete Ontology Tools stack. However, not everyone wants to install the complete stack.

matentzn commented 3 years ago

I agree. I was thinking more of moving the docs there and using docker compose even for the individual services; it would mean less support work in the future, since it bypasses the typical mistakes people make when following README docs.

joeflack4 commented 3 years ago

Hey guys. Thanks for looking into this!

@matentzn Yep, I was just following the documentation. It says to run docker-compose up, so yes, I ran that from the root directory.

jamesamcl commented 3 years ago

It seems the paths are wrong in the supplied config.ini.

[Paths]
exportFileDatasources=datasources.csv
exportFileTerms=/path/terms.csv
exportFileMappings=/path/mappings.csv

should be

[Paths]
exportFileDatasources=/mnt/neo4j/datasources.csv
exportFileTerms=/mnt/neo4j/terms.csv
exportFileMappings=/mnt/neo4j/mappings.csv
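Editing the three lines by hand is simplest, but the same change can be scripted. The Python sketch below is a hypothetical helper (rewrite_export_paths is not part of OXO) that keeps each entry's file name and moves it under /mnt/neo4j, assuming that is still where the oxo-neo4j-import volume is mounted. Because configparser discards comments when it reads a file, keep a backup of the original config.ini before writing the result over it.

```python
import configparser
import io
import posixpath

# Hypothetical helper for illustration only; it is not part of OXO.
# Default file names follow the corrected [Paths] block above.
DEFAULT_NAMES = {
    "exportFileDatasources": "datasources.csv",
    "exportFileTerms": "terms.csv",
    "exportFileMappings": "mappings.csv",
}


def rewrite_export_paths(config_text, mount_root="/mnt/neo4j"):
    """Point every [Paths] export entry at a file inside mount_root."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve camelCase keys when writing back
    parser.read_string(config_text)
    if not parser.has_section("Paths"):
        parser.add_section("Paths")
    for key, default_name in DEFAULT_NAMES.items():
        current = parser.get("Paths", key, fallback="")
        # Keep the existing file name if there is one, else use the default.
        name = posixpath.basename(current) or default_name
        parser.set("Paths", key, posixpath.join(mount_root, name))
    out = io.StringIO()
    # Comments are not preserved; entries are written as key=value.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


original = """[Paths]
exportFileDatasources=datasources.csv
exportFileTerms=/path/terms.csv
exportFileMappings=/path/mappings.csv
"""

print(rewrite_export_paths(original))
```

Other sections and keys, such as idorgDataLocation, are passed through unchanged apart from the key=value spacing.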

I will get this updated. As @matentzn says, in the long term we should probably unify all of these separate Docker instructions into the ontotools-docker repository, which is more up to date but currently only deploys the whole stack.

joeflack4 commented 3 years ago

That makes sense as the cause! For now, Nico has advised me to use the full ontotools stack, which should serve my purposes just fine. But thanks for looking into this.