ESGF / esgf-installer

ESGF P2P Node Installer
https://esgf.llnl.gov/

esgf-dashboard-ip not starting #615

Closed nathanlcarlson closed 5 years ago

nathanlcarlson commented 5 years ago

LD_LIBRARY_PATH

Using

/usr/local/esgf-dashboard-ip/bin/ip.service start

Reports the following error

/usr/local/esgf-dashboard-ip/bin/esgf-dashboard-ip: error while loading shared libraries: liblzma.so.5: cannot open shared object file: No such file or directory

liblzma.so.5 should be found in /usr/local/conda/envs/esgf-pub/lib, but that directory is not included in the LD_LIBRARY_PATH set in /etc/esg.env. ip.service contains the following line:

[ -e /etc/esg.env ] && $(grep LD_LIBRARY_PATH /etc/esg.env)

This overwrites any manually specified LD_LIBRARY_PATH with the value from /etc/esg.env. The value in /etc/esg.env should therefore include the conda env lib path above.
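A minimal sketch of a check for this condition, written in Python to match the rest of the installer. It assumes /etc/esg.env holds a line of the form `export LD_LIBRARY_PATH=...` (the `$(grep ...)` line in ip.service suggests this, but the exact format is an assumption), and the function names are hypothetical.

```python
import re

# Directory named in this issue as the expected location of liblzma.so.5.
CONDA_LIB = "/usr/local/conda/envs/esgf-pub/lib"

def ld_library_path_dirs(env_text):
    """Return the directories on the first LD_LIBRARY_PATH line of an env file.

    Assumes a line such as `export LD_LIBRARY_PATH=/a:/b` (format assumed).
    """
    for line in env_text.splitlines():
        match = re.match(r"\s*(?:export\s+)?LD_LIBRARY_PATH=(.*)$", line)
        if match:
            value = match.group(1).strip().strip("\"'")
            return [d for d in value.split(":") if d]
    return []

def has_conda_lib(env_text, conda_lib=CONDA_LIB):
    """True if the conda env lib directory is on LD_LIBRARY_PATH."""
    return conda_lib in ld_library_path_dirs(env_text)
```

To use it, read /etc/esg.env into a string and pass the text to `has_conda_lib`; a False result would match the failure described above.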

Missing Properties

Additionally, after manually resolving the above error, the following error is reported by ip.service start.

[START] ESGF_HOME attribute not found... setting /esg as default
[START] ESGF_HOME = [/esg/]
[START] //esg//config/esgf.properties
[START] Debug level 2 (1=ERROR, 2=WARNING, 3=DEBUG)
***************************************************
[ERROR][esgf-dashboard-ip.c][567] Please note that 9 DB properties are missing in the esgf.properties file. Please check! Exit
[ERROR][esgf-dashboard-ip.c][568] Mandatory properties are:
[ERROR][esgf-dashboard-ip.c][569] [db.host], [db.database], [db.port], [db.user]
[ERROR][esgf-dashboard-ip.c][570] [esgf.host]
[ERROR][esgf-dashboard-ip.c][572] [esgf.registration.xml.path]
[ERROR][esgf-dashboard-ip.c][573] [esgf.registration.xml.download.url]
[ERROR][esgf-dashboard-ip.c][574] [dashboard.ip.app.home]
[ERROR][esgf-dashboard-ip.c][575] Please check!!!

My initial thought is that the "9 DB properties are missing" message is misleading. Looking in /esg/config/esgf.properties, it appears that only the following are missing:

esgf.registration.xml.path
esgf.registration.xml.download.url
dashboard.ip.app.home

I do not know what these values should be.
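The manual check above could be sketched as a small script. The property names are copied from the error log (which lists eight names while reporting nine missing); the parsing rules here are an assumption and are not the logic used by esgf-dashboard-ip itself.

```python
# Property names copied from the [ERROR] lines in the log above.
MANDATORY = [
    "db.host", "db.database", "db.port", "db.user",
    "esgf.host",
    "esgf.registration.xml.path",
    "esgf.registration.xml.download.url",
    "dashboard.ip.app.home",
]

def parse_properties(text):
    """Parse key=value lines, tolerating spaces around '='; skip comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def missing_mandatory(text, names=MANDATORY):
    """Return the mandatory property names absent from the given text."""
    props = parse_properties(text)
    return [name for name in names if name not in props]
```

Calling `missing_mandatory` on the contents of /esg/config/esgf.properties returns the names that a whitespace-tolerant reader cannot find, which is the view a person gets when reading the file by eye.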

nathanlcarlson commented 5 years ago

PR #616 has been created to begin addressing these issues.

nathanlcarlson commented 5 years ago

The "9 DB properties are missing" message is actually accurate. After looking through the source, I found the /esg/config/esgf.properties parser starting on line 915 (the link should bring you right there). After reading through it, I deduced that it fails to find the properties because of the whitespace around the equals sign, like so:

sample.property = value

It will find the property and value if it is instead

sample.property=value
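One possible workaround, sketched here as an assumption rather than the fix that was adopted, is to rewrite the file so that no whitespace surrounds the equals sign. The function name is hypothetical; comment lines and lines without `=` are left untouched.

```python
def strip_delimiter_spaces(text):
    """Rewrite 'key = value' lines as 'key=value'; leave other lines alone."""
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            # Split on the first '=' only, so values containing '=' survive.
            key, _, value = stripped.partition("=")
            line = "{}={}".format(key.rstrip(), value.lstrip())
        lines.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(lines) + ("\n" if text.endswith("\n") else "")
```

Applied to the contents of esgf.properties, this turns `sample.property = value` into `sample.property=value`, the form the parser is reported to accept.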
nathanlcarlson commented 5 years ago

A quick solution, as we are using Python's ConfigParser, would be to pass space_around_delimiters=False when writing the file. Unfortunately, that option was only introduced in Python 3. I am going to look at the ESGF homegrown config parser, which is based on Python's ConfigParser.
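For reference, a minimal sketch of that option with Python 3's configparser. The section name and helper function are hypothetical and chosen only for illustration; the real file layout may differ.

```python
import configparser
import io

def dump_without_spaces(section, properties):
    """Serialize properties with 'key=value' lines (no spaces around '=')."""
    parser = configparser.ConfigParser()
    parser[section] = properties
    buffer = io.StringIO()
    # space_around_delimiters is a keyword argument of write() in Python 3.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()

# "example.section" is a made-up section name for this demonstration.
output = dump_without_spaces("example.section", {"sample.property": "value"})
```

With the default `space_around_delimiters=True`, the same call would write `sample.property = value`, which is the form the esgf-dashboard-ip parser fails to read.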

nathanlcarlson commented 5 years ago

Fortunately, the new features of configparser in Python 3.6+ have been backported. See https://pypi.org/project/configparser/ . Also, it is already being installed, likely as a dependency of configobj.

nathanlcarlson commented 5 years ago

esgf-dashboard-ip has a number of issues after successfully running /usr/local/esgf-dashboard-ip/bin/ip.service start with httpd started.

[WARNING][esgf-dashboard-ip.c][589] Please note that 11 non-mandatory properties are missing in the esgf.properties file. Default values have been loaded
./.work/shards.xml:1: parser error : Space required after the Public Identifier
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
                                                 ^
./.work/shards.xml:1: parser error : SystemLiteral " or ' expected
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
                                                 ^
./.work/shards.xml:1: parser error : SYSTEM or PUBLIC, the URI is missing
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
                                                 ^

[xmlreader.c:795] Error: could not parse file ./.work/shards.xml
START PLANA
[ERROR][dbAccess.c][1252] select url_path,id,user_id from esgf_dashboard.dashboard_queue where processed=0 and not url_path like 'http%' order by timestamp ASC limit 1000;
[ERROR][dbAccess.c][1252] select url_path,id,user_id from esgf_dashboard.dashboard_queue where processed=0 and not url_path like 'http%' order by timestamp ASC limit 1000;
[ERROR][dbAccess.c][1252] select url_path,id,user_id from esgf_dashboard.dashboard_queue where processed=0 and not url_path like 'http%' order by timestamp ASC limit 1000;
[ERROR][dbAccess.c][1252] select url_path,id,user_id from esgf_dashboard.dashboard_queue where processed=0 and not url_path like 'http%' order by timestamp ASC limit 1000;
NOTICE:  table "cmip5_data_usage_continent_tmp" does not exist, skipping
[WARNING][dbAccess.c][291] Query submission failed [drop table if exists esgf_dashboard.cmip5_data_usage_continent_tmp; create table esgf_dashboard.cmip5_data_usage_continent_tmp as (SELECT EXTRACT (YEAR FROM (TIMESTAMP WITH TIME ZONE 'epoch' + fixed_log.date_fetched * INTERVAL '1 second')) AS year, EXTRACT (MONTH FROM (TIMESTAMP WITH TIME ZONE 'epoch' + fixed_log.date_fetched * INTERVAL '1 second')) AS month, COUNT(*) AS downloads, COUNT(distinct url) AS files, COUNT(distinct user_id_hash) AS users, SUM(fixed_log.size)/1024/1024/1024 AS gb, fixed_log.continent FROM (SELECT cl.continent, file.url, log.user_id_hash, max(log.timestamp) AS date_fetched, max(file.size) AS size FROM esgf_dashboard.dashboard_queue AS log JOIN esgf_dashboard.client_stats_dm AS cl ON (log.remote_addr=cl.ip) JOIN public.file_version AS file ON (UPPER(log.url_path) LIKE '%CMIP5%.NC' AND log.url_path=file.url) WHERE log.success AND log.duration > 1000 GROUP BY file.url, log.user_id_hash, cl.continent) AS fixed_log GROUP BY year, month, continent ORDER BY year, month);]

The error messages go on indefinitely and cannot be stopped with Ctrl+C.
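The parser errors show that ./.work/shards.xml begins with an HTML 2.0 doctype, which suggests (an inference, not confirmed in this thread) that an HTML page such as a web server error response was saved where XML was expected. A small sketch of a pre-check that could be run on the downloaded bytes before parsing; the function name is hypothetical.

```python
def looks_like_html(payload):
    """Return True if the bytes start with an HTML doctype or <html> tag."""
    head = payload.lstrip()[:64].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

Reading the file in binary mode and passing its contents to `looks_like_html` would flag this case before the XML parser is invoked.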

nathanlcarlson commented 5 years ago

This error is the result of there being no user_id column in the specified table.

[ERROR][dbAccess.c][1252] select url_path,id,user_id from esgf_dashboard.dashboard_queue where processed=0 and not url_path like 'http%' order by timestamp ASC limit 1000;
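One way to confirm the missing column is to query information_schema, which PostgreSQL provides. The sketch below only builds the parameterized query; the helper name is hypothetical, and the `%s` placeholders assume a driver such as psycopg2.

```python
def column_exists_query(schema, table, column):
    """Build a parameterized information_schema lookup for one column."""
    sql = (
        "SELECT 1 FROM information_schema.columns "
        "WHERE table_schema = %s AND table_name = %s AND column_name = %s"
    )
    return sql, (schema, table, column)

# Names taken from the failing query above.
sql, params = column_exists_query("esgf_dashboard", "dashboard_queue", "user_id")
```

With an open cursor, `cursor.execute(sql, params)` followed by `cursor.fetchone()` would return None when the column is absent.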
nathanlcarlson commented 5 years ago

This is a bit overwhelming and it may be a good idea to take a step back and re-evaluate this component. It has already consumed a large amount of time to get it to this point. It will likely require a significant amount of time still, based on these new error messages.

It certainly does not fit in, being written in C, even though it seemingly performs very common tasks that existing Python components already perform (db interactions, http requests, parsing xml, parsing config files, reporting information).

nathanlcarlson commented 5 years ago

It seems there was a version mismatch between the schema defined in esgf-dashboard-ip and that created by the esgf_dashboard_initialize script. This was resolved by pulling the correct version from the mirror. It is important to note that the schema migration .egg files had the same name and were labeled the same version, but had different content.

https://aims1.llnl.gov/esgf/dist/2.6/8/esgf-dashboard/esgf_dashboard-0.0.2-py2.7.egg 
!=
https://aims1.llnl.gov/esgf/dist/esgf-dashboard/esgf_dashboard-0.0.2-py2.7.egg 
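Because the two .egg files share a name and a version label, comparing content hashes is one way to tell them apart once both have been downloaded. A minimal sketch, with hypothetical function names and local file paths left to the caller:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(path_a, path_b):
    """True if both files have identical SHA-256 digests."""
    return sha256_of(path_a) == sha256_of(path_b)
```

A False result from `same_content` for the two downloads would match the mismatch described above.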
nathanlcarlson commented 5 years ago

#634 notes some behavior of the esgf-dashboard-ip service that appears to be an issue, but is not; the details of this non-issue are in that ticket. As #634 explains, the esgf-dashboard-ip service operates daily, that is, it spends most of its time sleeping (literally sleep(24 hours)). A test was performed over the weekend to see if the data would update, and it did. Unless something else comes up, this is enough evidence to say that the esgf-dashboard-ip service is in working order.